Computer vision: object detection with labels that are single coordinates

Question

Are there papers in the literature that address the following object detection task?

The task can be described as follows:

Given a set of images, the labels are just coordinates (x,y) that represent the object locations that we wish to detect. A coordinate is not necessarily at the centre of the object and the object can be of any size.
The task is to detect an object that is either a person, a boat, or a car. However, the labels do not state the category of the objects; they are simply coordinates that are close to the objects of interest.
The images are taken every hour and they are snapshots of the same scene; therefore, background subtraction techniques might help.
There are around 2000 images of the same scene and each image usually has 2 objects of interest.

Has such a task been approached before?

HOG features combined with an SVM have shown great success in detecting humans in images. However, the relevant literature uses training data in which the objects of interest are labelled with a bounding box instead of a single coordinate.

The three main challenges are:

Choosing the bounding box for the classifier is difficult, since objects can be of any size.
Snapshots of the scene are taken every hour (and the camera can move slightly as well), so using background subtraction isn't straightforward.
We don't have much labelled data.
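To make the background subtraction point concrete, here is a minimal sketch of the kind of baseline I have in mind: a per-pixel temporal median over the snapshots, followed by a thresholded difference. It assumes the frames are already aligned and converted to grayscale, which is exactly what the slight camera movement and the hourly lighting changes put in doubt. The function names, the toy frames and the threshold value are illustrative assumptions only.

```python
from statistics import median

def median_background(frames):
    """Per-pixel temporal median over equally sized grayscale frames.

    Each frame is a list of rows; each row is a list of pixel intensities.
    Assumes the frames are already aligned (no camera motion).
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[r][c] for frame in frames) for c in range(width)]
        for r in range(height)
    ]

def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose absolute difference from the background exceeds threshold.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return [
        [abs(pixel - bg) > threshold for pixel, bg in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]

# Toy data: three 2x3 "snapshots"; the second one contains a bright object.
frames = [
    [[10, 10, 10], [10, 10, 10]],
    [[12, 10, 200], [10, 11, 10]],
    [[11, 9, 10], [10, 10, 10]],
]
background = median_background(frames)
mask = foreground_mask(frames[1], background)
```

With hourly snapshots, a global threshold like this would probably also fire on shadows and illumination changes, which is part of why I call the approach not straightforward.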

It would be interesting to see how people dealt with these challenges.

Thanks!

score 4 · Accepted Answer · answered Aug 13 '16 at 18:43

The state of the art for such problems is currently achieved with deep neural networks. Among others, two popular and recent approaches to detecting and localizing objects are YOLO and Faster R-CNN, which classify many regions of various sizes in an image.

Since people, boats and cars are common object classes, I'd first see what existing pre-trained networks can do for your problem and then, if needed, re-train them on your data.
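One way to check what a pre-trained detector gives you is to match its predicted boxes against your point labels. The sketch below is only an illustration under stated assumptions: the detector itself is not shown, boxes are assumed to be (x1, y1, x2, y2) tuples in pixel coordinates, and the helper names and the `max_distance` tolerance are hypothetical. Since your labels are only close to the objects, a point is allowed to fall slightly outside a box.

```python
def point_box_distance(point, box):
    """Euclidean distance from a point to an axis-aligned box (0 if inside)."""
    x, y = point
    x1, y1, x2, y2 = box
    dx = max(x1 - x, 0, x - x2)
    dy = max(y1 - y, 0, y - y2)
    return (dx * dx + dy * dy) ** 0.5

def match_points_to_boxes(points, boxes, max_distance=20.0):
    """Greedily assign each labelled point to the nearest unused box.

    Returns a dict {point_index: box_index}. Points with no free box
    within max_distance pixels are left out (likely missed detections).
    """
    candidates = sorted(
        (point_box_distance(p, b), i, j)
        for i, p in enumerate(points)
        for j, b in enumerate(boxes)
    )
    matches, used_boxes = {}, set()
    for dist, i, j in candidates:
        if dist > max_distance:
            break  # remaining candidates are even farther away
        if i in matches or j in used_boxes:
            continue
        matches[i] = j
        used_boxes.add(j)
    return matches

# Toy example: two point labels, three boxes from a hypothetical detector.
points = [(50, 60), (300, 120)]
boxes = [(280, 100, 340, 150), (40, 40, 90, 100), (500, 500, 520, 520)]
matches = match_points_to_boxes(points, boxes)
```

The fraction of labelled points that get a match gives a rough recall estimate, and the matched boxes could serve as approximate bounding-box labels if re-training turns out to be necessary.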
