Are there papers in the literature that address the following object detection task?
The task can be described as follows:
We have a set of images whose labels are just (x, y) coordinates marking the locations of the objects we want to detect. A coordinate is not necessarily at the centre of the object, and the objects can be of any size.
Each object to detect is a person, a boat, or a car. However, the labels do not give the category of an object; they are simply coordinates close to the objects of interest.
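One common way to learn from point-only annotations is to turn each labelled coordinate into a small Gaussian "blob" and train a model to regress that heatmap; detections are then the local maxima of the predicted map. The sketch below only shows the target encoding and the peak decoding, in plain Python with images as nested lists. The names `points_to_heatmap` and `heatmap_peaks`, and the `sigma` and `threshold` values, are hypothetical choices; `sigma` would roughly reflect how far a label may fall from the object.

```python
import math


def points_to_heatmap(points, height, width, sigma=8.0):
    """Render (x, y) point labels as a height-by-width Gaussian heatmap.

    Each pixel holds the maximum over all points of exp(-d^2 / (2 sigma^2)),
    so the value is 1.0 exactly at a labelled point and decays with distance.
    sigma (in pixels) is a hypothetical tolerance for label imprecision.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for px, py in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - px) ** 2 + (y - py) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                if value > heatmap[y][x]:
                    heatmap[y][x] = value
    return heatmap


def heatmap_peaks(heatmap, threshold=0.5):
    """Return (x, y) of pixels that are >= threshold and >= all 8 neighbours."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < threshold:
                continue
            neighbours = [heatmap[ny][nx]
                          for ny in range(max(0, y - 1), min(h, y + 2))
                          for nx in range(max(0, x - 1), min(w, x + 2))
                          if (nx, ny) != (x, y)]
            if all(v >= n for n in neighbours):
                peaks.append((x, y))
    return peaks
```

With labels at (10, 5) and (30, 20) on an image 48 pixels wide and 32 high and `sigma=3.0`, decoding the target heatmap recovers exactly those two points; at inference time a trained model's predicted heatmap would take the place of the target.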
The images are hourly snapshots of the same scene, so background subtraction techniques might help.
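Because every image shows the same scene, a per-pixel median over many frames approximates the empty background, and pixels that differ strongly from it are candidate objects. A minimal sketch, assuming grayscale frames stored as nested lists that are already aligned to each other; the function names and the `threshold=30` value are hypothetical. With hourly snapshots the lighting changes over the day, so in practice the median would probably need to be computed over a window of frames with similar illumination rather than over the whole image set.

```python
from statistics import median


def median_background(frames):
    """Per-pixel median over a list of equally sized grayscale frames.

    If a pixel shows the same background value in more than half of the
    frames, objects passing over it in the other frames do not survive
    the median.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(frame[y][x] for frame in frames) for x in range(width)]
            for y in range(height)]


def foreground_mask(frame, background, threshold=30):
    """True where the frame differs from the background by more than threshold."""
    return [[abs(p - b) > threshold for p, b in zip(row, bg_row)]
            for row, bg_row in zip(frame, background)]
```

Connected regions of `True` pixels in the mask could then be matched against the nearby point labels.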
There are around 2000 images of the scene, and each image usually contains 2 objects of interest.
Has a task like this been approached before?
HOG features with an SVM classifier have been very successful at detecting humans in images. However, the relevant literature uses training data in which the objects of interest are labelled with bounding boxes rather than single coordinates.
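Since box-based detectors such as HOG + SVM expect windows, one way to bridge the gap is to generate several candidate windows around each labelled point, at multiple sizes and with the point shifted away from the window centre, and let the classifier pick the best-scoring one during training, in the spirit of multiple-instance learning. The function below only enumerates such windows; `candidate_boxes`, the sizes, and the offset fractions are illustrative assumptions, not values taken from any paper.

```python
def candidate_boxes(x, y, sizes=(32, 64, 128), offsets=(-0.25, 0.0, 0.25),
                    width=None, height=None):
    """Square (x0, y0, x1, y1) windows of several sizes that all contain (x, y).

    Each offset shifts the window centre by a fraction of the window size,
    because the labelled point need not be the object centre. Offsets smaller
    than 0.5 in magnitude keep the point inside the window. Boxes are clipped
    to the image when width/height are given.
    """
    boxes = []
    for s in sizes:
        for fx in offsets:
            for fy in offsets:
                cx, cy = x + fx * s, y + fy * s  # shifted window centre
                x0, y0 = cx - s / 2, cy - s / 2
                x1, y1 = x0 + s, y0 + s
                if width is not None:
                    x0, x1 = max(0, x0), min(width, x1)
                if height is not None:
                    y0, y1 = max(0, y0), min(height, y1)
                boxes.append((round(x0), round(y0), round(x1), round(y1)))
    return boxes
```

With the defaults, each point yields 27 windows (3 sizes times 9 centre offsets), all containing the point; each window could be resized to a fixed size before computing HOG features.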
The three main challenges are:
- Choosing a bounding box size for the classifier is difficult, since objects can be of any size.
- Snapshots of the scene are taken every hour (and the camera can move slightly), so background subtraction isn't straightforward.
- We don't have much labelled data.
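For the camera-movement challenge, small shifts could be compensated before background subtraction by estimating the offset between each frame and a reference frame. The sketch below assumes the motion is a small integer translation and finds it by brute force, minimising the mean absolute difference over the overlapping region; `estimate_shift` and `max_shift` are hypothetical names. Rotation, zoom, or sub-pixel drift would need a proper registration method such as feature matching or phase correlation, and on full-resolution images this pure-Python search would be slow, so it would normally run on downsampled frames.

```python
def estimate_shift(ref, img, max_shift=3):
    """Integer (dx, dy) such that img[y + dy][x + dx] best matches ref[y][x].

    Tries every shift in [-max_shift, max_shift]^2 and returns the one with
    the lowest mean absolute difference over the overlapping region.
    Only translation is modelled; max_shift must be smaller than the image.
    """
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            # Restrict (y, x) so that (y + dy, x + dx) stays inside img.
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    total += abs(img[y + dy][x + dx] - ref[y][x])
                    count += 1
            cost = total / count
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```

After estimating `(dx, dy)`, pixel `img[y + dy][x + dx]` can be compared with `ref[y][x]` when building the background model.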
It would be interesting to see how others have dealt with these challenges.
Thanks!