Questions tagged [object-recognition]

62 questions
41
votes
2 answers

How to calculate mAP for detection task for the PASCAL VOC Challenge?

How do you calculate the mAP (mean Average Precision) for the detection task on the PASCAL VOC leaderboards? It is stated, on page 11: Average Precision (AP). For the VOC2007 challenge, the interpolated average precision (Salton and McGill 1986) was…
14
votes
5 answers

How can you include information not present in an image for neural networks?

I am training a CNN to identify objects in images (one label per image). However, I have additional information about these images that cannot be retrieved by looking at the image itself. In more detail, I'm talking about the physical location of…
seb
11
votes
0 answers

Python: Feature Matching + Homography to find Multiple Objects

I'm trying to use OpenCV via Python to find multiple objects in a train image and match them with the key points detected from a query image. In my case, I'm trying to detect the tennis courts in the image provided below. I looked at the online…
Reward
10
votes
2 answers

How does the bounding box regressor work in Fast R-CNN?

In the Fast R-CNN paper (https://arxiv.org/abs/1504.08083) by Ross Girshick, the bounding box parameters are continuous variables. These values are predicted using a regression method. Unlike other neural network outputs, these values do not represent…
10
votes
2 answers

Train object detection without annotated data/bounding boxes

From what I can see, most object detection NNs (Fast(er) R-CNN, YOLO, etc.) are trained on data that includes bounding boxes indicating where in the picture the objects are localised. Are there algorithms that simply take the full picture + label annotations,…
9
votes
1 answer

How does the YOLO algorithm detect objects if the grid size is way smaller than the object in the test image?

In the YOLO algorithm, how do these grid cells output a prediction if some cells only see a small black portion of the car, given that the model was trained on datasets with full images?
9
votes
5 answers

How does deep learning help in detecting multiple objects in a single image?

Let's say there are two cars in an image. How can a model detect both cars, given that it can detect a single car in an image?
8
votes
1 answer

Recognising humans in images with a HOG descriptor and SVM classifier performs poorly

I'm using a HOG descriptor, coupled with an SVM classifier, to recognise humans in pictures. I'm using the Python wrappers for OpenCV. I've used the excellent tutorial at pyimagesearch, which explains what the algorithm does and gives hints on how…
martina.physics
7
votes
1 answer

Algorithmic difference between image analysis and video analysis

Is there an algorithmic difference between analyzing a video and an image, for example if I want object recognition? Or do I just have to analyze every frame of the video as an image? For example, detecting an object in a single image is easy…
Amanuel Negash
6
votes
2 answers

Does resizing images during training affect the bounding box annotations?

I am using the TensorFlow Object Detection API to train on my own custom dataset and am preparing annotations for it. I see from the config file of my pre-trained SSD Inception net that the image size is reduced to 300 x 300 during training.…
6
votes
1 answer

What techniques to use for image matching

I have a database with around 30,000 pictures. Each one shows a different object. They are all taken from a certain perspective; the pictures themselves are the same size, but the objects vary in size. I want to build a system that you can query with a new…
Jan van der Vegt
5
votes
2 answers

Bounding Boxes in YOLO Model

The YOLO model splits the image into smaller boxes, and each box is responsible for predicting 5 bounding boxes. My question is: how does the model make these bounding boxes for every grid cell? Does each box have a predefined offset with respect to…
4
votes
1 answer

Why is a general/original softmax loss not preferred in FR (face recognition)?

In some papers I've read that softmax loss is not preferred in FR since it does not give good inter-class and intra-class margins, but I could not understand why. So can someone explain why softmax loss is not preferred in FR, in both…
3
votes
0 answers

Is it common to preprocess image data before sending it through a deep net?

I'm curious as to how convolutional neural networks are used in practice for object recognition. Is it common to perform data preprocessing before providing the data to the input layer? If so, what types of preprocessing are common?
3
votes
1 answer

What are the possible ways to handle class imbalance in a large-scale image recognition problem with Deep Neural Nets?

I have 22 classes of objects, but their distribution is very skewed: the largest class has 100,000 images and the smallest class has 1,600 images. In that setting I would like to hear some possible solutions to this imbalance problem. I have tried the following…