How does the YOLO algorithm detect objects if a grid cell is much smaller than the object in the test image?

Question

In the YOLO algorithm, how do the grid cells output a prediction if some cells see only a small black portion of the car, given that the model was trained on datasets with full images?

Neil Slater · Answer 1 · 2018-01-08T17:41:52.727

Each grid cell in YOLO should give a high score for containing an object only if it detects that the centre of the object's bounding rectangle lies inside that cell. So a grid cell that contains only the wing mirror of a car should decide it has a low probability of containing the centre of the car.
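As a minimal sketch of that assignment rule, the snippet below finds the one cell that contains the centre of a ground-truth box. The function name `responsible_cell`, the 7x7 grid, the 448x448 image size and the example box are all assumptions chosen for illustration, not values taken from the question.

```python
def responsible_cell(box, image_size, grid_size=7):
    """Return (row, col) of the grid cell that contains the box centre.

    box: (x_min, y_min, x_max, y_max) in pixels.
    image_size: (width, height) in pixels.
    grid_size: number of cells per side (7 is an assumed value).
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size

    # Centre of the bounding rectangle in pixels.
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0

    # Scale to grid units and clamp so a centre on the far edge stays in range.
    col = min(int(cx / width * grid_size), grid_size - 1)
    row = min(int(cy / height * grid_size), grid_size - 1)
    return row, col


# Hypothetical car box that is several cells wide and tall.
car_box = (50, 150, 400, 350)
print(responsible_cell(car_box, (448, 448)))
```

With these assumed numbers the car overlaps many 64-pixel cells, but only cell (3, 3) holds its centre, so only that cell would be expected to score highly for it.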

The predicted bounding rectangles are not constrained in the same way - YOLO can (and often does) predict bounding box dimensions from the centre that are larger than the grid cell dimensions.
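To make this concrete, here is a hedged sketch of decoding one cell's prediction into a pixel box. It assumes a YOLOv1-style parameterisation, where the centre offsets `tx`, `ty` are fractions of the cell and the sizes `tw`, `th` are fractions of the whole image; later YOLO versions encode sizes differently. The function name `decode_box` and the example numbers are hypothetical.

```python
def decode_box(row, col, tx, ty, tw, th, image_size, grid_size=7):
    """Convert one cell's prediction into (x_min, y_min, x_max, y_max) pixels.

    Assumed YOLOv1-style encoding:
      tx, ty in [0, 1]: centre offset inside the cell (row, col).
      tw, th in [0, 1]: box width/height as a fraction of the full image.
    """
    width, height = image_size
    cell_w = width / grid_size
    cell_h = height / grid_size

    # The centre is tied to the predicting cell...
    cx = (col + tx) * cell_w
    cy = (row + ty) * cell_h

    # ...but the size is measured against the image, not the cell.
    box_w = tw * width
    box_h = th * height

    return (cx - box_w / 2, cy - box_h / 2, cx + box_w / 2, cy + box_h / 2)


# Cell (3, 3) predicts a box 75% of the image wide and 50% of it tall.
box = decode_box(3, 3, tx=0.5, ty=0.5, tw=0.75, th=0.5, image_size=(448, 448))
print(box)  # (56.0, 112.0, 392.0, 336.0)
```

In this example the cell is 64 pixels wide, yet the decoded box is 336 pixels wide, because nothing in the encoding limits the size to the cell.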

Each grid point is independently able to predict whether or not it contains an object's centre, what the bounding box dimensions are for the object, and what the object class is.

If you trace the layer connectivity, you will see that the grid cells are effectively interconnected through the layers below the output, so the network as a whole "sees" more of each object and can influence individual object predictions, suppressing some and encouraging others, when objects span multiple grid locations. The grid cells are not isolated into sections, or restricted to using only data from the area that they cover for the prediction. The part of the base image that a feature "pixel" in a CNN can access is called the "receptive field" of the network, and it can be calculated from the architecture, as explained in this blog on Medium.
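The receptive field calculation can be sketched with the usual recurrence for stacked convolution and pooling layers: each layer grows the field by `(kernel - 1) * jump`, where `jump` is the spacing in input pixels between neighbouring units of the previous layer. The layer list below is a made-up toy stack, not the real YOLO architecture, and padding and dilation are ignored.

```python
def receptive_field(layers):
    """Compute receptive field size and output stride of a layer stack.

    layers: list of (kernel_size, stride) tuples, ordered input to output.
    Returns (receptive_field_in_pixels, jump_in_pixels).
    """
    rf = 1    # one input pixel sees only itself
    jump = 1  # distance in input pixels between adjacent units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf, jump


# Hypothetical stack: three rounds of 3x3 conv followed by 2x2 pooling.
toy_stack = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1), (2, 2)]
print(receptive_field(toy_stack))  # (22, 8)
```

Even in this small stack, each output unit covers an 8-pixel step of the image but sees a 22-pixel window, so neighbouring units already share input. In deeper networks the window grows much larger, and in the original YOLO the fully connected layers at the end mean each cell's output can depend on the whole image.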
