
I am trying to understand the article Backpropagation In Convolutional Neural Networks.

But I cannot wrap my head around this diagram from the article: [diagram showing the layer dimensions of the example CNN]

The first layer has 3 feature maps with dimensions 32x32. The second layer has 32 feature maps with dimensions 18x18. How is that even possible? If a convolution with a 5x5 kernel is applied to a 32x32 input, the output should be $(32-5+1) \times (32-5+1) = 28 \times 28$.

Also, if the first layer has only 3 feature maps, the second layer should have a multiple of 3 feature maps, but 32 is not a multiple of 3.

Also, why is the size of the third layer 10x10? Shouldn't it be 9x9 instead? The previous layer is 18x18, so 2x2 max pooling should reduce it to 9x9, not 10x10.
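For reference, the arithmetic can be checked with a small helper (a minimal plain-Python sketch, assuming the standard output-size formulas for convolution and non-overlapping pooling):

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Standard convolution output size: floor((W - K + 2P) / S) + 1."""
    return (size - kernel + 2 * pad) // stride + 1

def pool_out(size, window):
    """Non-overlapping pooling shrinks the size by the window factor."""
    return size // window

# 32x32 input, 5x5 kernel, no padding, stride 1 -> 28x28, as computed above
print(conv_out(32, 5))   # 28

# 18x18 feature maps, 2x2 max pooling -> 9x9, not 10x10
print(pool_out(18, 2))   # 9
```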

koryakinp

3 Answers


Actually, I think you are mistaken about the second part. The point is that in CNNs the convolution operation is done over a volume. Suppose the input image has three channels and the next layer has 5 kernels; then the next layer will have five feature maps, but each of those maps comes from a convolution over volume. Each kernel has a width and a height, and additionally a depth, and that depth is equal to the number of feature maps (here, the channels of the image) of the previous layer. Take a look here.
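To make the "convolution over volume" point concrete, here is a minimal PyTorch sketch (my own example, not from the answer): the second dimension of the kernel tensor is its depth, equal to the number of input channels, while the number of output feature maps is chosen freely.

```python
import torch.nn as nn

# 3 input channels (e.g. RGB), 32 output feature maps, 5x5 kernels
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5)

# Each of the 32 kernels has shape 3x5x5: a width, a height, and a depth
# equal to the number of input feature maps. 32 need not be a multiple of 3.
print(conv.weight.shape)   # torch.Size([32, 3, 5, 5])
```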

Green Falcon

It could be a case of padding in combination with convolution strides: if you pad the first layer with 4 zeros on either side and use a stride of 2, you end up with 18 x 18 x x. The 3 input channels are most probably R, G and B, and scaling 3 input channels up to 32 feature maps is fairly common.
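A quick shape check of that idea in PyTorch (the padding and stride values here are assumptions, since the article does not state them):

```python
import torch
import torch.nn as nn

# Hypothetical settings that reproduce the 18x18 maps: padding 4, stride 2
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5, stride=2, padding=4)

x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
print(conv(x).shape)            # torch.Size([1, 32, 18, 18])
```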

S van Balen

The reason you're confused is that some parameters were omitted in the paper. Assume that, from the first layer to the second layer, the stride is 2, the padding is 4, and there are 32 kernels; then $18 = \lfloor (32 + 2 \cdot 4 - 5)/2 \rfloor + 1$.
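As a quick check of that formula in plain Python (assuming floor division, as most frameworks use):

```python
# floor((W + 2*pad - kernel) / stride) + 1 with W=32, pad=4, kernel=5, stride=2
print((32 + 2 * 4 - 5) // 2 + 1)   # 18
```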

dino