In this highly cited paper, the authors give the following discussion of the number of weight parameters. It is not clear to me why the layer has $49C^2$ parameters. I think it should be $49C$, since each of the $C$ input channels shares the same filter, which has $49$ parameters.
1 Answer
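Actually it's $49C \cdot C$: the first $C$ is the number of input channels, and the second $C$ is the number of filters. The input channels do not share one set of $49$ weights. Each filter has its own $49$ weights for every one of the $C$ input channels, i.e. $49C$ weights per filter, and there are $C$ filters, which gives $49C^2$ in total.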
Quote from CS231n:
To summarize, the Conv Layer:
- Accepts a volume of size $W_1 \times H_1 \times D_1$
- Requires four hyperparameters:
  - Number of filters $K$,
  - their spatial extent $F$,
  - the stride $S$,
  - the amount of zero padding $P$.
- Produces a volume of size $W_2 \times H_2 \times D_2$ where:
  - $W_2 = (W_1 - F + 2P)/S + 1$
  - $H_2 = (H_1 - F + 2P)/S + 1$ (i.e. width and height are computed equally by symmetry)
  - $D_2 = K$
- With parameter sharing, it introduces $F \cdot F \cdot D_1$ weights per filter, for a total of $(F \cdot F \cdot D_1) \cdot K$ weights and $K$ biases.
- In the output volume, the $d$-th depth slice (of size $W_2 \times H_2$) is the result of performing a valid convolution of the $d$-th filter over the input volume with a stride of $S$, and then offset by $d$-th bias.
A common setting of the hyperparameters is $F = 3, S = 1, P = 1$. However, there are common conventions and rules of thumb that motivate these hyperparameters. See the ConvNet architectures section below.
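To tie the quoted formula back to the question, here is a minimal sketch in plain Python. The function names and the channel count `C = 64` are made up for illustration, and the filter size $F = 7$ is inferred from the $49 = 7 \cdot 7$ weights per channel mentioned in the question. Biases are left out, since the $49C^2$ figure counts weights only.

```python
def conv_weight_count(f: int, d1: int, k: int) -> int:
    """Weights of a conv layer per the quoted formula: (F * F * D1) * K.

    f  -- spatial extent F of each filter
    d1 -- input depth D1 (number of input channels)
    k  -- number of filters K (which is also the output depth D2)
    Biases (K of them) are not included.
    """
    weights_per_filter = f * f * d1  # each filter spans ALL input channels
    return weights_per_filter * k


def conv_output_size(w1: int, f: int, s: int, p: int) -> int:
    """Output width (or height) per the quoted formula: (W1 - F + 2P)/S + 1."""
    return (w1 - f + 2 * p) // s + 1


C = 64  # hypothetical channel count, chosen only for this example

# C input channels and C filters of size 7x7, as in the question.
total = conv_weight_count(f=7, d1=C, k=C)
assert total == 49 * C**2
print(total)  # 200704

# The 49C guess corresponds to a single filter (K = 1), not the whole layer.
assert conv_weight_count(f=7, d1=C, k=1) == 49 * C
```

With $K = 1$ the count is $49C$, which is the number the question arrived at. The extra factor of $C$ comes from having $C$ independent filters, one per output channel.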
Icyblade
