In the paper Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network by Christian Ledig et al., the distance between images (used in the loss function) is calculated from feature maps $\phi_{i,j}$ extracted from the VGG19 network,
where $\phi_{i,j}$ is defined as "feature map obtained by the j-th convolution (after activation) before the i-th maxpooling layer".

Can you elaborate on how to calculate this feature map, for example for the VGG54 variant mentioned in the paper?

$\phi_{5,4}$ means the 4th convolutional layer before the 5th max-pooling layer, right? But that 4th layer has 512 filters, so we would have 512 feature spaces. Which one should be chosen? Also, what does "after activation" mean?

I found this answer related to the same issue, but the answer didn't explain much.

Nagabhushan S N

1 Answer

In section 2.2.1 of the paper, they state that they use Euclidean distance. I'm going to take your word that there are 512 filter activations in that layer. If I'm reading this right, there aren't 512 feature spaces; there is a single 512-dimensional feature space in which they calculate the Euclidean distance. So your distance function between two images $p$ and $q$ is just the standard Euclidean distance formula:

$$ d(\mathbf{p},\mathbf{q}) = \sqrt{\sum_{i=1}^{512}(p_i - q_i)^2}$$

where $\mathbf{p}$ and $\mathbf{q}$ are vectors holding the corresponding filter activations of $p$ and $q$.


---

Edit: Above the horizontal rule is my original answer, which is wrong (or at least incomplete). What I think is happening is that the authors take the Euclidean distance as above at each spatial position of the feature maps at layer $i,j$, and average those distances to produce a scalar loss value. So for a 7x7 feature map, they'd be taking 49 512-dimensional Euclidean distances and averaging them to get the VGG19 5,4 loss. This is how I read equation (5) in section 2.2.1 of their paper. I think the missing piece is that the authors don't bother with the square root in the Euclidean distance formula. As discussed below, I think the notation is unclear.
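
To make this reading concrete, here is a minimal NumPy sketch. It is only my interpretation of the loss, not the authors' code. I assume the two feature maps have already been extracted and stored as arrays of shape (H, W, C). The function name `vgg_feature_loss` and the random stand-in activations are made up for illustration, and the 7x7x512 shape just mirrors the example above. The `np.maximum(0, ...)` call only mimics what I take "after activation" to mean, namely values taken after the ReLU that follows the convolution.

```python
import numpy as np


def vgg_feature_loss(feat_hr, feat_sr):
    """Squared distance over channels, averaged over the H x W positions.

    feat_hr, feat_sr: arrays of shape (H, W, C) holding the feature maps
    of the reference image and the generated image (hypothetical inputs).
    """
    if feat_hr.shape != feat_sr.shape:
        raise ValueError("feature maps must have the same shape")
    # Squared Euclidean distance (no square root) between the
    # C-dimensional activation vectors at each spatial position.
    per_position = np.sum((feat_hr - feat_sr) ** 2, axis=-1)  # shape (H, W)
    # Average over all spatial positions to get a single scalar.
    return float(per_position.mean())


rng = np.random.default_rng(0)
# Stand-in activations, clipped at zero to mimic post-ReLU outputs.
feat_hr = np.maximum(0.0, rng.standard_normal((7, 7, 512)))
feat_sr = np.maximum(0.0, rng.standard_normal((7, 7, 512)))
loss = vgg_feature_loss(feat_hr, feat_sr)  # one scalar from 49 distances
```

In practice `feat_hr` and `feat_sr` would come from running both images through a pretrained VGG19 and reading the output at the chosen layer; that step is left out here because it depends on the framework.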

Matthew