8

A recent paper by He et al. (Deep Residual Learning for Image Recognition, Microsoft Research, 2015) reports networks with up to 152 layers on ImageNet, and experiments with more than 1000 layers on CIFAR-10 (layers, not neurons!).

I am trying to understand the paper, but I stumble over the word "residual".

Could somebody please give me an explanation or definition of what "residual" means in this case?

Examples

We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.

[...]

Instead of hoping each few stacked layers directly fit a desired underlying mapping, we explicitly let these layers fit a residual mapping. Formally, denoting the desired underlying mapping as $\mathcal{H}(x)$, we let the stacked nonlinear layers fit another mapping of $\mathcal{F}(x) := \mathcal{H}(x) - x$. The original mapping is recast into $\mathcal{F}(x) + x$. We hypothesize that it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping.

Martin Thoma

1 Answer

3

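The residual is $\mathcal{F}(x) = \mathcal{H}(x) - x$: the difference between the desired mapping $\mathcal{H}(x)$ and its input $x$. The stacked layers only have to learn this difference, and the input is added back to recover $\mathcal{H}(x) = \mathcal{F}(x) + x$. "Residual" is a common term in mathematics, for example in differential equations (DE).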
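To make the definition concrete, here is a minimal sketch in plain Python (no framework) of a block that computes $\mathcal{F}(x) + x$. The function names, the two-layer shape of $\mathcal{F}$, and the weights are my own assumptions for illustration, not code from the paper; biases and the activation that the paper applies after the addition are left out to keep it short.

```python
# Minimal sketch of a residual block on plain Python lists.
# All names (relu, linear, residual_branch, residual_block) are illustrative.

def relu(v):
    """Element-wise ReLU nonlinearity."""
    return [max(0.0, a) for a in v]

def linear(weights, v):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def residual_branch(x, w1, w2):
    """F(x): the stacked nonlinear layers that fit the residual."""
    return linear(w2, relu(linear(w1, x)))

def residual_block(x, w1, w2):
    """H(x) = F(x) + x: the skip connection adds the input back."""
    fx = residual_branch(x, w1, w2)
    return [f + a for f, a in zip(fx, x)]

x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all-zero weights F(x) = 0, so the block is exactly the identity map.
print(residual_block(x, zeros, zeros))  # [1.0, -2.0, 3.0]
```

This also hints at why the paper hypothesizes that the residual form is easier to optimize: if the best mapping is close to the identity, the layers only need to push $\mathcal{F}(x)$ towards zero rather than reproduce $x$ through a stack of nonlinear layers.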

Emre