
Is applying dropout equivalent to zeroing the output of random neurons in each mini-batch iteration, while leaving the rest of the forward and backward steps of back-propagation unchanged? I'm implementing a network from scratch in NumPy.

Qbik

1 Answer


Indeed. To be precise, the dropout operation randomly zeroes elements of the input tensor with probability $p$, and the remaining (non-dropped) elements are scaled by a factor of $\frac{1}{1-p}$ during training.

For example, see below how elements of the input tensor (top tensor in the output) are zeroed in the output tensor (bottom tensor in the output) using PyTorch.

import torch
import torch.nn as nn

m = nn.Dropout(p=0.5)        # each element is zeroed with probability 0.5
input = torch.randn(3, 4)
output = m(input)            # surviving elements are scaled by 1/(1 - 0.5) = 2

print(input, '\n', output)

tensor([[-0.9698, -0.9397,  1.0711, -1.4557],
        [-0.0249, -0.9614, -0.7848, -0.8345],
        [ 0.9420,  0.6565,  0.4437, -0.2312]])
tensor([[-0.0000, -0.0000,  2.1423, -0.0000],
        [-0.0000, -0.0000, -1.5695, -1.6690],
        [ 0.0000,  0.0000,  0.0000, -0.0000]])
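Since the question is about implementing this from scratch in NumPy, here is a minimal sketch of the same (inverted) dropout for one forward/backward pass. The helper names dropout_forward and dropout_backward are my own, not from any library; the key points are that the backward pass multiplies the incoming gradient by the same mask (and scale) used in the forward pass, and that dropout is a no-op at evaluation time.

import numpy as np

def dropout_forward(x, p=0.5, training=True, rng=None):
    # Inverted dropout: zero each element with probability p and scale the
    # survivors by 1/(1 - p) so the expected activation stays the same.
    if not training or p == 0.0:
        return x, None                                   # evaluation mode: no-op
    rng = np.random.default_rng() if rng is None else rng
    mask = (rng.random(x.shape) >= p) / (1.0 - p)        # entries are 0 or 1/(1 - p)
    return x * mask, mask

def dropout_backward(dout, mask):
    # Gradient flows only through the kept units, with the same scaling.
    return dout if mask is None else dout * mask

# Example: one mini-batch
x = np.random.randn(3, 4)
out, mask = dropout_forward(x, p=0.5)
dx = dropout_backward(np.ones_like(out), mask)

Because the scaling is applied during training ("inverted" dropout), nothing needs to change at test time, which matches the PyTorch behaviour shown above.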

EDIT: please note the post has been updated to reflect Todd Sewell's addition in the comments.

hH1sG0n3