
I was recently thinking about the memory cost of (a) training a CNN and (b) inference with a CNN. Please note that I am not talking about the storage (which is simply the number of parameters).

How much memory does a given CNN (e.g. VGG-16 D) need for

  • (a) Training (with ADAM)
  • (b) Inference on a single image?
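For the example network, a minimal sketch like the following can count $w$ and the size of every feature map for one 224 × 224 RGB image. It assumes the standard configuration D (13 convolutional layers with 3 × 3 kernels and padding 1, five 2 × 2 max-pooling layers, and three fully connected layers ending in 1000 classes) and counts only the outputs of convolution, pooling and fully connected layers, treating ReLU as in-place. The names `VGG16_D`, `FC_SIZES` and `count_vgg16_d` are illustrative, not from any library.

```python
# VGG-16 configuration "D": numbers are conv output channels (3x3 kernels,
# stride 1, padding 1); "M" is 2x2 max-pooling with stride 2.
VGG16_D = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
           512, 512, 512, "M", 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]  # fully connected layers (1000 ImageNet classes)


def count_vgg16_d(height=224, width=224, in_channels=3):
    """Return (number of parameters, per-layer output sizes) for one image."""
    params = 0
    activations = []  # number of values in each layer's output feature maps
    h, w, c = height, width, in_channels
    for layer in VGG16_D:
        if layer == "M":
            h, w = h // 2, w // 2  # pooling halves the spatial size, has no weights
        else:
            params += 3 * 3 * c * layer + layer  # 3x3 kernels plus one bias per filter
            c = layer  # padding 1 keeps h and w unchanged
        activations.append(h * w * c)
    fan_in = h * w * c  # flattened output of the last pooling layer
    for units in FC_SIZES:
        params += fan_in * units + units  # weight matrix plus biases
        activations.append(units)
        fan_in = units
    return params, activations


if __name__ == "__main__":
    w, acts = count_vgg16_d()
    print(f"parameters w = {w:,}")
    print(f"feature-map values per image = {sum(acts):,}")
```

Under these assumptions it reports $w$ = 138,357,544 and 15,087,080 feature-map values per image; multiply by 4 for bytes in float32.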

My thoughts

Basically, I want to make sure that I haven't forgotten anything. If you know of other sources that explain this kind of reasoning, please share them with me.

(a) Training

For training with ADAM, I will now assume that I have a mini-batch size of $B \in \mathbb{N}$ and that $w \in \mathbb{N}$ is the number of parameters of the CNN. Then the memory footprint (the maximum amount of memory I need at any point while training) for a single training pass is:

  • $2w$: Keep the weights and the weight updates in memory
  • $B \cdot $ Size of all generated feature maps (forward pass)
  • $w$: Gradients for each weight (backpropagation)
  • $w$: Learning rates for each weight (ADAM)
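The list above can be turned into a rough estimate with a minimal sketch like this one, assuming every value is a 32-bit float (4 bytes). One difference from the list: the standard Adam update keeps two running averages per weight (the first and second moment estimates $m$ and $v$) rather than a per-weight learning rate, so the sketch counts $2w$ of optimizer state by default via the hypothetical parameter `optimizer_state_per_weight`. Like the list, it ignores anything not mentioned there, such as framework workspace buffers.

```python
BYTES_PER_FLOAT32 = 4  # assumption: everything is stored as 32-bit floats


def training_memory_bytes(w, activations_per_image, batch_size,
                          optimizer_state_per_weight=2):
    """Rough peak memory (bytes) for one Adam training step, following the list.

    w: number of trainable parameters
    activations_per_image: total number of feature-map values for one input
    batch_size: mini-batch size B
    optimizer_state_per_weight: Adam keeps 2 running averages (m and v) per weight
    """
    values = (
        2 * w                                  # weights and weight updates
        + batch_size * activations_per_image   # forward feature maps kept for backprop
        + w                                    # gradients for each weight
        + optimizer_state_per_weight * w       # Adam moment estimates
    )
    return values * BYTES_PER_FLOAT32


if __name__ == "__main__":
    # VGG-16 D: 138,357,544 parameters; about 15.1 million feature-map values per
    # 224x224 image when counting only conv, pooling and FC outputs (our own count).
    total = training_memory_bytes(138_357_544, 15_087_080, batch_size=32)
    print(f"{total / 1e9:.2f} GB")  # prints 4.70 GB
```

With $B = 32$ the feature maps (about 1.9 GB) are already comparable to the $5w$ weight-related terms (about 2.8 GB), so the batch size dominates quickly as it grows.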

(b) Inference

During inference, it is not necessary to keep the feature maps of layer $i-1$ once the feature maps of layer $i$ have been calculated. So the memory footprint during inference is:

  • $w$: The model
  • The two most expensive successive layers (the one that is already calculated and the next one, which is being calculated)
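A minimal sketch of this inference estimate, assuming float32 values and a plain chain of layers without skip connections (as in VGG), where only the current input and output feature maps are alive at once. The function name `inference_memory_bytes` is hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights and feature maps are float32


def inference_memory_bytes(w, layer_sizes):
    """Peak memory (bytes) for a feed-forward CNN on a single image.

    w: number of parameters (the whole model stays in memory)
    layer_sizes: number of values in the input and in each layer's output, in order
    Only two successive layers are alive at once: the one already computed
    (the current layer's input) and the one being computed (its output).
    """
    if len(layer_sizes) < 2:
        raise ValueError("need at least an input and one layer output")
    peak_pair = max(a + b for a, b in zip(layer_sizes, layer_sizes[1:]))
    return (w + peak_pair) * BYTES_PER_FLOAT32


if __name__ == "__main__":
    # Toy network: input of 150 values, then layers with 400, 200 and 10 outputs.
    print(inference_memory_bytes(1000, [150, 400, 200, 10]))  # prints 6400
```

In the toy example the most expensive pair is 400 + 200 values, so the peak is (1000 + 600) × 4 bytes.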
Ethan
Martin Thoma

1 Answer


Total RAM would be batch size × RAM to train one image (since backpropagation happens only after the whole batch has been processed).

RAM for one training image:

A/ 4 bytes × number of parameters (each parameter stored as a 32-bit float)

B/ Size of the input to each layer, accounting for downsampling and the number of feature maps

(Suppose the input is 200 × 300 pixels, the first layer’s feature maps might be 100 × 150, the second layer’s 50 × 75, and the third layer’s 25 × 38. If the first convolutional layer has 100 feature maps, it takes up 4 × 100 × 150 × 100 = 6 million bytes (6 MB). If the second layer has 200 feature maps, it takes up 4 × 50 × 75 × 200 = 3 million bytes (3 MB).)

C/ Size of the input image
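The A/, B/ and C/ terms and the batch multiplication can be written out as a small sketch. It follows the answer literally, so the parameter term A/ is also multiplied by the batch size. It assumes 4 bytes per value for the feature maps and the input image as well, and an RGB input (3 channels), which the example does not state; the helper names are hypothetical.

```python
BYTES_PER_VALUE = 4  # A/ assumes 32-bit floats; also assumed for maps and input


def feature_map_bytes(height, width, n_maps):
    """B/ memory for one layer's feature maps."""
    return BYTES_PER_VALUE * height * width * n_maps


def ram_per_image(n_params, layers, input_shape):
    """A/ + B/ + C/ for a single training image.

    layers: list of (height, width, n_maps), one tuple per layer
    input_shape: (height, width, channels) of the input image
    """
    params = BYTES_PER_VALUE * n_params                            # A/
    maps = sum(feature_map_bytes(h, w, n) for h, w, n in layers)   # B/
    image_h, image_w, image_c = input_shape
    image = BYTES_PER_VALUE * image_h * image_w * image_c          # C/
    return params + maps + image


def total_ram(batch_size, n_params, layers, input_shape):
    """Batch size x RAM to train one image, as stated in the answer."""
    return batch_size * ram_per_image(n_params, layers, input_shape)


if __name__ == "__main__":
    # The first two layers from the example above.
    print(feature_map_bytes(100, 150, 100))  # prints 6000000
    print(feature_map_bytes(50, 75, 200))    # prints 3000000
```

The two printed values reproduce the 6 MB and 3 MB from the example.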

10xAI