Which is the fastest image pretrained model?

Question

I have been working with pre-trained models and am curious which of the pre-trained computer vision models has the fastest forward pass. I am trying to speed up processing for one-shot learning, so I timed forward propagation of a single image through a few models. The results are as follows:

VGG16: 4.857 seconds
ResNet50: 0.227 seconds
Inception: 0.135 seconds

Can you tell me which is the fastest pre-trained model available, and explain the drastic difference in time between the models mentioned above?

score 6 · Accepted Answer · answered Oct 04 '18 at 22:31

The answer will depend on several things, such as your hardware and the image you process. Additionally, we should distinguish between a single run through the network in training mode and one in inference mode. In training mode, additional values are computed and cached, and some layers, such as dropout, are active that are simply left out during inference. I will assume you simply want to produce a single prediction for a single image, so we are talking about inference time.
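As a rough sketch of how single-image inference time could be measured, the snippet below uses only the Python standard library. The names time_inference and fake_model are hypothetical and not from the question; fake_model is just a stand-in for a real call such as a Keras model's predict method. The warm-up call is an assumption on my part: the first call through a network often includes one-time setup cost, so it is excluded from the timed runs.

```python
import statistics
import time


def time_inference(predict_fn, sample, warmup_runs=1, timed_runs=10):
    """Return the median wall-clock time (seconds) of predict_fn(sample)."""
    # Warm-up calls are not timed, so one-time setup cost is excluded.
    for _ in range(warmup_runs):
        predict_fn(sample)

    durations = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        predict_fn(sample)
        durations.append(time.perf_counter() - start)

    # The median is less sensitive to occasional slow runs than the mean.
    return statistics.median(durations)


def fake_model(pixels):
    # Hypothetical stand-in for a real forward pass.
    return sum(p * p for p in pixels)


median_seconds = time_inference(fake_model, list(range(10_000)))
print(f"median inference time: {median_seconds:.6f} s")
```

With a real model you would pass its prediction function and a preprocessed image batch instead of fake_model and the list of integers.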

Factors

The basic relationships are:

more parameters (i.e. learnable weights, a bigger network) - slower than a model with fewer parameters
more recurrent units - slower than a convolutional network, which is slower than a fully-connected network¹
complicated activation functions - slower than simple ones, such as ReLU
deeper networks - slower than shallower networks (with the same number of parameters), as less can run in parallel on a GPU
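To make the first factor in the list above more concrete, here is a small sketch of how the number of learnable weights in a layer can be counted. The helper names dense_params and conv2d_params are hypothetical, and the layer shapes are assumptions chosen for illustration (they resemble the large layers found in VGG-style networks). The formulas are the standard ones: weights plus one bias per output unit or filter.

```python
def dense_params(inputs, units):
    """Learnable weights in a fully-connected layer: weight matrix plus biases."""
    return inputs * units + units


def conv2d_params(kernel_h, kernel_w, in_channels, out_channels):
    """Learnable weights in a 2D conv layer: one kernel per filter plus biases."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# Assumed shapes for illustration only.
fc_count = dense_params(7 * 7 * 512, 4096)   # flattened 7x7x512 feature map -> 4096 units
conv_count = conv2d_params(3, 3, 512, 512)   # 3x3 kernels, 512 -> 512 channels

print(f"dense layer: {fc_count:,} parameters")
print(f"conv layer:  {conv_count:,} parameters")
```

A single large dense layer can hold far more weights than a conv layer, which is one way a network ends up with a very large parameter count.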

Having listed a few factors that affect the final inference time (the time taken for one forward run through the network), I would guess that MobileNetV2 is probably among the fastest pre-trained models (available in Keras). We can see from the following table that this network has a small memory footprint of only 14 megabytes with ~3.5 million parameters. Compare that to your VGG test, with its ~138 million... roughly 40 times more! In addition, the main workhorse layer of MobileNetV2 is a conv layer - MobileNets are essentially clever and smaller versions of residual networks.

Extra considerations

The reason I included the whole table above was to highlight that small memory footprints and fast inference times come at a cost: lower accuracy!

If you compute the ratios of top-5 accuracy versus number of parameters (and generally versus memory), you might find a nice balance between inference time and performance.
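A minimal sketch of that ratio is shown below. The function name accuracy_per_million_params and the model names are hypothetical, and the accuracy values are placeholders, not measured results; only the two parameter counts (~3.5 million and ~138 million) come from the figures quoted above. Substitute the real top-5 accuracies from the table before drawing any conclusions.

```python
def accuracy_per_million_params(top5_accuracy, num_params):
    """Top-5 accuracy divided by the parameter count in millions."""
    return top5_accuracy / (num_params / 1e6)


# name -> (top-5 accuracy, number of parameters)
# Accuracy values are PLACEHOLDERS for illustration, not real results.
candidates = {
    "small_model": (0.90, 3.5e6),
    "large_model": (0.92, 138e6),
}

# Rank models from best to worst accuracy-per-parameter ratio.
ranked = sorted(
    candidates,
    key=lambda name: accuracy_per_million_params(*candidates[name]),
    reverse=True,
)

for name in ranked:
    ratio = accuracy_per_million_params(*candidates[name])
    print(f"{name}: {ratio:.4f} top-5 accuracy per million parameters")
```

The same function works with memory size in place of the parameter count if you prefer to compare against footprint.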

¹ Have a look at this comparison of CNNs with Recurrent modules
