
I'm still new to machine learning and just came across a powerful deep learning library, Keras.

I've read the Keras documentation and tried a few of the Keras examples on GitHub. I've also studied some basic deep learning concepts from several sources, but I still don't have a solid understanding of CNNs and RNNs, which look to be very powerful networks.

So, to test my assumption that a CNN would do better, I downloaded the reuters_mlp.py example from the Keras GitHub repository, which originally uses a simple MLP as its model. I combined it with the CNN idea from the imdb_cnn.py example and then observed the result.

Surprisingly, the result didn't come out as I expected. The CNN performed worse than the simple MLP. Can someone please explain why the accuracy of the CNN is lower than that of the simple MLP?

Here are the outputs (TensorFlow backend):

8982 train sequences, 2246 test sequences, 46 classes, num_words=1000

MLP (sequences_to_matrix, mode=binary):

Epoch 1/5 8982/8982 [==============================] - 3s - loss: 1.3236 - acc: 0.6984
Epoch 2/5 8982/8982 [==============================] - 2s - loss: 0.7182 - acc: 0.8250
Epoch 3/5 8982/8982 [==============================] - 2s - loss: 0.4544 - acc: 0.8864
Epoch 4/5 8982/8982 [==============================] - 2s - loss: 0.3197 - acc: 0.9192
Epoch 5/5 8982/8982 [==============================] - 2s - loss: 0.2511 - acc: 0.9356
1920/2246 [========================>.....] - ETA: 0s

Test loss: 1.05213204963, Test accuracy: 0.785396260071

CNN (pad_sequences):

Epoch 1/5 8982/8982 [==============================] - 81s - loss: 1.9794 - acc: 0.5181
Epoch 2/5 8982/8982 [==============================] - 78s - loss: 1.4289 - acc: 0.6591
Epoch 3/5 8982/8982 [==============================] - 79s - loss: 1.1546 - acc: 0.7175
Epoch 4/5 8982/8982 [==============================] - 78s - loss: 0.9639 - acc: 0.7663
Epoch 5/5 8982/8982 [==============================] - 77s - loss: 0.8378 - acc: 0.7935
2240/2246 [============================>.] - ETA: 0s

Test loss: 0.960687935512, Test accuracy: 0.764470169243

Satjapong Meeklai

1 Answer


CNN (and RNN) models are not general improvements to the MLP design. They are specific choices that match certain types of problem. The CNN design works best when there is some local pattern in the data (which may repeat in other locations), and this is often the case when the inputs are images, audio or other similar signals.
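To make "local pattern which may repeat in other locations" concrete, here is a minimal pure-Python sketch. It is not Keras code: `conv1d_valid`, `max_pool`, and the hand-picked `kernel` are hypothetical names and values chosen for illustration, standing in for a single learned convolution filter followed by global max pooling.

```python
def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` without padding; return the dot product at each offset."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def max_pool(values):
    """Global max pooling: keep only the strongest response, wherever it occurred."""
    return max(values)


# Hypothetical filter that responds most strongly to the local pattern 1, 0, 1.
kernel = [1, -1, 1]

early = [1, 0, 1, 0, 0, 0, 0, 0]   # pattern at the start
late = [0, 0, 0, 0, 0, 1, 0, 1]    # same pattern at the end
absent = [1, 1, 1, 0, 0, 0, 0, 0]  # pattern not present

print(max_pool(conv1d_valid(early, kernel)))   # 2
print(max_pool(conv1d_valid(late, kernel)))    # 2
print(max_pool(conv1d_valid(absent, kernel)))  # 1
```

The same small filter gives the same peak response wherever the pattern sits, which is the property a CNN exploits. If the input has no such position-independent local structure, that weight sharing buys little.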

The Reuters example looks like a "bag of words" input. There are no local patterns or repeating relationships in that data that a CNN can take advantage of.
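As a rough illustration of what a "bag of words" input keeps and discards, the sketch below contrasts the two encodings named in the question. The helpers `to_binary_matrix` and `pad_left` are simplified, hypothetical stand-ins written for this example (they are not the Keras functions themselves), and the word indices are made up.

```python
def to_binary_matrix(sequences, num_words):
    """One row per document; column j is 1 if word index j occurs at all."""
    rows = []
    for seq in sequences:
        row = [0] * num_words
        for idx in seq:
            if idx < num_words:  # ignore indices outside the vocabulary
                row[idx] = 1
        rows.append(row)
    return rows


def pad_left(sequences, maxlen, value=0):
    """Keep the last `maxlen` indices of each document and left-pad with `value`."""
    padded = []
    for seq in sequences:
        seq = list(seq[-maxlen:])
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


# Two made-up documents using the same words in a different order.
doc_a = [3, 5, 7]
doc_b = [7, 5, 3, 3]

bow = to_binary_matrix([doc_a, doc_b], num_words=8)
seqs = pad_left([doc_a, doc_b], maxlen=5)

print(bow[0] == bow[1])  # True: order and repetition are gone
print(seqs[0])           # [0, 0, 3, 5, 7]
print(seqs[1])           # [0, 7, 5, 3, 3]
```

In the binary encoding each column is a fixed word, so neighbouring columns have no meaningful relationship for a convolution to pick up. The padded encoding does preserve order, but whether that order carries signal useful for this classification task is a separate question.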

Your results with a CNN on this data set look reasonable to me. You have not made a mistake; you have learned how a CNN actually performs on this data.

Neil Slater