Regardless of the width of the hidden layers, it seems that with more than 3 hidden layers, dropout training does not work. I wonder if a bug is causing this.
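For context, here is a minimal numpy sketch of what per-layer dropout training looks like for the configuration printed in the logs below (hidden widths [600, 200, 100, 100], dropout rates [0.0, 0.5, 0.5, 0.5, 0.5]). The 784-dimensional input, 10-way output, and tanh nonlinearity are assumptions for illustration only, not the actual mlp.py code:

```python
import numpy as np

rng = np.random.RandomState(0)

# Sizes and rates taken from the "building the model" log line; the 784/10
# input/output sizes are assumptions for illustration.
layer_sizes   = [784, 600, 200, 100, 100, 10]
dropout_rates = [0.0, 0.5, 0.5, 0.5, 0.5]   # one rate per layer input

weights = [rng.randn(n_in, n_out) * 0.01
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward_train(x):
    """Training-time forward pass: mask each layer's input with its dropout rate."""
    a = x
    for i, W in enumerate(weights):
        if i < len(dropout_rates) and dropout_rates[i] > 0.0:
            mask = rng.binomial(1, 1.0 - dropout_rates[i], size=a.shape)
            a = a * mask                      # drop units with probability p
        a = a @ W
        if i < len(weights) - 1:
            a = np.tanh(a)                    # hidden nonlinearity (assumed)
    return a                                  # raw output scores

print(forward_train(rng.randn(4, 784)).shape)  # (4, 10)
```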
4-layer with backprop only (no dropout) works
$ python mlp.py backprop
... building the model: hidden layers [600, 200, 100, 100], dropout: False [0.0, 0.0, 0.0, 0.0, 0.0]
... training
epoch 1, test error 0.375 (300), learning_rate=1.0 (patience: 7448 iter 930) **
epoch 2, test error 0.375 (300), learning_rate=0.998 (patience: 7448 iter 1861)
epoch 3, test error 0.375 (300), learning_rate=0.996004 (patience: 7448 iter 2792)
epoch 4, test error 0.375 (300), learning_rate=0.994011992 (patience: 7448 iter 3723)
epoch 5, test error 0.375 (300), learning_rate=0.992023968016 (patience: 7448 iter 4654)
epoch 6, test error 0.3625 (290), learning_rate=0.99003992008 (patience: 7448 iter 5585) **
epoch 7, test error 0.33875 (271), learning_rate=0.98805984024 (patience: 22340.0 iter 6516) **
epoch 8, test error 0.3175 (254), learning_rate=0.986083720559 (patience: 26064.0 iter 7447) **
epoch 9, test error 0.32375 (259), learning_rate=0.984111553118 (patience: 29788.0 iter 8378)
epoch 10, test error 0.325 (260), learning_rate=0.982143330012 (patience: 29788.0 iter 9309)
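Incidentally, the learning_rate column in all of the runs decays by the same fixed factor of 0.998 per epoch, so the schedule itself is not what differs between the working and failing cases. A quick sketch reproducing that column (the factor is read off the logged values, not taken from mlp.py):

```python
# Learning-rate schedule implied by the logs: multiply by 0.998 each epoch.
# 1.0, 0.998, 0.998**2 = 0.996004, 0.998**3 = 0.994011992, ...
learning_rate = 1.0
for epoch in range(1, 11):
    print(f"epoch {epoch}, learning_rate={learning_rate}")
    learning_rate *= 0.998
```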
4-layer with dropout does not work
$ python mlp.py dropout
... building the model: hidden layers [600, 200, 100, 100], dropout: [0.0, 0.5, 0.5, 0.5, 0.5]
... training
epoch 1, test error 0.375 (300), learning_rate=1.0 (patience: 27930 / 930) **
epoch 2, test error 0.375 (300), learning_rate=0.998 (patience: 27930 / 1861)
epoch 3, test error 0.375 (300), learning_rate=0.996004 (patience: 27930 / 2792)
epoch 4, test error 0.375 (300), learning_rate=0.994011992 (patience: 27930 / 3723)
epoch 5, test error 0.375 (300), learning_rate=0.992023968016 (patience: 27930 / 4654)
...
epoch 29, test error 0.375 (300), learning_rate=0.945486116479 (patience: 27930 / 26998)
epoch 30, test error 0.375 (300), learning_rate=0.943595144246 (patience: 27930 / 27929)
epoch 31, test error 0.375 (300), learning_rate=0.941707953958 (patience: 27930 / 28860)
3-layer with dropout works
$ python mlp.py dropout
... building the model: hidden layers [600, 200, 100], dropout: [0.0, 0.5, 0.5, 0.5]
... training
epoch 1, test error 0.375 (300), learning_rate=1.0 (patience: 27930 / 930) **
epoch 2, test error 0.375 (300), learning_rate=0.998 (patience: 27930 / 1861)
epoch 3, test error 0.375 (300), learning_rate=0.996004 (patience: 27930 / 2792)
epoch 4, test error 0.375 (300), learning_rate=0.994011992 (patience: 27930 / 3723)
epoch 5, test error 0.375 (300), learning_rate=0.992023968016 (patience: 27930 / 4654)
epoch 6, test error 0.365 (292), learning_rate=0.99003992008 (patience: 27930 / 5585) **
epoch 7, test error 0.3625 (290), learning_rate=0.98805984024 (patience: 27930 / 6516) **
epoch 8, test error 0.3375 (270), learning_rate=0.986083720559 (patience: 27930 / 7447) **
epoch 9, test error 0.3275 (262), learning_rate=0.984111553118 (patience: 29788 / 8378) **
epoch 10, test error 0.33375 (267), learning_rate=0.982143330012 (patience: 33512 / 9309)
epoch 11, test error 0.32875 (263), learning_rate=0.980179043352 (patience: 33512 / 10240)
epoch 12, test error 0.315 (252), learning_rate=0.978218685265 (patience: 33512 / 11171) **
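The patience numbers in these logs look like a standard patience-based early-stopping loop: training stops once the iteration counter passes the current patience, and patience is extended to a multiple of the current iteration whenever the score improves. A generic sketch under that reading (the 4x increase factor is inferred from jumps like 5585*4 = 22340 and 7447*4 = 29788 in the logs, not read from mlp.py):

```python
# Generic patience-based early stopping matching the "(patience: P / iter)"
# fields in the logs above; the specific numbers are illustrative.
def train(n_epochs, n_batches_per_epoch, evaluate):
    patience = 27930          # initial patience, as printed in the dropout runs
    patience_increase = 4     # inferred from the logged patience jumps
    best_error = float("inf")
    it = 0
    for epoch in range(1, n_epochs + 1):
        it += n_batches_per_epoch
        error = evaluate(epoch)
        if error < best_error:
            best_error = error
            # extend the window when the score improves
            patience = max(patience, it * patience_increase)
        if it > patience:
            return epoch      # ran out of patience: stop early
    return n_epochs

# Toy usage: the error plateaus after epoch 8, so training stops once the
# iteration count exceeds the last extended patience window.
errors = [0.375] * 5 + [0.3625, 0.33875, 0.3175] + [0.33] * 40
print("stopped at epoch", train(48, 931, lambda e: errors[e - 1]))
```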