When using Residual or Inception networks, start with the shallowest version possible.
Solving high training errors
Take the following actions, in this order:
Inspect the data for defects (this needs human intervention).
Check for software bugs in your code or library (run a gradient check; it is probably a backpropagation error). A gradient-check sketch follows this list.
Tune the learning rate (usually by making it smaller).
Make the network deeper (you should have started with a shallow network).
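A minimal numerical gradient-check sketch (Python/NumPy; `loss_fn` and `params` are hypothetical placeholders for your own loss function and flattened parameter array): it compares a finite-difference estimate against the gradient produced by your backprop code.

```python
import numpy as np

def numerical_gradient(loss_fn, params, eps=1e-5):
    """Finite-difference estimate of d(loss)/d(params)."""
    grad = np.zeros_like(params)
    for i in range(params.size):
        old = params.flat[i]
        params.flat[i] = old + eps
        loss_plus = loss_fn(params)
        params.flat[i] = old - eps
        loss_minus = loss_fn(params)
        params.flat[i] = old  # restore original value
        grad.flat[i] = (loss_plus - loss_minus) / (2 * eps)
    return grad

def gradient_check(loss_fn, params, analytic_grad):
    """Max relative error between the backprop and numerical gradients."""
    num_grad = numerical_gradient(loss_fn, params)
    rel_error = np.abs(num_grad - analytic_grad) / np.maximum(
        1e-8, np.abs(num_grad) + np.abs(analytic_grad))
    return rel_error.max()  # around 1e-7 or below usually means backprop is fine
```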
Solving high test errors
Take the following actions, in this order:
Do more data augmentation (also try generative models to create more data).
Add dropout and batch normalization (a placement sketch follows this list).
Get more data (more data improves accuracy more than anything else).
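A minimal sketch of where these layers typically go, assuming PyTorch and an arbitrary 784-input, 10-class classifier: batch normalization right after the linear layer, dropout right after the non-linearity.

```python
import torch.nn as nn

# Hypothetical small classifier: batch-norm follows the linear layer,
# dropout follows the activation.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)
```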
Some trends
Following Andrew Ng's late-2016 lecture, these are the topics we need to pay attention to.
1. Scalability
Have a computing system that scales well with more data and more model complexity.
2. Team
Split your team between AI people and HPC people (CUDA, OpenCL, etc.).
3. Data first
Data is more important than your model; always try to get more quality data before trying to change your model.
4. Data Augmentation
Use standard data augmentation techniques plus generative (unsupervised) models.
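As an illustration, assuming image data and the torchvision library, a typical augmentation pipeline could look like this (the crop size and jitter strengths are arbitrary example values):

```python
from torchvision import transforms

# Standard on-the-fly augmentation applied to training images.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
```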
5. Make sure that the validation set and test set come from the same distribution
This avoids a validation or test set that does not reflect reality, and it also helps to check whether your training is valid.
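One simple way to enforce this, as a sketch: pool all held-out examples, shuffle them, and only then split into validation and test sets (the 50/50 split is an arbitrary choice).

```python
import random

def split_val_test(held_out_examples, seed=0):
    """Shuffle the pooled held-out examples, then split them, so that
    the validation and test sets come from the same distribution."""
    rng = random.Random(seed)
    data = list(held_out_examples)
    rng.shuffle(data)
    mid = len(data) // 2
    return data[:mid], data[mid:]  # validation set, test set
```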
6. Have a human-level performance metric
Have a team of experts to compare against your current system's performance. This also drives the decision between getting more data and making the model more complex.
7. Data server
Have a unified data warehouse. The whole team must have access to the data, at SSD-level access speed.
8. Using Games
Games are a nice way to augment datasets, but be careful: games do not have the same variety within a class as real life. For example, GTA does not have as many car models as the real world.
9. Ensembles always help
Training different networks separately and averaging their outputs usually gives roughly an extra 2% accuracy (the best ImageNet 2016 results were simple ensembles).
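A minimal averaging sketch, assuming PyTorch and that every model in `models` outputs class logits for the same input batch:

```python
import torch

def ensemble_predict(models, x):
    """Average the softmax outputs of independently trained models."""
    with torch.no_grad():
        probs = torch.stack([m(x).softmax(dim=-1) for m in models])
    return probs.mean(dim=0).argmax(dim=-1)  # predicted class index per example
```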
10. What to do if you have more than 1000 classes
Use a hierarchical softmax to improve performance.
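A two-level sketch follows (PyTorch; `n_groups` and `classes_per_group` are hypothetical shapes). Classes are bucketed into groups, and the class probability factors as P(group | x) * P(class | group, x), so each example only evaluates one small per-group classifier instead of one huge softmax. The per-example loop is kept for readability; a real implementation would batch examples by group.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelSoftmax(nn.Module):
    """Hierarchical softmax: n_groups buckets of classes_per_group classes each."""
    def __init__(self, feat_dim, n_groups, classes_per_group):
        super().__init__()
        self.group_head = nn.Linear(feat_dim, n_groups)
        self.class_heads = nn.ModuleList(
            [nn.Linear(feat_dim, classes_per_group) for _ in range(n_groups)]
        )

    def log_prob(self, feats, group_idx, class_idx):
        # log P(class | x) = log P(group | x) + log P(class | group, x)
        log_p_group = F.log_softmax(self.group_head(feats), dim=-1)
        log_p_group = log_p_group.gather(1, group_idx.unsqueeze(1)).squeeze(1)
        log_p_class = torch.stack([
            F.log_softmax(self.class_heads[int(g)](f), dim=-1)[int(c)]
            for f, g, c in zip(feats, group_idx, class_idx)
        ])
        return log_p_group + log_p_class
```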
11. How many samples per class do we need for good results?
If training from scratch, use roughly as many samples per class as the model has parameters. For example, if the model has 1000 parameters, use 1000 samples per class.
If doing transfer learning, much less is needed ("much less" is not precisely defined yet; more is always better).