- Machine learning diagnostic: a test you can run to gain insight into what is or isn't working with a learning algorithm, and to get guidance on how best to improve its performance.
- Evaluating the hypothesis
- Cross-validation (holdout split). Divide the dataset into two parts: a training set (70%) and a test set (30%). Learn the parameters on the training set, apply them to the test set, and measure the error. Choose the model with the lowest test error.
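The 70/30 split can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the synthetic data, the straight-line model, and the helper names `train_test_split`, `fit_line`, and `mse` are all made up for the example and do not come from these notes.

```python
import random


def train_test_split(data, train_frac=0.7, seed=0):
    """Shuffle a copy of `data` and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_line(points):
    """Closed-form least-squares fit of y = a*x + b to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mse(params, points):
    """Mean squared error of the line `params` on `points`."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)


# Synthetic data (an assumption for the demo): y = 2x + 1 plus Gaussian noise.
noise = random.Random(1)
data = [(i / 10, 2.0 * (i / 10) + 1.0 + noise.gauss(0, 0.1)) for i in range(100)]

train, test = train_test_split(data)   # 70 training points, 30 test points
params = fit_line(train)               # parameters are learned on the training set only
train_error = mse(params, train)
test_error = mse(params, test)         # error on data the fit never saw

print("train error:", train_error)
print("test error:", test_error)
```

The key point is that `fit_line` never sees the test points, so `test_error` measures how the learned parameters behave on unseen data.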
- Because the model was selected by its test-set error, that error is likely to be an optimistic estimate of the generalization error. So, going one step beyond the two-way split, we divide the dataset into three parts: a training set (60%), a cross-validation set (20%), and a test set (20%).
  After training each model on the training set, estimate its error on the cross-validation set, choose the model with the lowest cross-validation error, and then estimate the generalization error on the test set.
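The 60/20/20 procedure might look roughly like the sketch below. Everything specific here is an assumption for illustration: the synthetic data, the use of k-nearest-neighbour regression as the family of candidate models (these notes do not say which models are being compared), and the helper names `three_way_split`, `knn_predict`, and `knn_mse`.

```python
import math
import random


def three_way_split(data, train_frac=0.6, cv_frac=0.2, seed=0):
    """Shuffle a copy of `data` and split it into (train, cv, test).

    The test set receives whatever remains after the first two parts.
    """
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    n_cv = int(round(cv_frac * len(shuffled)))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_cv],
            shuffled[n_train + n_cv:])


def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def knn_mse(train, points, k):
    """Mean squared error on `points` of a k-NN model built from `train`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)


# Synthetic data (an assumption for the demo): y = sin(x) plus Gaussian noise.
gen = random.Random(1)
data = []
for _ in range(200):
    x = gen.uniform(0, 6)
    data.append((x, math.sin(x) + gen.gauss(0, 0.3)))

train, cv, test = three_way_split(data)

# Each value of k is one candidate model (hypothetical choices).
candidates = [1, 3, 5, 9, 15, 31]

# Step 1: score every candidate on the cross-validation set.
cv_errors = {k: knn_mse(train, cv, k) for k in candidates}

# Step 2: pick the candidate with the lowest cross-validation error.
best_k = min(cv_errors, key=cv_errors.get)

# Step 3: only now touch the test set, once, for the chosen model.
test_error = knn_mse(train, test, best_k)

print("cross-validation errors:", cv_errors)
print("chosen k:", best_k)
print("estimated generalization error:", test_error)
```

Since the test set plays no part in choosing `best_k`, `test_error` is not biased by the selection step the way the cross-validation error of the winning model is.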