@avibryant
Last active June 3, 2016 02:34

An idiosyncratic guide to teaching yourself practical machine learning, without links:

  • Find a binary classification dataset; maybe you have one internally.
  • Implement a simple decision tree algorithm, like CART (a minimal sketch follows this list).
  • Write some code to validate your model; produce an ROC curve and understand the tradeoff it embodies (see the ROC sketch below the list).
  • Compare the ROC for your training set with the ROC for a holdout and understand what it means that they differ.
  • Experiment with some hyperparameters: how does the comparison above change as you adjust the depth of the tree or other stopping criteria? (A depth sweep is sketched below.)
  • Combine your decision tree algorithm with bagging to produce a random forest (a bagging sketch also follows the list). How does its ROC compare?
  • Do the same hyperparameter tuning here. (How many trees?) Reflect on overfitting and on the bias/variance tradeoff.
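
A minimal CART-style sketch to get started, in Python with NumPy. The names (`X`, `y`, and all helpers) are mine, not prescribed above: greedy Gini splits, with leaves holding the positive-class fraction so the tree emits scores rather than hard labels.

```python
# A CART-style classification tree: greedy Gini splits, leaves store the
# positive-class fraction so the tree emits scores rather than hard labels.
# Assumes X is an (n_samples, n_features) NumPy array and y is a 0/1 vector.
import numpy as np

def gini(y):
    # Gini impurity of a set of 0/1 labels.
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 2.0 * p * (1.0 - p)

def best_split(X, y):
    # Exhaustively search (feature, threshold) pairs for the largest impurity drop.
    best_j, best_t, best_gain = None, None, 0.0
    parent = gini(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            gain = parent - (left.mean() * gini(y[left]) +
                             (~left).mean() * gini(y[~left]))
            if gain > best_gain:
                best_j, best_t, best_gain = j, t, gain
    return best_j, best_t

def build_tree(X, y, depth=0, max_depth=4, min_samples=10):
    # Stop on depth, node size, or purity; otherwise recurse on the best split.
    if depth >= max_depth or len(y) < min_samples or gini(y) == 0.0:
        return {"leaf": True, "score": y.mean()}
    j, t = best_split(X, y)
    if j is None:
        return {"leaf": True, "score": y.mean()}
    left = X[:, j] <= t
    return {"leaf": False, "feature": j, "threshold": t,
            "left": build_tree(X[left], y[left], depth + 1, max_depth, min_samples),
            "right": build_tree(X[~left], y[~left], depth + 1, max_depth, min_samples)}

def predict_score(node, x):
    # Walk one example down to its leaf and return that leaf's score.
    while not node["leaf"]:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["score"]
```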
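One way to produce the ROC curve and its area by hand, assuming the `predict_score` helper from the sketch above; the `X_train`/`X_holdout` names in the usage comment are hypothetical stand-ins for your own split.

```python
# ROC curve and AUC from scratch: sweep the score threshold implicitly by
# sorting, and compare the train curve against the holdout curve.
import numpy as np

def roc_curve(y, scores):
    # Sort by descending score, then accumulate true/false positive rates.
    y = np.asarray(y)[np.argsort(-np.asarray(scores))]
    tps, fps = np.cumsum(y), np.cumsum(1 - y)
    tpr = np.concatenate(([0.0], tps / max(tps[-1], 1)))
    fpr = np.concatenate(([0.0], fps / max(fps[-1], 1)))
    return fpr, tpr

def auc(fpr, tpr):
    # Area under the ROC curve via the trapezoid rule.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical usage, given your own train/holdout split and a fitted tree:
# train_scores = np.array([predict_score(tree, x) for x in X_train])
# hold_scores  = np.array([predict_score(tree, x) for x in X_holdout])
# print("train AUC:  ", auc(*roc_curve(y_train, train_scores)))
# print("holdout AUC:", auc(*roc_curve(y_holdout, hold_scores)))
```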
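A sketch of the depth experiment, reusing the helpers above: refit at each `max_depth` and watch the gap between train and holdout AUC grow as the tree deepens.

```python
# The depth experiment: refit at each max_depth and print the train/holdout
# AUC pair. The gap between the two numbers is the overfitting signal.
import numpy as np

def depth_sweep(X_train, y_train, X_hold, y_hold, depths=(1, 2, 4, 8, 12)):
    for d in depths:
        tree = build_tree(X_train, y_train, max_depth=d)
        train_scores = np.array([predict_score(tree, x) for x in X_train])
        hold_scores = np.array([predict_score(tree, x) for x in X_hold])
        print(d,
              auc(*roc_curve(y_train, train_scores)),
              auc(*roc_curve(y_hold, hold_scores)))
```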
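And a bagging sketch on top of the single-tree code. Note this only bootstraps rows; a full random forest would also subsample features at each split, which is left out here for brevity.

```python
# Bagging: each tree sees a bootstrap sample of the rows, and the ensemble
# averages the per-tree scores. Deliberately omits per-split feature
# subsampling, which a full random forest would add.
import numpy as np

def build_forest(X, y, n_trees=50, max_depth=8, seed=0):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))  # sample rows with replacement
        forest.append(build_tree(X[idx], y[idx], max_depth=max_depth))
    return forest

def forest_score(forest, x):
    # Average the single-tree scores for one example.
    return float(np.mean([predict_score(tree, x) for tree in forest]))
```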

Now go spend 3 years doing feature engineering. Reflect on how nice immutable logs are.
