Course Description

Statistical Learning with Python is an introductory-level course in supervised learning, focusing on regression and classification methods. The syllabus includes:

  • Linear and polynomial regression
  • Logistic regression and linear discriminant analysis
  • Cross-validation and the bootstrap
  • Model selection and regularization methods (ridge and lasso)
  • Nonlinear models, splines, and generalized additive models
  • Tree-based methods, random forests, and boosting
  • Support-vector machines
  • Neural networks and deep learning
  • Survival models
  • Multiple testing

Some unsupervised learning methods are also discussed, including principal components and clustering (k-means and hierarchical).

This is not a math-heavy class: we describe the methods without relying heavily on formulas or complex mathematics, and focus on the key elements of modern data science. Computing is done in Python. Some lectures are devoted to Python tutorials that start from the ground up and progress to detailed sessions implementing the techniques from each chapter.

We also offer a separate version of this course, Statistical Learning with R, in which the chapter lectures are the same but the lab lectures and computing are done in R.