M518: Big Data in Economics

This course is an introduction to popular tools from statistical learning and machine learning. After reviewing basic concepts such as the bias-variance trade-off, linear regression, and cross-validation, we will cover a broad range of machine learning methods, for example, shrinkage estimators (ridge regression and LASSO), splines, and random forests. If time permits, we will also cover state-of-the-art methods that apply machine learning techniques to causal inference, for example, double/debiased machine learning, causal forests, and matrix completion methods. Throughout the course, we will use the software package R and economic data to illustrate the concepts and methods discussed. Empirical projects with real-world data and student presentations will be an integral part of the class. We will also use the projects to discuss the full data science workflow: acquiring, importing, and cleaning data, visualizing it, and communicating the results of empirical analyses.

This class is also offered as an advanced undergraduate course for Economics majors, cross-listed as E401: Machine Learning for Economic Data.

All lecture material is available on Canvas.