Topics in High-Dimensional Econometrics and ML Theory

Emory University, Spring 2024


Table of contents

  1. About
  2. Course Description
  3. Content
    1. Topic 1: Concentration Inequalities
    2. Topic 2: Uniform law of large numbers
    3. Topic 3: Sparse linear models in high dimensions
    4. Topic 4: Reproducing Kernel Hilbert Spaces
    5. Topic 5: Semiparametric Efficiency Theory
    6. Topic 6: Double/Debiased Machine Learning
  4. References
  5. Acknowledgments

About

Designed as a Directed Study (an intensive reading in econometrics on a topic not covered in a regular course at Emory University), this course covers a variety of topics in high-dimensional econometrics and machine learning theory. The course is based on student presentations and discussions among participants (students and invited faculty).

Course Description

This course is an in-depth exploration of the theoretical foundations and practical applications of high-dimensional statistics and machine learning theory at the graduate level. Since this is a topics course, the content is based on the interests of the participants and the student presenter. The primary objective of this directed reading is to provide a comprehensive and self-contained overview of high-dimensional statistics, covering topics such as concentration inequalities, empirical processes, uniform laws, reproducing kernel Hilbert spaces, semiparametric theory, and double/debiased machine learning, along with their applications in causal inference.

Content

Topic 1: Concentration Inequalities

  • Motivating examples
  • Classical bounds

Topic 2: Uniform law of large numbers

  • Uniform convergence
  • Rademacher complexity

Topic 3: Sparse linear models in high dimensions

  • Different types of sparsity
  • Shrinkage estimators and regularizers
  • Regularization bias

Topic 4: Reproducing Kernel Hilbert Spaces

  • Hilbert spaces
  • Kernels and operations
  • Reproducing kernel Hilbert spaces
  • Kernel ridge regression

Topic 5: Semiparametric Efficiency Theory

  • Semiparametric efficiency
  • Efficient Influence Functions
  • Pathwise Differentiability and Distributional Taylor Expansion

Topic 6: Double/Debiased Machine Learning

  • Neyman Orthogonality
  • Sample Splitting
  • Cross-fitting
  • Applications

References

The content of the course will be based on the following references:

  1. High-Dimensional Statistics: A Non-Asymptotic Viewpoint by Martin Wainwright.

  2. Lecture Notes for Machine Learning Theory (CS229M/STATS214) by Tengyu Ma.

  3. Introduction to RKHS, and some simple kernel algorithms by Arthur Gretton.

  4. Machine Learning for Econometrics by Christophe Gaillac and Jeremy L’Hour.

Acknowledgments

I thank Prof. David Jacho-Chavez for his guidance and for serving as Faculty Sponsor in developing this course.