Course: STAT 946 — Mathematics of Data Science (Adv. Topics)
Section: Lec 001, Winter 2021
Location/Medium: Live-stream
Live-stream Times: Tuesdays and Thursdays, 1:00-2:20 pm EST (GMT-5)
Office Hours: By appointment
Course website: See Learn.
Syllabus:
The goal of this course is to quickly orient you toward mathematical problems at the heart of data science. This course will focus on the statistical and computational limits of high-dimensional problems. The first third of the course will introduce its basic tools, namely high-dimensional probability and random matrix theory. We will then cover a broad range of applications. Depending on interest, applications may include:
- Dimension reduction
- Sparsity
- Low-rank models: the spiked matrix model for PCA and the BBP transition
- Community detection
- Mean-field methods: approximate message passing, belief propagation, and related algorithms
- Neural networks: mean-field limits and NTK for shallow networks, related kernel methods, approximation theory
- Stochastic approximation algorithms, Markov processes, and their mixing properties
- Random optimization problems
- Statistical phase transitions: statistical-to-computational gaps
Suggested Background: This will be an advanced topics course and will be mathematically rigorous. We will focus on theoretical results regarding both statistical and computational limits. It is strongly recommended that students have successfully completed at least one advanced course in probability, stochastic processes, or mathematical statistics, as well as courses in analysis and linear algebra. Please contact the instructor to confirm whether your background is sufficient.
Lectures
Part I: Fundamentals
Week 1: High-dimensional probability
- Lec 1: Surprises in high dimensions
- Lec 2: Concentration of Gaussians
Week 2: High-dimensional probability
- Lec 3: Concentration for (Gaussian) random matrices
- Lec 4: Isoperimetric, log-Sobolev, and Poincaré inequalities
Week 3: PCA and spiked matrix models
- Lec 5: Covariance estimation and spiked matrix models
- Lec 6: Stieltjes transforms and Wigner's theorem
Week 4: Spiked matrix models
- Lec 7: The Marchenko-Pastur law
- Lec 8: The Baik-Ben Arous-Péché transition: a short proof
Week 5: Gaussian processes and the M* bound
- Lec 9: Comparison Inequalities
- Lec 10: Recovering a vector from a few random measurements
Week 6: The escape theorem and exact recovery
- Lec 10 (ctd): Recovering a vector from a few random measurements
Part II: Selected recent results
Week 6: Mean-field methods I
- Lec 11: Approximate Message Passing in high-dimensional statistics
Week 7: Mean-field methods II
- Lec 12: Approximate Message Passing (ctd)
- Lec 13: Mean-field theories, MCMC, free energies
Week 8: Neural Networks
- Lec 14: Neural nets and approximation theory
- Lec 15: Double descent for random features
Week 9: Neural Networks and Presentations
- Lec 16: Guest Lecture: M. Nica, "Random Features and NTK"
- Presentation 1: (M. Majid) Sum-of-Squares, UGC hardness, and average-case hardness
Week 10: Presentations
- Presentation 2: (K. Ramsay) Benign overfitting in linear regression
- Lec 17 (joint with the Probability seminar): Guest Lecture: M. Nica, Gradients for neural networks
Week 11: Presentations
- Presentation 3: (L. Zhang) Random matrix approach to neural nets
- Presentation 4: (A. Mouzakis) The Low Degree and Statistical Query frameworks
Week 12: Wrap-up
- Lec 18: So long, and thanks for all the fish