Welcome to Economics 524 (424): Prediction and machine-learning in econometrics, taught by Ed Rubin and Jose Rojas Fallas.
Lecture Tuesdays and Thursdays, 10:00a–11:20a (Pacific), 105 Esslinger
Lab Friday, 10:00a–10:50a (Pacific), 072 PLC
Office hours
- R for Data Science
- Introduction to Data Science (requires purchase)
- The Elements of Statistical Learning
- Data Science for Public Policy (ebook available through UO library)
Note: Links to topics that we have not yet covered lead to older slides. I will update links to the new slides as we work our way through the term/slides.
- Why do we have a class on prediction?
- How is prediction (and how are its tools) different from causal inference?
- Motivating examples
Readings
- Introduction in ISL
001 - Statistical learning foundations
- Why do we have a class on prediction?
- How is prediction (and how are its tools) different from causal inference?
- Motivating examples
Readings
- Prediction Policy Problems by Kleinberg et al. (2015)
- ISL Ch1
- ISL Start Ch2
Supplements Unsupervised character recognition
- Model accuracy
- Loss for regression and classification
- The bias-variance tradeoff
- The Bayes classifier
- KNN
Readings
- ISL Ch2–Ch3
- Optional: 100ML Preface and Ch1–Ch4
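KNN is simple enough to sketch in a few lines. The course works in R (tidymodels); purely to fix ideas, here is a dependency-free Python toy with made-up data — not course code:

```python
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Squared Euclidean distance from x to every training observation
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Two well-separated groups
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5)))  # all three nearest neighbors are "a"
```

Small k gives a flexible (low-bias, high-variance) classifier; large k a smoother (high-bias, low-variance) one — the tradeoff in miniature.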
- Review
- The validation-set approach
- Leave-one-out cross validation
- k-fold cross validation
- The bootstrap
Readings
- ISL Ch5
- Optional: 100ML Ch5
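The resampling schemes above differ mainly in how they carve up the index set. A minimal Python sketch (toy indices, not course code; real CV should shuffle ordered data first) of the k-fold split — LOOCV is just the special case k = n:

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k (nearly) equal-sized validation folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

# 5-fold CV on 10 observations: each fold serves once as the validation set.
for fold in kfold_indices(10, 5):
    train = [i for i in range(10) if i not in fold]
    # fit on `train`, compute validation error on `fold`, then average over folds
    print(fold, train)
```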
004 - Linear regression strikes back
- Returning to linear regression
- Model performance and overfit
- Model selection—best subset and stepwise
- Selection criteria
Readings
- ISL Ch3
- ISL Ch6.1
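Forward stepwise selection is easy to prototype. A hedged Python/NumPy sketch on simulated data (illustrative only — the course uses R, and in practice you would choose among the models on the path with a selection criterion such as AIC/BIC/adjusted R², or with cross validation):

```python
import numpy as np

def forward_stepwise(X, y):
    """Greedy forward selection: at each step, add the predictor that
    most reduces the residual sum of squares (RSS) of an OLS fit."""
    n, p = X.shape

    def rss(cols):
        Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        return resid @ resid

    chosen, remaining, path = [], list(range(p)), []
    while remaining:
        best = min(remaining, key=lambda j: rss(chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
        path.append(list(chosen))
    return path

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 2] + 0.5 * X[:, 0] + rng.normal(size=200)
print(forward_stepwise(X, y))  # the strongest predictor (column 2) enters first
```

Best subset would instead search all 2^p models — exact but quickly infeasible, which is why stepwise exists.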
In between: tidymodels-ing
- An introduction to preprocessing with tidymodels (Kaggle notebook)
- An introduction to modeling with tidymodels (Kaggle notebook)
- An introduction to resampling, model tuning, and workflows with tidymodels (Kaggle notebook)
- Introduction to tidymodels: Follow up for Kaggle
(AKA: Penalized or regularized regression)
- Ridge regression
- Lasso
- Elasticnet
Readings
- ISL Ch4
- ISL Ch6
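Ridge has a convenient closed form, which makes the shrinkage easy to see. A Python/NumPy sketch on simulated data (illustrative only; it skips the intercept and standardization, both of which matter in practice — and note the lasso has no closed form, so software such as glmnet fits it by coordinate descent):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge coefficients: argmin ||y - Xb||^2 + lam * ||b||^2,
    via the closed form (X'X + lam I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.1, size=100)

print(ridge(X, y, lam=0.0))    # lam = 0 recovers OLS: roughly (2, 0, -1)
print(ridge(X, y, lam=500.0))  # large lam shrinks all coefficients toward 0
```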
- Introduction to classification
- Why not regression?
- But also: Logistic regression
- And maximum likelihood estimation
- Assessment: Confusion matrix, assessment criteria, ROC, and AUC
Readings
- ISL Ch4
- ISL Ch6
Bonus Two nice interactive visualizations of gradient descent (and related algorithms)—and a mildly related game.
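The assessment criteria fall straight out of the four confusion-matrix counts. A toy Python sketch (made-up labels, not course code):

```python
def confusion_summary(actual, predicted, positive=1):
    """Tally the 2x2 confusion matrix and common assessment criteria."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    return {
        "accuracy":    (tp + tn) / len(actual),
        "sensitivity": tp / (tp + fn),  # true-positive rate (recall)
        "specificity": tn / (tn + fp),  # true-negative rate
    }

actual    = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
print(confusion_summary(actual, predicted))
# {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75}
```

Sweeping the classification threshold from 0 to 1 and plotting sensitivity against 1 − specificity traces the ROC curve; AUC is the area beneath it.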
- Introduction to trees
- Regression trees
- Classification trees—including the Gini index, entropy, and error rate
Readings
- ISL Ch8.1–Ch8.2
- Introduction
- Bagging
- Random forests
- Boosting
Readings
- ISL Ch8.2
- Hyperplanes and classification
- The maximal margin hyperplane/classifier
- The support vector classifier
- Support vector machines
Readings
- ISL Ch9
010 - Unsupervised learning, dimensionality reduction, and image classification
- MNIST dataset (machines with vision)
- K-means clustering
- Principal component analysis (PCA)
- UMAP
Readings
- ISL Ch12
Also: An older notebook... .html | .qmd
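K-means (Lloyd's algorithm) alternates two steps: assign each point to its nearest centroid, then recompute each centroid as its cluster's mean. A NumPy toy with obviously clustered fake data (not course code; real uses need multiple random starts and a choice of K):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Lloyd's algorithm: alternate (1) assigning each point to its nearest
    centroid and (2) recomputing each centroid as its cluster's mean."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

# Two clearly separated clusters
X = np.array([[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]], dtype=float)
labels, centers = kmeans(X, k=2)
print(labels)
```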
011 - A quick introduction to neural networks
- Anatomy of a single neuron
- Logistic regression as a one-neuron network
- Activation functions and hidden layers
- A brief overview of training
- The return of MNIST
Readings
- NNs and Deep Learning Ch1 (and some of Ch2)
- ISL 10 (Deep Learning)
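The "logistic regression as a one-neuron network" point can be made concrete in a few lines: a weighted sum plus a sigmoid activation, trained by gradient descent on the cross-entropy loss. A toy Python sketch with made-up separable data (illustrative only):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A single "neuron" with a sigmoid activation is exactly logistic regression.
X = [0.0, 1.0, 2.0, 3.0]
y = [0, 0, 1, 1]

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    for x, target in zip(X, y):
        p = sigmoid(w * x + b)
        # cross-entropy gradients: d(loss)/dw = (p - y) * x, d(loss)/db = (p - y)
        w -= lr * (p - target) * x
        b -= lr * (p - target)

print([round(sigmoid(w * x + b)) for x in X])  # [0, 0, 1, 1]
```

Stacking many such neurons into hidden layers (with nonlinear activations) is what turns this into a neural network.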
Starting in week 2, you will submit brief write-ups for machine-learning/AI applications that you find interesting. These nine write-ups together will count for a single "project" grade.
Submit by Monday at 11:59p (Pacific) each week (on Canvas):
- What is the application?
- What kind of prediction problem is it?
- What ML/AI methods are they using?
- Why is this interesting/useful?
- URL/file for the article or post.
The "coolest" applications will be highlighted in class and will receive extra credit.
Week 2
- YouTube's recommendation algorithm
- AI detection of tumors in CT scans
- Google's species identification AI
- AI plays video games (and many others, e.g., here)
Week 3
- New ML/AI approaches boost weather and flood prediction/forecast accuracy.
- Predicting political leaning with Facebook activity (maybe broke the world).
- Claude Code seems good.
- A finance paper about generating finance papers with AI (insert finance joke here).
Week 4
- Wildfire fuel mapping
- Wearable fitness trackers to predict cardiovascular events
Week 5 Insurance week!
- Logistic regression vs. boosting for insurance claims
- A bunch of algorithms for life (insurance) risk
Week 6 ML-driven matching.
- ML to reunite separated families
- Uber's approach to matching riders and drivers
Week 7 Audits.
- Tax audits and poverty prediction with ML.
- NBER working paper: Predicting police misconduct.
Week 8 More (nuanced?) audits and outlier detection.
- An accounting firm on the promise and pitfalls of AI and audits.
- Using outliers in high dimensions to measure partisanship.
- NBER working paper on the difference between out-of-sample performance and in-the-field performance (applied to bogus firms).
Past, present, and future projects.
Example Using tidymodels (and tidyverse) with the Chicago housing data.
000 An introduction to prediction and resampling
- Instructions
- Data
- Due: 20 January 2026
001 Cross validation and penalized regression
- Instructions (also in plain Github markdown)
- Data
- Due: 28 January 2026
002 Classification
- Instructions
- Data
- Due: 17 February 2026
003 Trees, ensembles, and boosting
- Instructions
- Data
- Due: 03 March 2026
004 Prediction finale (cancelled)
Class project 01: Application
Selected topic due by 30 January 2026
Project due 04 March 2026
Class project 02: Extension
Selected topic due by 13 February 2026
Project due 11 March 2026
In-class exam: Tuesday (17 March 2026) at 8:00a–10:00a
Note: Some previous years had a take-home portion of the final exam. This year, we will only have an in-class exam.
Prep materials
Previous in-class exams: 2023 | 2024 | 2025
Previous take-home exam: 2023 | 2024
Note: We will not provide keys.
Approximate/planned topics (for reference):
- General "best practices" for coding
- Working with RStudio
- The pipe (%>%)
- Cleaning and Kaggle follow up
- Install Quarto: follow this link, download the installer for your operating system, and follow its instructions
- Download (and unzip) the Lab Files
- Create a project in RStudio in a separate folder
- Copy/move the Lab Files to a folder dedicated to this lab
- Open the Quarto document in RStudio and follow the instructions
02 - Introduction to tidymodels
- Download the Lab File
- We will learn about cleaning data quickly and efficiently with tidymodels
Formats .html
I wrote a very short guide to finding a job.
For programming-related jobs, get some practice on
- UO library resources/workshops
- RStudio's recommendations for learning R, plus cheatsheets, books, and tutorials
- YaRrr! The Pirate’s Guide to R (free online)
- Advanced R (free online)
- R for Data Science (free online)
- R Graphics Cookbook (free online)
- Data Visualization (free online)
- Happy Git and GitHub for the useR by Jenny Bryan, the "STAT 545 TAs", and Jim Hester
- Python Data Science Handbook by Jake VanderPlas
- Elements of AI
- Caltech professor Yaser Abu-Mostafa: Lectures about machine learning on YouTube
- From Google:
- 3Blue1Brown's YouTube channel (has great videos on neural networks and other math topics)
- Fast.ai (free online courses on deep learning and machine learning)
- Geocomputation with R (free online)
- Spatial Data Science (free online)
- Applied Spatial Data Analysis with R