AI & MACHINE LEARNING FOR ECONOMIC ANALYSIS COURSE
AI and Machine Learning in Economic Analysis
Target Audience: Economists and Policy Analysts in the Public Sector
Course description
Introduces machine learning (ML) and artificial intelligence (AI) methods applied to economic problems. Emphasis on prediction, causal inference, policy evaluation, forecasting, and interpretation. Combines theory, empirical labs, and a substantial project using real economic data.
Learning objectives
- Understand core ML methods and how they differ from classical econometrics.
- Apply ML tools for prediction, forecasting, and causal inference in economic contexts.
- Interpret ML models and assess policy-relevant quantities (treatment effects, counterfactuals).
- Manage economic data, perform feature engineering, and evaluate models rigorously.
- Communicate results, including limitations, robustness, and ethical considerations.
Assessment
- Weekly labs/homeworks: 30%
- Midterm exam or take-home: 20%
- Final project (report + presentation + code): 40%
- Participation/reading responses: 10%
Software/tools
- Python: pandas, scikit-learn, statsmodels, xgboost/lightgbm, tensorflow/pytorch (optional), econml/causalml.
- R alternatives: tidyverse, caret, randomForest, grf (generalized random forests), causalTree, glmnet.
- Data sources: World Bank, FRED, CPS, IPUMS, Penn World Table, Compustat, Kaggle.
Course Outline
Format: a 4-week intensive in-person boot camp, OR 2 weeks of in-person training followed by 3 weeks of online training.
Introduction & Motivation
- Topics: What is AI/ML vs econometrics; role of ML in economic analysis; prediction vs causal inference; reproducible research.
- Lab: Environment setup, data cleaning demo, simple prediction with linear regression and cross-validation.
- Readings: Varian (2014) “Big Data”, Mullainathan & Spiess (2017) overview.
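A minimal sketch of the kind of exercise this first lab describes: a linear regression evaluated with k-fold cross-validation. The wage data are simulated and the variable names (education, experience, log_wage) are illustrative assumptions, not a course dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Simulated stand-in for a cleaned wage dataset (hypothetical variables).
rng = np.random.default_rng(0)
n = 500
education = rng.normal(13, 2, n)       # years of schooling
experience = rng.uniform(0, 30, n)     # years of labor-market experience
log_wage = 1.0 + 0.08 * education + 0.02 * experience + rng.normal(0, 0.3, n)

X = np.column_stack([education, experience])

# 5-fold cross-validation: each fold is held out once and scored out of sample.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, log_wage, cv=cv, scoring="r2")
mean_r2 = scores.mean()
print(f"Out-of-sample R^2 by fold: {np.round(scores, 3)}")
print(f"Mean out-of-sample R^2: {mean_r2:.3f}")
```

The out-of-sample R^2 is the quantity of interest for prediction; it is typically lower than the in-sample fit that classical regression output reports.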
Data, Features, Regularization & Model Selection
- Topics: Feature engineering, missing data, regularization (Ridge, Lasso), cross-validation, model selection criteria.
- Lab: Lasso and Ridge in Python/R; selecting lambda with CV; interpreting coefficients.
- Readings: Hastie, Tibshirani & Friedman (selected sections); Mullainathan & Spiess (sections on prediction).
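The lab on choosing lambda by cross-validation could look roughly like the sketch below. The data are simulated with a sparse true coefficient vector (an assumption made for illustration). Note that scikit-learn calls the penalty parameter `alpha` rather than lambda.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n, p = 300, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]    # only 5 of the 50 features matter
y = X @ beta + rng.normal(size=n)

# Standardize first so the penalty treats all features symmetrically.
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0)).fit(X, y)
ridge = make_pipeline(
    StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 25))
).fit(X, y)

lasso_model = lasso.named_steps["lassocv"]
ridge_model = ridge.named_steps["ridgecv"]
n_selected = int(np.sum(lasso_model.coef_ != 0))

print(f"Lasso penalty chosen by CV: {lasso_model.alpha_:.4f}")
print(f"Lasso keeps {n_selected} of {p} features")
print(f"Ridge penalty chosen by CV: {ridge_model.alpha_:.4f}")
```

Lasso sets some coefficients exactly to zero while Ridge only shrinks them. The surviving Lasso coefficients are biased toward zero, so they should not be read as causal or even as unbiased partial effects.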
Supervised Learning: Trees and Ensembles
- Topics: Decision trees, random forests, gradient boosting (XGBoost/LightGBM), bias-variance tradeoff, variable importance.
- Lab: Train/evaluate tree-based models on microdata (e.g., wage or credit datasets).
- Readings: Breiman (RF), Friedman (boosting) intro materials.
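As a rough illustration of this module's lab, the sketch below compares a linear model, a random forest, and gradient boosting on held-out data. The outcome is simulated with a nonlinearity and an interaction (an assumption chosen so that tree methods have something to find), and scikit-learn's `GradientBoostingRegressor` stands in for XGBoost/LightGBM.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 2000
X = rng.uniform(-2, 2, size=(n, 4))
# Nonlinear outcome with an interaction; columns 2 and 3 are pure noise.
y = np.sin(2 * X[:, 0]) + X[:, 0] * X[:, 1] + rng.normal(0, 0.3, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "ols": LinearRegression(),
    "random_forest": RandomForestRegressor(
        n_estimators=200, min_samples_leaf=5, random_state=0
    ),
    "boosting": GradientBoostingRegressor(
        n_estimators=300, learning_rate=0.05, max_depth=3, random_state=0
    ),
}

test_mse = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    test_mse[name] = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name:>14}: test MSE = {test_mse[name]:.3f}")

# Impurity-based importances from the forest (they sum to 1 across features).
importances = models["random_forest"].feature_importances_
print("Random forest importances:", np.round(importances, 3))
```

Impurity-based importance is a descriptive summary of what the forest split on; it does not identify which variables a policymaker could change to move the outcome.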
Model Interpretation & Explainability
- Topics: Partial dependence, SHAP/LIME, heterogeneity exploration, surrogate models, interpretable ML for policy.
- Lab: Use SHAP and ICE plots to interpret a boosted model applied to a treatment-targeting example.
- Readings: Molnar (interpretable ML excerpts), Athey & Imbens (on interpretability themes).
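Partial dependence, the first topic listed above, can be computed by hand in a few lines, which helps make clear what the plots show. The sketch below is an assumption-laden illustration on simulated data; `partial_dependence_1d` is a hypothetical helper written for this example, not a library function, and SHAP itself is not shown because it needs the separate `shap` package.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
n = 1500
X = rng.uniform(0, 1, size=(n, 3))
# Feature 0 has a convex effect, feature 1 a linear one, feature 2 none.
y = 3 * X[:, 0] ** 2 + X[:, 1] + rng.normal(0, 0.1, n)

model = GradientBoostingRegressor(random_state=0).fit(X, y)


def partial_dependence_1d(model, X, feature, grid):
    """Average prediction when `feature` is set to each grid value for all rows."""
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value      # overwrite one column, keep the others
        pd_values.append(model.predict(X_mod).mean())
    return np.array(pd_values)


grid = np.linspace(0.05, 0.95, 10)
pd_x0 = partial_dependence_1d(model, X, 0, grid)
pd_x2 = partial_dependence_1d(model, X, 2, grid)

print("Partial dependence on feature 0:", np.round(pd_x0, 2))
print("Partial dependence on feature 2:", np.round(pd_x2, 2))
```

Keeping the per-row predictions instead of averaging them gives the ICE curves mentioned in the lab. Both describe the fitted model, not the causal effect of the feature.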
Time Series Forecasting & ML for Macroeconomic Data
- Topics: Classical forecasting (ARIMA), ML for forecasting (feature-based, tree/NN models), evaluation metrics, mixing frequencies.
- Lab: Forecast GDP/unemployment using classical and ML models; holdout strategies for time series.
- Readings: Hyndman & Athanasopoulos (selected chapters).
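Holdout strategies for time series differ from ordinary cross-validation because the training window must always precede the test window. The sketch below illustrates this with scikit-learn's `TimeSeriesSplit` on a simulated AR(1) series (an assumption standing in for GDP or unemployment data), comparing a lag-based regression with a historical-mean benchmark.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

# Simulated AR(1) series standing in for a macro indicator.
rng = np.random.default_rng(4)
T, phi = 400, 0.8
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + rng.normal()

# Supervised framing: predict y_t from its first two lags.
n_lags = 2
X = np.column_stack([y[n_lags - k : T - k] for k in range(1, n_lags + 1)])
target = y[n_lags:]

tscv = TimeSeriesSplit(n_splits=5)     # expanding training window
ar_mse, mean_mse, ordered = [], [], []
for train_idx, test_idx in tscv.split(X):
    model = LinearRegression().fit(X[train_idx], target[train_idx])
    pred = model.predict(X[test_idx])
    ar_mse.append(np.mean((target[test_idx] - pred) ** 2))
    # Benchmark: forecast every test point with the training-sample mean.
    benchmark = target[train_idx].mean()
    mean_mse.append(np.mean((target[test_idx] - benchmark) ** 2))
    # Check that no future observation leaks into the training window.
    ordered.append(train_idx.max() < test_idx.min())

print(f"Lag model MSE:       {np.mean(ar_mse):.3f}")
print(f"Historical-mean MSE: {np.mean(mean_mse):.3f}")
```

The same loop works for tree or neural network forecasters; only the estimator inside it changes.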
Causal Inference I: Foundations & Machine Learning for Causality
- Topics: Potential outcomes framework, ignorability, selection on observables, propensity scores, doubly robust estimation.
- Lab: Propensity score matching, IPW, doubly robust estimators using ML for nuisance functions.
- Readings: Athey & Imbens (2019 overview), Angrist & Pischke (relevant chapters).
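The IPW and doubly robust estimators from this lab can be sketched as follows. The data are simulated with a known average treatment effect, and logistic and linear regressions are used as simple stand-ins for the ML nuisance estimators (both choices are assumptions for illustration). Cross-fitting is omitted here for brevity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(5)
n = 5000
X = rng.normal(size=(n, 3))
# Treatment depends on observables only (selection on observables holds).
true_ps = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
D = rng.binomial(1, true_ps)
true_ate = 2.0
Y = true_ate * D + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

# Naive difference in means is confounded by X.
naive = Y[D == 1].mean() - Y[D == 0].mean()

# Estimated propensity score, clipped to avoid extreme weights.
ps = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]
ps = np.clip(ps, 0.01, 0.99)

# Inverse probability weighting (Horvitz-Thompson form).
ipw = np.mean(D * Y / ps) - np.mean((1 - D) * Y / (1 - ps))

# Outcome regressions fitted separately on treated and control units.
mu1 = LinearRegression().fit(X[D == 1], Y[D == 1]).predict(X)
mu0 = LinearRegression().fit(X[D == 0], Y[D == 0]).predict(X)

# Augmented IPW: outcome-model prediction plus a weighted residual correction.
aipw = np.mean(
    mu1 - mu0 + D * (Y - mu1) / ps - (1 - D) * (Y - mu0) / (1 - ps)
)

print(f"True ATE {true_ate:.2f} | naive {naive:.2f} | IPW {ipw:.2f} | AIPW {aipw:.2f}")
```

The AIPW estimator stays consistent if either the propensity model or the outcome model is correctly specified, which is the sense in which it is doubly robust.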
Causal ML II: Targeted/Double ML & Heterogeneous Treatment Effects
- Topics: Double/debiased machine learning (DML), causal forests, uplift modeling, estimating heterogeneous effects.
- Lab: Implement DML and causal forest (grf/econml) for treatment effect heterogeneity; interpret results for targeting policy.
- Readings: Chernozhukov et al. (2018) on DML, Wager & Athey (2018) on causal forests.
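To make the DML idea concrete, here is a minimal hand-rolled sketch for the partially linear model: both outcome and treatment are residualized on the controls with cross-fitted random forests, and the residuals are then regressed on each other. The data-generating process and the choice of random forests are assumptions for illustration; in the lab, packages such as econml or grf would normally be used, and causal forests are not shown here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(6)
n = 2000
X = rng.normal(size=(n, 5))
theta_true = 1.0
# Controls enter both equations nonlinearly, so they confound D and Y.
D = X[:, 1] ** 2 + 0.5 * X[:, 0] + rng.normal(size=n)
Y = theta_true * D + X[:, 1] ** 2 + np.sin(X[:, 0]) + rng.normal(size=n)

# Naive regression of Y on D without controls.
naive = np.cov(D, Y)[0, 1] / np.var(D, ddof=1)

# Cross-fitting: each prediction comes from a model that never saw that row.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
m_hat = cross_val_predict(rf, X, D, cv=cv)    # estimate of E[D | X]
l_hat = cross_val_predict(rf, X, Y, cv=cv)    # estimate of E[Y | X]

# Residual-on-residual regression (the orthogonalized moment).
D_res, Y_res = D - m_hat, Y - l_hat
theta_dml = np.sum(D_res * Y_res) / np.sum(D_res ** 2)

# Heteroskedasticity-robust standard error for the final-stage slope.
eps = Y_res - theta_dml * D_res
se = np.sqrt(np.sum(D_res ** 2 * eps ** 2)) / np.sum(D_res ** 2)

print(f"True effect {theta_true:.2f} | naive {naive:.2f} | DML {theta_dml:.2f} (se {se:.3f})")
```

Residualizing both variables is what makes the estimate insensitive to small errors in the ML first stage; plugging ML predictions directly into a single regression does not have this property.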
Panel Data, High-dimensional Controls, Regularized IV
- Topics: High-dimensional fixed effects, Lasso for selection, post-selection inference, machine learning in IV and structural estimation.
- Lab: Apply high-dimensional methods to panel microdata; implement Lasso-IV / Post-Lasso.
- Readings: Belloni, Chernozhukov & Hansen (Lasso for IV).
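A stylized sketch of the post-Lasso IV idea from this lab: Lasso picks a few instruments out of many candidates, and the selected ones are then used in ordinary two-stage least squares. The data are simulated, and the penalty is chosen by cross-validation as a simplification; Belloni, Chernozhukov and Hansen recommend a plug-in penalty with theoretical guarantees, which this example does not implement.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(7)
n, n_inst = 1000, 40
Z = rng.normal(size=(n, n_inst))          # candidate instruments
u = rng.normal(size=n)                    # unobserved confounder
# Only the first two instruments are relevant for the endogenous regressor.
D = Z[:, 0] + 0.8 * Z[:, 1] + u + rng.normal(size=n)
beta_true = 1.0
Y = beta_true * D + 2 * u + rng.normal(size=n)

# OLS is biased because u drives both D and Y.
D_c, Y_c = D - D.mean(), Y - Y.mean()
ols = (D_c @ Y_c) / (D_c @ D_c)

# Step 1: Lasso first stage to select instruments.
first_stage = LassoCV(cv=5, random_state=0).fit(Z, D)
selected = np.flatnonzero(first_stage.coef_ != 0)

# Step 2: post-Lasso first stage, i.e. unpenalized OLS on selected instruments.
Zs = np.column_stack([np.ones(n), Z[:, selected]])
D_hat = Zs @ np.linalg.lstsq(Zs, D, rcond=None)[0]

# Step 3: 2SLS slope using the fitted values as the instrument.
D_hat_c = D_hat - D_hat.mean()
beta_iv = (D_hat_c @ Y_c) / (D_hat_c @ D_c)

print(f"Selected {selected.size} of {n_inst} instruments")
print(f"True effect {beta_true:.2f} | OLS {ols:.2f} | post-Lasso IV {beta_iv:.2f}")
```

Refitting without the penalty in step 2 removes the shrinkage bias that Lasso introduces in the first-stage coefficients.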
Structural Problems & Combining ML with Economic Models
- Topics: Using ML to approximate components of structural models (two-step), demand estimation with ML (e.g., random coefficients), policy simulations.
- Lab: Integrate ML-predicted choice probabilities into a simple structural demand estimation or a discretized dynamic program.
- Readings: Athey (2018) on ML & structural models; classic discrete choice references.
Reinforcement Learning & Dynamic Policy Evaluation (optional/advanced)
- Topics: Markov decision processes, RL basics, off-policy evaluation, contextual bandits for policy targeting.
- Lab: Contextual bandit for ad allocation or subsidy targeting; offline policy evaluation using importance sampling and doubly robust estimators.
- Readings: Sutton & Barto (selected), work on contextual bandits in policy contexts.
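Offline policy evaluation asks how a new targeting rule would have performed, using only data logged under an old rule. The sketch below illustrates the importance sampling (IPS) and doubly robust estimators on a simulated three-action problem. The reward function, the uniform logging policy, and the target rule are all assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(8)
n, n_actions = 20000, 3
x = rng.uniform(-1, 1, n)                 # one-dimensional context

slopes = np.array([0.0, 0.4, -0.4])       # hypothetical reward slopes per action


def expected_reward(x, a):
    """Hypothetical true probability of a positive outcome for action a."""
    return 0.5 + slopes[a] * x


# Logged data: actions were assigned uniformly at random.
logged_a = rng.integers(0, n_actions, n)
propensity = np.full(n, 1 / n_actions)
reward = rng.binomial(1, expected_reward(x, logged_a))

# Target policy to evaluate: action 1 if x > 0, otherwise action 2.
target_a = np.where(x > 0, 1, 2)
match = (logged_a == target_a).astype(float)

# IPS: reweight logged rewards where the logged action agrees with the target.
ips = np.mean(match / propensity * reward)

# Reward model: one linear fit per action on the logged data.
fits = [np.polyfit(x[logged_a == a], reward[logged_a == a], 1) for a in range(n_actions)]
slope_hat = np.array([f[0] for f in fits])
intercept_hat = np.array([f[1] for f in fits])


def q_hat(x, a):
    """Predicted reward for action a in context x."""
    return intercept_hat[a] + slope_hat[a] * x


# Doubly robust: model prediction plus an IPS-weighted residual correction.
dr = np.mean(q_hat(x, target_a) + match / propensity * (reward - q_hat(x, logged_a)))

true_value = np.mean(expected_reward(x, target_a))
logging_value = reward.mean()
print(f"Logging policy value {logging_value:.3f}")
print(f"Target policy: true {true_value:.3f} | IPS {ips:.3f} | DR {dr:.3f}")
```

Both estimators rely on known or well-estimated logging propensities; if some actions were never taken in parts of the context space, neither can recover the target policy's value there.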
Ethics, Fairness, and Robustness in Economic ML
- Topics: Algorithmic fairness, distributional shifts, interpretability for policy, transparency, replication and privacy concerns.
- Lab: Audit models for fairness; stress-test models under simulated shifts.
- Readings: Kleinberg et al. (fairness), recent survey articles.
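A fairness audit of the kind this lab describes can start from group-wise error rates. The sketch below uses simulated credit-style data with a hypothetical binary protected attribute; the variable names, the 0.3 decision threshold, and the `audit` helper are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
n = 6000
group = rng.integers(0, 2, n)             # hypothetical protected attribute
# Group 1 has higher average (standardized) income in this simulation.
income = rng.normal(loc=np.where(group == 1, 0.5, 0.0), scale=1.0)
# Default probability falls with income; group has no direct effect.
default = rng.binomial(1, 1 / (1 + np.exp(1.0 + 1.2 * income)))

X = income.reshape(-1, 1)                 # the model never sees `group`
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, default, group, test_size=0.5, random_state=0
)

clf = LogisticRegression().fit(X_tr, y_tr)
flagged = clf.predict_proba(X_te)[:, 1] > 0.3    # flag as high risk


def audit(y_true, y_pred, mask):
    """Flag rate, false positive rate, and true positive rate within a group."""
    y_true, y_pred = y_true[mask], y_pred[mask]
    return {
        "flag_rate": y_pred.mean(),
        "fpr": y_pred[y_true == 0].mean(),
        "tpr": y_pred[y_true == 1].mean(),
    }


report = {g: audit(y_te, flagged, g_te == g) for g in (0, 1)}
parity_gap = report[0]["flag_rate"] - report[1]["flag_rate"]

for g, stats in report.items():
    print(f"group {g}: " + ", ".join(f"{k}={v:.3f}" for k, v in stats.items()))
print(f"Demographic parity gap (group 0 minus group 1): {parity_gap:.3f}")
```

Even though the model is blind to group membership, flag rates and false positive rates differ across groups because income is correlated with the group. Which of these gaps matters is a policy judgment, not a statistical one.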
Presentations & Final Project Symposium
- Activities: Project presentations, discussion of limitations, replication files, peer feedback.
- Deliverables: Final report, code repository, short presentation.
Assignments & labs
- Weekly labs tied to lectures (code + short write-up).
- Midterm: theoretical + applied take-home.
- Final project: original empirical question applying ML to economic data; milestones: proposal (week 4), mid-project check (week 8), final deliverable including code and reproducible analysis.
Project ideas
- Predicting consumer credit default and designing targeting rules.
- Estimating heterogeneous treatment effects of a job training program.
- Forecasting GDP and evaluating ML vs classical time-series models.
- Using RL/contextual bandits to optimize unemployment benefit outreach (offline evaluation).
- Price/demand estimation for a product using ML for substitution patterns.
- Detecting tax evasion or fraud with supervised learning, with careful evaluation of fairness.
Suggested Datasets
- CPS (Current Population Survey), American Community Survey (IPUMS), FRED macro series, World Bank indicators, Penn World Table, Compustat/CRSP (finance), Kaggle economic datasets, randomized controlled trial datasets (e.g., J-PAL).
Core readings & Resources
- Mullainathan, S., & Spiess, J. (2017). Machine Learning: An Applied Econometric Approach.
- Varian, H. (2014). Big Data: New Tricks for Econometrics.
- Chernozhukov, V., et al. (2018). Double/Debiased Machine Learning for Treatment and Structural Parameters.
- Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment Effects using Random Forests.
- Hastie, T., Tibshirani, R., & Friedman, J. (The Elements of Statistical Learning); James, G., Witten, D., Hastie, T., & Tibshirani, R. (An Introduction to Statistical Learning) for ML foundations.
- Hyndman, R. J. & Athanasopoulos, G. (Forecasting: Principles and Practice).
- Angrist, J., & Pischke, J. (Mostly Harmless Econometrics) — for causal inference fundamentals.
- Molnar, C. (Interpretable Machine Learning) — interpretability tools.