Training Calendar

Advances in Causal Inference and Programme Evaluation using Stata (delivered in English)

Online, 2 days (28-29 May 2020). Stata: Intermediate, Advanced
Delivered by: Dr. Melvyn Weeks


This course will review the application of machine learning techniques to both prediction problems and problems where a policy maker needs to understand the impact of a policy on a heterogeneous population. We will make the distinction between the ex post assessment of a policy change and the ex ante identification of characteristics of agents that are predictive of the likely impact of a policy.

This course will cover topics at the intersection of machine learning and econometrics.

The course covers a mix of theory and applications. Using Breiman’s (2001) notion of two cultures in the use of statistical modelling, we start with a review of the fundamental differences between machine learning and econometrics. We contrast a modelling approach where the analyst makes certain assumptions about model specification, including functional form, with an approach where the data mechanism is presumed unknown. In this context we consider the econometrician’s concern for internal and external validity, alongside machine learning’s focus on ensuring that a model is robust in the sense of generalising to unseen data. We also examine the distinction between models used to solve a prediction problem and models used to estimate some form of causal effect.

As a point of departure we review the broad types of machine learning in terms of supervised and unsupervised learning, making the link to nonparametric regression. We then consider a number of fundamental building blocks, starting with the decomposition of error into bias and variance and the role of training, validation and test samples, and consider the role of regularization as a means of avoiding overfitting.
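As a concrete illustration of the train/test split and regularization ideas above, the following sketch (in Python with NumPy; all data simulated, all names illustrative) compares OLS with a ridge-penalised fit evaluated on held-out data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 30
X = rng.normal(size=(n, k))
beta = np.zeros(k)
beta[:5] = 1.0                              # only 5 of 30 regressors matter
y = X @ beta + rng.normal(scale=2.0, size=n)

# Hold out a test sample to measure generalisation to unseen data
X_tr, X_te, y_tr, y_te = X[:70], X[70:], y[:70], y[70:]

def ridge_fit(X, y, lam):
    """Closed-form ridge estimator: (X'X + lam*I)^{-1} X'y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

for lam in [0.0, 10.0]:                     # lam = 0 is ordinary least squares
    b = ridge_fit(X_tr, y_tr, lam)
    mse = np.mean((y_te - X_te @ b) ** 2)
    print(f"lambda={lam:5.1f}  test MSE={mse:.2f}")
```

Increasing the penalty shrinks the coefficient vector towards zero, trading a little bias for a reduction in variance.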

In covering the three broad areas where machine learning is used, namely prediction, classification and causal effects, for each case we link the exposition to a parametric benchmark. So for prediction we consider the piecewise nonlinear regression model, for classification we review the fundamentals of parametric binary choice models, and for causal effects we look at specification of models of treatment effects.

We will also cover the use of ensemble methods as an averaging and regularization device. In this context we will explore a number of general methods for model averaging including bootstrap sampling (so-called bagging) and random forests.
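The bagging idea above can be sketched in a few lines: fit the same high-variance learner on many bootstrap resamples of the data and average the predictions. The one-split regression stump below is an illustrative stand-in for a deep tree; all data are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 50))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=50)

def stump_predict(x_tr, y_tr, x_new):
    """Fit a one-split regression tree (stump) and predict at x_new."""
    best_sse, best_split = np.inf, None
    for s in x_tr[1:-1]:
        left, right = y_tr[x_tr < s], y_tr[x_tr >= s]
        if left.size == 0 or right.size == 0:
            continue
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best_sse:
            best_sse, best_split = sse, s
    left_mean = y_tr[x_tr < best_split].mean()
    right_mean = y_tr[x_tr >= best_split].mean()
    return np.where(x_new < best_split, left_mean, right_mean)

# Bagging: average the stump's predictions over B bootstrap resamples
x_new = np.linspace(0, 1, 5)
B = 100
preds = np.zeros((B, len(x_new)))
for b in range(B):
    idx = rng.integers(0, len(x), len(x))     # sample with replacement
    preds[b] = stump_predict(x[idx], y[idx], x_new)
bagged = preds.mean(axis=0)
```

A random forest adds one further ingredient: at each split only a random subset of the regressors is considered, which decorrelates the trees being averaged.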

For Machine Learning models in prediction, classification and causal effects we provide examples using Stata and Python. We also demonstrate the integration of Python code in Stata.
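As a minimal sketch of what that integration looks like, the do-file fragment below uses Stata 16's `python:` block and `python script` command; it assumes a Stata 16 installation linked to a local Python with NumPy, and `myscript.py` is a hypothetical file name.

```stata
* Embed Python code directly in a do-file (Stata 16+)
python:
import numpy as np          // any package available in the linked Python
x = np.arange(5)
print(x.mean())
end

* Alternatively, run a standalone Python script from within Stata
python script myscript.py
```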

Agenda, Course Timetable and Recommended Readings

Topic 1: Policy Making and Machine Learning

  • Data for Policy Makers:
    • Randomised and Observational Data
  • Causal Inference
  • Policy Evaluation versus Policy Prediction
  • Heterogeneous Effects of Pricing Schemes

Topic 2: Point of Departure I: The Ordinary Least Squares Estimator

  • The Conditional Expectation Function (CEF)
  • The Linear CEF (Regression) Model
  • Additive and Multiplicative Regression Models
  • Estimation and Inference with Big Data: N and K
  • Ensemble Learning and Model Selection

Topic 3: Machine Learning and Econometrics

  • What is Machine Learning
  • Contrast with Traditional Econometrics
  • What Economists Do
  • Prediction in a Stable Environment
  • Key Lessons for Econometrics

Topic 4a: Point of Departure II: Regularized Regression

  • Theoretical background
  • Ridge regression
  • The Least Absolute Shrinkage and Selection Operator (lasso) for linear regression
  • The Elastic Net Specification
  • Causal inference and lasso
  • Stata Packages:
    • lassopack - implements lasso, elastic net, ridge, square-root and adaptive lasso. Regularization using: cross-validation (cvlasso), information criteria (lasso2) and theory-driven penalization (rlasso).
    • pdslasso
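The mechanics behind these packages can be sketched with the lasso's soft-thresholding operator and a few rounds of coordinate descent. This is an illustrative NumPy implementation on simulated data, not the algorithm used by lassopack.

```python
import numpy as np

def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0): the lasso shrinkage operator."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    Each update divides by the column's mean squared norm, so the
    columns of X need not be pre-standardised."""
    n, k = X.shape
    b = np.zeros(k)
    for _ in range(n_iter):
        for j in range(k):
            r = y - X @ b + X[:, j] * b[j]        # partial residual
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return b

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
beta = np.array([3.0, -2.0] + [0.0] * 8)         # sparse true coefficients
y = X @ beta + rng.normal(size=200)
b_hat = lasso_cd(X, y, lam=0.5)
```

With a penalty this size the eight irrelevant coefficients are set exactly to zero, while the two relevant ones survive (shrunk towards zero) — the selection-plus-shrinkage behaviour that distinguishes the lasso from ridge regression.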

Topic 4b: Applications of Regularized Regression

  • Impact of Education on Child Birth-weight
    • Cattaneo, M. D. 2010. Efficient semiparametric estimation of multi-valued treatment effects under ignorability. Journal of Econometrics 155: 138-154.
  • House Price Data

Topic 5: Machine Learning and Decision Trees

  • Machine Learning: Terminology and Concepts
  • An Overview of Regression Trees
  • The Bias-Variance Tradeoff
  • Training, Testing and Cross Validation
  • Regularization: Variance reduction and Ensemble Learning
  • Variable Importance
  • Choice of Software
    • Using the RandomForest module in Stata 16
    • Using the randomForest and Generalized Random Forest (grf) packages in R
    • Python integration with Stata 16
      • Invoke Python interactively
      • Embed Python code in a do-file
      • Run a Python script file within Stata
  • Application: House Prices

Topic 6: Point of Departure III: Machine Learning and Classification

  • Parametric Benchmarks
  • Classification and Probabilities
  • Application: Credit Card default and surviving the Titanic.
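The parametric benchmark for these applications is the logit model; the sketch below fits one by Newton-Raphson on simulated data (illustrative only, not the course datasets).

```python
import numpy as np

def logit_fit(X, y, n_iter=25):
    """Newton-Raphson for the logistic regression MLE."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ b))          # P(y = 1 | x)
        W = p * (1 - p)                           # observation weights
        grad = X.T @ (y - p)
        hess = (X * W[:, None]).T @ X
        b = b + np.linalg.solve(hess, grad)
    return b

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
beta = np.array([0.5, 1.0, -1.0])                 # true coefficients
y = (rng.uniform(size=500) < 1.0 / (1.0 + np.exp(-X @ beta))).astype(float)
b_hat = logit_fit(X, y)
```

Machine learning classifiers replace the linear index `X @ b` with a flexible function of the regressors, but are evaluated on the same footing: the quality of the predicted class probabilities on unseen data.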

Topic 7: Point of Departure IV: Programme Evaluation and Treatment Effects

  • Overview
  • Types of Treatment Effects
  • Ignorability of Treatment
  • Endogenous Selection
  • Matching Estimators
  • The Difference-in-Difference Estimator
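The arithmetic of the difference-in-difference estimator in the last bullet is simple enough to show directly: subtract the control group's before/after change from the treated group's change. The four cell means below are made-up numbers for illustration.

```python
# Group means before and after the policy change (illustrative values)
y_treat_pre, y_treat_post = 10.0, 16.0    # treated group
y_ctrl_pre,  y_ctrl_post  = 9.0, 11.0     # control group

# DiD: treated change minus control change, netting out the common trend
did = (y_treat_post - y_treat_pre) - (y_ctrl_post - y_ctrl_pre)
print(did)   # (16 - 10) - (11 - 9) = 4.0
```

The identifying assumption is parallel trends: absent treatment, the treated group's outcome would have moved by the same amount as the control group's.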

Topic 8a: Machine Learning for Policy Evaluation and Prediction

  • Causality and Inference
  • Economic Policy Evaluation
  • Prediction Policy Problems
  • Machine Learning and Heterogeneous Treatment Effects
  • Application: Job Training Programs

Topic 8b: Machine Learning and Causal Inference

  • P-values Revisited: Causal Inference whilst Searching for Functional Form
  • Adaptive versus Honest Estimation
  • Causal Trees and Forests
  • Application: Time of Use Tariffs and Smart Meter Data
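The honest-estimation idea in the topics above can be sketched in a stripped-down form: one half of the sample chooses the partition, the other half estimates the treatment effect within each leaf (Athey and Imbens, 2016). The single split below stands in for a full causal tree; treatment is randomly assigned and all data are simulated.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(size=n)
d = rng.integers(0, 2, n)                      # random treatment assignment
tau = np.where(x < 0.5, 1.0, 3.0)              # heterogeneous true effect
y = tau * d + rng.normal(size=n)

# Honest split of the sample: one half to split, one half to estimate
split_idx, est_idx = np.arange(0, n, 2), np.arange(1, n, 2)

def leaf_effect(mask, idx):
    """Treated-minus-control mean outcome within a leaf, on sample idx."""
    sel = idx[mask[idx]]
    return y[sel][d[sel] == 1].mean() - y[sel][d[sel] == 0].mean()

# Choose the split point on the splitting half (grid search for the split
# that maximises the difference in estimated effects between the leaves)
candidates = np.linspace(0.2, 0.8, 13)
def split_gain(s):
    return abs(leaf_effect(x < s, split_idx) - leaf_effect(x >= s, split_idx))
s_star = max(candidates, key=split_gain)

# Honest step: re-estimate the leaf effects on the held-out half
tau_left = leaf_effect(x < s_star, est_idx)
tau_right = leaf_effect(x >= s_star, est_idx)
```

Because the estimation sample played no role in choosing the split, the leaf-level estimates are not contaminated by the search over functional form, which is what restores valid inference.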

Daily Timetable (subject to minor changes)

Time        Session / Description
09:00-09:30 Arrival and Registration
09:30-11:00 Session 1
11:00-11:15 Break
11:15-12:45 Session 2
12:45-13:45 Lunch
13:45-15:15 Session 3
15:15-15:30 Break
15:30-17:15 Session 4

Principal texts for pre/post course reading

Machine Learning: Overview

  1. L. Breiman, J. Friedman, R. Olshen, C. Stone. Classification and Regression Trees. Wadsworth, 1984.
  2. Random Forests. https://en.wikipedia.org/wiki/Random_forest
  3. Training, Validation, and Test sets. https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets
  4. J. Friedman, T. Hastie, R. Tibshirani. The Elements of Statistical Learning. Springer, 2009.
  5. G. James, D. Witten, T. Hastie, R. Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer, 2013.
  6. S. Russell, P. Norvig. Artificial Intelligence: A Modern Approach, 3rd edition, 2009.

Machine Learning and Econometrics

  1. L. Breiman. Statistical Modeling: The Two Cultures. Statistical Science, Vol. 16, No. 3, pp. 199-215, 2001.
  2. S. Athey. The Impact of Machine Learning on Economics, in The Economics of Artificial Intelligence: An Agenda, National Bureau of Economic Research, 2018.
  3. S. Athey, G. Imbens. Machine Learning Methods Economists Should Know About. Working Paper, 2019, Graduate School of Business, Stanford University.
  4. S. Mullainathan, J. Spiess. Machine Learning: An Applied Econometric Approach. Journal of Economic Perspectives, Vol. 31, 2017, pp. 87-106.

Machine Learning: Causal Effects and Random Forests

  1. S. Athey, G. Imbens. Recursive Partitioning for Heterogeneous Causal Effects. Proceedings of the National Academy of Sciences, 113(27), 7353-7360, 2016.
  2. E. O’Neill, M. Weeks. Causal Tree Estimation of Heterogeneous Household Response to Time-Of-Use Electricity Pricing Schemes. arXiv:1810.09179v1, 2018.
  3. S. Athey, G. Imbens, Y. Kong, V. Ramachandra. An Introduction to Recursive Partitioning for Heterogeneous Causal Effects Estimation Using the causalTree Package, 2016.
  4. S. Athey, S. Wager. Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests. Journal of the American Statistical Association, 2017.
  5. Y. Lin, Y. Jeon. Random Forests and Adaptive Nearest Neighbors. Technical Report No. 1055, University of Wisconsin, 2002.
  6. L. Breiman. Random Forests. Machine Learning, Vol. 45, pp. 5-32, 2001.
  7. Random Forests. https://en.wikipedia.org/wiki/Random_forest
  8. Training, Validation, and Test sets. https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets

Selection and Treatment Effects

  1. Chib, S. and B.H. Hamilton, (2000), Bayesian analysis of cross-section and clustered data treatment models, Journal of Econometrics, 97(1), 25-50
  2. Munkin, M.K. and P.K. Trivedi, (2003), Bayesian analysis of a self-selection model with multiple outcomes using simulation-based estimation: An application to the demand for healthcare, Journal of Econometrics, 114(2), 197-220
  3. Li, M. and J. Tobias, (200*), Bayesian Analysis of Treatment Effects in an Ordered Potential Outcomes Model, Advances in Econometrics.


Regularized Regression and the Lasso

  1. Efron, B., Hastie, T., Johnstone, I. and R. Tibshirani (2004). Least Angle Regression (with discussion), Annals of Statistics, 32(2), 407-499.
  2. Friedman, J., Hastie, T. and R. Tibshirani (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition. Springer Series in Statistics, Springer.
  3. Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso, Journal of the Royal Statistical Society, Series B, 58(1), 267-288.
  4. Hofmarcher, P., Cuaresma, J., Grun, B., and K. Hornik (2015). Last Night a Shrinkage Saved My Life: Economic Growth, Model Uncertainty and Correlated Regressors, Journal of Forecasting, Vol. 34, pages 133-144
  5. Park, T., and Casella, G. (2008). The Bayesian lasso. Journal of the American Statistical Association, 103(482), 681-686.
  6. O’Hara, R. B., & Sillanpaa, M. J. (2009). A review of Bayesian variable selection methods: what, how and which. Bayesian analysis, 4(1), 85-117.
  7. Kyung, M., Gill, J., Ghosh, M., & Casella, G. (2010). Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis, 5(2), 369-411.
  8. Tibshirani, R. (2011). Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(3), 273-282.


Prerequisites

  • Some knowledge of basic statistics and econometrics is required: the notion of conditional expectation and related properties; point and interval estimation; the regression model and related properties; probit and logit regression.
  • Basic knowledge of the Stata software.

Terms and Conditions

  • Student registrations: Participants must provide proof of full-time student status at the time of booking to qualify for the student registration rate (a valid student ID card or an official enrolment letter).
  • Additional discounts are available for multiple registrations. Contact us for more information.
  • The cost includes all course materials, lunch and refreshments.
  • Temporary, time-limited licences for the software used in the course will be provided. You must install the supplied software before the course begins. Alternatively, laptops can be hired for $20.00 per day.
  • If you need help locating a hotel in the area, please let us know at the time of registration.
  • Full payment of the course fee is required before the course start date to guarantee your place.
  • Registration closes 5 calendar days before the start of the course.

Cancellations or changes to your registration

  • 100% of the fee refunded for cancellations made more than 28 calendar days before the start of the course.
  • 50% of the fee refunded for cancellations made 14 calendar days before the start of the course.
  • No refund for cancellations made fewer than 14 calendar days before the start of the course.

The number of places is limited. Please register early to guarantee your place.

  • Rates: Commercial / Academic / Student
    28-29 May 2020 (sessions 11:00-13:00 & 15:00-17:00)

All prices exclude VAT or local taxes where applicable.


Timberlake Consultants