WR87: Causal Inference
Dates: 11, 12, 13 January 2021
Tuition fee: € 995,-
In this course students gain an overview of the statistical techniques and research designs that epidemiologists use to estimate treatment effects from patient data.
We begin by reviewing the randomized controlled trial (RCT) design, in which patients are assigned completely at random to two or more treatment and control groups, together with the statistical techniques for inference about treatment effects in RCTs. We then introduce the Neyman-Rubin causal model (RCM), which postulates that each patient has as many potential outcomes as there are treatment options. Students gain insight into how, in an RCT, only one potential outcome per patient is observed while all the others are missing. RCTs plan this missingness in a very clever way: the potential outcomes are missing completely at random, because randomization makes treatment assignment independent of every patient characteristic. We then show how the potential outcome framework links to the statistical inference techniques that students have already learnt in previous statistics classes.
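The potential outcome idea can be made concrete with a small simulation. The following is a minimal Python sketch, not course material (the course practicals use R). The data-generating model, the variable `severity`, the true effect of 2.0, and the function names are all illustrative assumptions. Each simulated patient has two potential outcomes, only one of which is observed, and the plain difference in group means still recovers the average treatment effect because assignment is random.

```python
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def simulate_rct(n=20000, seed=1):
    """Simulate an RCT; every number below is an illustrative assumption."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        severity = rng.gauss(0, 1)              # hypothetical patient characteristic
        y0 = 1.0 + severity + rng.gauss(0, 1)   # potential outcome under control
        y1 = y0 + 2.0                           # potential outcome under treatment
        treated = rng.random() < 0.5            # assignment ignores all characteristics
        observed = y1 if treated else y0        # the other potential outcome stays missing
        patients.append((treated, observed, y0, y1))
    return patients


patients = simulate_rct()

# Only a simulation can see both potential outcomes of the same patient.
true_ate = mean(y1 - y0 for _, _, y0, y1 in patients)

# What an analyst can compute from observed data: the difference in group means.
rct_estimate = (mean(obs for t, obs, _, _ in patients if t)
                - mean(obs for t, obs, _, _ in patients if not t))

print(f"true average effect: {true_ate:.2f}, RCT estimate: {rct_estimate:.2f}")
```

In this sketch the estimate differs from the true effect only by sampling error, which shrinks as `n` grows.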
Subsequently, we consider the estimation of treatment effects in settings where treatment assignment cannot be controlled by the experimenter. Such a setting is referred to as an observational design and usually arises in epidemiology in the form of cross-sectional, retrospective, or prospective studies. The key difference from RCTs is that potential outcomes are no longer missing completely at random. However, sometimes the reasons for treatment assignment have been observed (in the data) in the form of so-called confounding variables. When this is the case, the causal inference literature offers a wide range of statistical techniques for validly estimating treatment effects even outside of RCTs. We first study the bias that emerges when the assumption of observed confounders does not hold: how large can it be? Subsequently, we review and apply a core set of estimation techniques that epidemiologists find useful for estimating treatment effects.
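How large the bias can be is easy to illustrate by simulation. The following is a minimal Python sketch under invented assumptions (the course practicals use R): a hypothetical binary confounder `severe` makes treatment more likely (80% versus 20%) and lowers the outcome by 3, while the true treatment effect is 2.0. Under these assumed numbers the naive difference in means is expected to be about 2.0 - 3.0 * (0.8 - 0.2) = 0.2, so the treatment looks almost useless.

```python
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def simulate_observational(n=20000, seed=2):
    """Treatment assignment depends on a confounder; all numbers are assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = 1 if rng.random() < 0.5 else 0              # hypothetical confounder
        p_treat = 0.8 if severe else 0.2                     # sicker patients are treated more often
        treated = 1 if rng.random() < p_treat else 0
        outcome = 2.0 * treated - 3.0 * severe + rng.gauss(0, 1)
        rows.append((severe, treated, outcome))
    return rows


rows = simulate_observational()

# Naive comparison that ignores the confounder.
naive_estimate = (mean(y for _, t, y in rows if t)
                  - mean(y for _, t, y in rows if not t))

print(f"true effect: 2.00, naive estimate: {naive_estimate:.2f}")
```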
We start by reviewing and applying covariate-based regression adjustment, focusing on the assumptions and circumstances under which this technique performs well (e.g. linearity and covariate balance). In particular, we consider examples where linear regression fails badly to estimate treatment effects correctly. Students become aware of how dangerous it can be to use regression adjustment blindly for treatment effect estimation. We then consider alternative estimation techniques, starting with the important class of propensity score adjustment techniques. Here we learn how to apply propensity score stratification, weighting, and matching for estimating treatment effects. These techniques share the advantage that the relationship between confounders and outcome variables does not need to be known or modeled correctly. Instead, the relationship with treatment assignment is modeled, and small errors in model specification are alleviated by matching or stratification. All propensity score techniques are thoroughly practiced to enable students to apply them to their own research problems. We also look at the covariate balance assumption and how to assess it. After considering propensity score techniques, we briefly move on to so-called doubly robust estimation of treatment effects. These estimators combine propensity score weighting with regression adjustment and in many settings can give researchers the best of both worlds.
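Propensity score stratification and weighting can be sketched on simulated data. The following minimal Python example is an illustration under invented assumptions, not the course's R code: a single hypothetical binary confounder `severe`, a true effect of 2.0, and a propensity score estimated simply as the fraction of treated patients in each stratum. In real applications the propensity score is usually estimated with a model such as logistic regression on several covariates.

```python
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def simulate_observational(n=20000, seed=3):
    """Confounded treatment assignment; all numbers are assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = 1 if rng.random() < 0.5 else 0
        treated = 1 if rng.random() < (0.8 if severe else 0.2) else 0
        outcome = 2.0 * treated - 3.0 * severe + rng.gauss(0, 1)
        rows.append((severe, treated, outcome))
    return rows


def stratified_estimate(rows):
    """Average the within-stratum effects, weighted by stratum size."""
    total = 0.0
    for level in (0, 1):
        stratum = [(t, y) for x, t, y in rows if x == level]
        effect = (mean(y for t, y in stratum if t)
                  - mean(y for t, y in stratum if not t))
        total += len(stratum) / len(rows) * effect
    return total


def ipw_estimate(rows):
    """Inverse probability weighting with stratum-wise estimated propensities."""
    propensity = {level: mean(t for x, t, _ in rows if x == level)
                  for level in (0, 1)}
    n = len(rows)
    treated_part = sum(y / propensity[x] for x, t, y in rows if t) / n
    control_part = sum(y / (1.0 - propensity[x]) for x, t, y in rows if not t) / n
    return treated_part - control_part


rows = simulate_observational()
strat_estimate = stratified_estimate(rows)
weighted_estimate = ipw_estimate(rows)
print(f"stratification: {strat_estimate:.2f}, weighting: {weighted_estimate:.2f}")
```

With one binary confounder and stratum-wise propensities the two estimators are algebraically the same, so they agree up to rounding here. With continuous or many covariates they generally differ, and covariate balance after adjustment needs to be checked.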
The goal of this course is to enable students to use all of these techniques while being aware of the underlying assumptions that permit or forbid their use. Key to covariate-based adjustment is always the assumption that all confounding variables are observed. At the end of the course we therefore consider settings in which this is not the case. So-called instrumental variables can then still save the epidemiologist. However, such variables are seldom observed in epidemiological studies.
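The instrumental variable idea can be illustrated with the simple Wald (ratio) estimator. The following is a minimal Python sketch under invented assumptions, not the course's R code: an unobserved confounder `u`, a hypothetical binary instrument `z` that shifts treatment uptake but affects the outcome only through treatment, and a constant true effect of 2.0.

```python
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def simulate_with_instrument(n=100000, seed=4):
    """Unobserved confounding plus a binary instrument; all numbers are assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0, 1)                      # unobserved confounder
        z = 1 if rng.random() < 0.5 else 0       # instrument, independent of u
        treated = 1 if 2.0 * z + u > 1.0 else 0  # uptake depends on z and on u
        outcome = 2.0 * treated + 2.0 * u + rng.gauss(0, 1)
        rows.append((z, treated, outcome))       # u is deliberately not recorded
    return rows


iv_rows = simulate_with_instrument()

# Naive comparison of treated and untreated patients, biased by u.
iv_naive = (mean(y for _, t, y in iv_rows if t)
            - mean(y for _, t, y in iv_rows if not t))

# Wald estimator: effect of z on the outcome divided by effect of z on treatment.
effect_on_outcome = (mean(y for z, _, y in iv_rows if z)
                     - mean(y for z, _, y in iv_rows if not z))
effect_on_treatment = (mean(t for z, t, _ in iv_rows if z)
                       - mean(t for z, t, _ in iv_rows if not z))
wald_estimate = effect_on_outcome / effect_on_treatment

print(f"naive: {iv_naive:.2f}, instrumental variable estimate: {wald_estimate:.2f}")
```

The sketch relies on the instrument being valid and reasonably strong. When the instrument barely changes treatment uptake, the denominator is close to zero and the estimate becomes very unstable.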
Finally, we summarize and discuss the implications for planning and designing epidemiological research. When RCTs cannot be used, it is important to plan observational studies with the causal analysis already in mind. Epidemiologists need to collect sufficient information to justify the assumption of observed confounding, or they need to make an instrument available. Both approaches are non-trivial and require care in the preparation of causal studies.
Thomas Klaus, PhD, course coordinator
Department of Epidemiology and Data Science. Amsterdam UMC, location VUmc
Gabrielle Jongeneel, MSc
Department of Epidemiology & Biostatistics. Amsterdam UMC, location VUmc
The course consists of 3 days.
Topics day 1:
Randomized clinical trials (RCTs)
Neyman-Rubin causal model (RCM) and potential outcomes
Standard inference in RCTs under the RCM
Bias of standard estimators in observational studies
Regression adjustment and its assumptions (the outcome model)
Computer practicals in R
Topics day 2:
The treatment assignment model
Propensity score weighting
Propensity score stratification
Propensity score matching
Computer practicals in R
Topics day 3:
Doubly robust estimation using outcome and treatment assignment models
Instrumental variable estimation
Design of observational studies
At the end of the Causal Inference course, the participant will be able to:
1. determine whether an observational study design is adequate for answering a research question and plan the observational design to allow valid causal inference
2. explain and handle the problem of missing potential outcomes and confounding under the Neyman-Rubin causal model in RCTs and observational studies
3. explain the potential and perils of simple regression adjustment for confounding
4. use statistical techniques in R to adjust for confounding, in particular propensity score weighting, stratification, and matching
5. explain how to use doubly robust estimation techniques and instrumental variables in estimation.
Target group and entry requirements
Target group: epidemiologists concerned with the estimation of treatment effects between study groups using observational data.
Participants are expected
1. to have basic knowledge of epidemiological methods and to have followed the Regression Techniques (V30) course of EpidM, or to have at least knowledge of the following topics:
Expectations, means, and variances;
Statistical testing and confidence intervals;
Linear and logistic regression modeling.
2. to have basic knowledge of R;
Participants who do not have basic knowledge of R can follow the online module Introduction R, which will be provided on Canvas before the course.
Course material and literature
To be announced
Exam and accreditation
To be announced