Chapter 2: Introduction to Causal Effect over Time

draft

1 Time-Indexed ITE and ATE

In chapter one, I introduced Rubin's Potential Outcomes Framework, a general framework for causal inference (Rubin & Holland, 1986). A causal effect is defined as the difference between two potential outcomes $Y_i(1)$ and $Y_i(0)$, i.e., $ITE_i = Y_i(1) - Y_i(0)$, which is essentially a unit-treatment structure with no explicit constraint on time. The framework itself does not depend on a specific data structure, but in practice it is often applied to cross-sectional data, where the treatment state is fixed in time and the outcome is observed at a single point in time.

Next, I clarify the difference between the definition and the estimation of the individual treatment effect. Causal effects are defined via counterfactuals, and estimation strategies approximate them using observed outcomes under assumptions (Holland, 1986; Angrist & Pischke, 2009; Imbens & Rubin, 2015).

When $ITE_i = Y_i(1) - Y_i(0)$ is extended to data with a time structure, it takes the form

$$ITE_{i} = ITE_{i,t_1} = Y_{i,t_1}(1) - Y_{i,t_1}(0)$$

In this form, the time index $t_1$ does not merely refer to an observed time point in the data, but should be understood as "the theoretical event time at which treatment occurs".

To connect the theoretical potential outcomes model to empirical data, we define the observed outcome as:

$$Y_{i, t_1}^{\text{obs}} = D_{i,t_1} \cdot Y_{i,t_1}(1) + (1 - D_{i,t_1}) \cdot Y_{i,t_1}(0)$$

Similarly, the time-specific ATE is defined as:

$$ATE_{t_1} = \mathbb{E}[Y_{i,t_1}(1) - Y_{i,t_1}(0)] = \mathbb{E}[Y_{i,t_1}(1)] - \mathbb{E}[Y_{i,t_1}(0)]$$
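To make the gap between definition and observation concrete, here is a minimal sketch in plain Python. The four units and all numbers are invented for illustration, and the variable names (`y1`, `y0`, `d`, `y_obs`) are my own. In real data only `y_obs` and `d` would be available, so `ite` and `ate` can be computed here only because both potential outcomes were made up.

```python
# Hypothetical potential outcomes at event time t1 for four units (made-up numbers).
y1 = [10, 12, 9, 14]   # Y_{i,t1}(1): outcome if treated
y0 = [7, 11, 9, 10]    # Y_{i,t1}(0): outcome if untreated
d = [1, 0, 1, 0]       # D_{i,t1}: realized treatment status

# Definitions: these need BOTH potential outcomes of every unit.
ite = [a - b for a, b in zip(y1, y0)]
ate = sum(ite) / len(ite)

# Observed outcome: the treatment indicator selects one potential outcome per unit.
y_obs = [di * a + (1 - di) * b for di, a, b in zip(d, y1, y0)]

print("ITE:", ite)
print("ATE:", ate)
print("Observed:", y_obs)
```

Each unit contributes only one of its two potential outcomes to `y_obs`, which is why the remaining sections need substitutes for the missing one.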

2 Estimation of Time-Specific ITE and ATE Using Observed Outcomes

Because the individual treatment effect $ITE_i = Y_i(1) - Y_i(0)$ is unobservable, we substitute a reference individual's observed outcome for the unobservable outcome of the target individual, e.g., $\widehat{ITE}_{i} = Y_i(1) - Y_j(0)$ or $\widehat{ITE}_{i} = Y_j(1) - Y_i(0)$. Yet this substitution presumes that individuals $i$ and $j$ are comparable in all relevant respects, an assumption that is rarely tenable in practice. The resulting estimate is thus vulnerable to systematic error arising from unobserved heterogeneity. Panel data structures provide a powerful tool for identifying individual causal effects: by tracking changes in the same unit at different points in time, we can approximate its counterfactual state under certain conditions, which brings us closer to identifying the ITE.

In this chapter, we will not repeat the general logic of cross-sectional analysis, but focus on the time-indexed panel data structure. The distinguishing feature of this section, I think, is that it separates ITE estimators from ATE estimators, rather than relying only on the classical distinction among within-group estimators, between-group estimators, and DID estimators.

2.1 Within Estimators For Time-Specific ITE

Suppose a unit $i$ receives a treatment at time point $t_1$, so that $D_{i,t_1} = 1$, and is observed before treatment at time point $t_0$, when it is untreated, $D_{i,t_0} = 0$. The ITE for the treated individual that we want to estimate can be formulated as $ITE_{i,t_1} = Y_{i,t_1}(1) - Y_{i,t_1}(0)$.

We can utilize the observed outcomes of this unit in the two periods to construct the following difference:

$$\widehat{ITE}_{i,t_1}^{within} = Y_{i,t_1}(1) - Y_{i,t_0}(0)$$

Note that $t_0 < t_1$. Adding and subtracting $Y_{i,t_1}(0)$ gives

$$\widehat{ITE}_{i,t_1}^{within} = Y_{i,t_1}(1) - Y_{i,t_1}(0) + Y_{i,t_1}(0) - Y_{i,t_0}(0) = ITE_{i,t_1} + Y_{i,t_1}(0) - Y_{i,t_0}(0)$$

The final term, $Y_{i,t_1}(0) - Y_{i,t_0}(0)$, is the so-called counterfactual trend term: how the outcome variable would have evolved naturally between $t_0$ and $t_1$ if unit $i$ had never been treated. If the counterfactual trend term is not zero, the estimator deviates systematically from the true treatment effect. To make the estimate identifiable, we need to introduce the counterfactual stationarity assumption (No Time-Varying Confounding): in the untreated state, the outcome of unit $i$ does not change between $t_0$ and $t_1$, i.e., $Y_{i,t_1}(0) - Y_{i,t_0}(0) = 0$. When this assumption holds, the within-individual difference can be regarded as a consistent estimator of $ITE_{i,t_1}$. Conversely, if the unit is affected by other time-varying factors (such as aging, external shocks, or seasonal trends), the estimator is biased.

For example, consider the effect of attending university on initial wage. If, in the untreated state, the individual would by a stroke of luck have taken part in vocational training after failing to enter university, then $Y_{i,t_1}(0) - Y_{i,t_0}(0) > 0$ and we may overestimate the ITE. If the individual would instead have given up and drifted aimlessly because of a family crisis, then $Y_{i,t_1}(0) - Y_{i,t_0}(0) < 0$ and we may underestimate the ITE. Here I use external shocks to illustrate how the unit's own trend can change.
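The decomposition of the within estimator can be checked numerically. The sketch below uses one hypothetical unit with invented values, and the helper name `within_ite` is my own. The counterfactual `y_i_t1_untreated` is known here only because it was made up, whereas in practice it is never observed.

```python
# Hypothetical single unit i; all numbers are invented for illustration.
y_i_t0_untreated = 20   # Y_{i,t0}(0): observed before treatment
y_i_t1_treated = 30     # Y_{i,t1}(1): observed after treatment
y_i_t1_untreated = 24   # Y_{i,t1}(0): counterfactual, known only because we made it up


def within_ite(y_after, y_before):
    """Within estimator: post-treatment outcome minus pre-treatment outcome."""
    return y_after - y_before


true_ite = y_i_t1_treated - y_i_t1_untreated                  # what we want
counterfactual_trend = y_i_t1_untreated - y_i_t0_untreated    # untreated drift from t0 to t1
estimate = within_ite(y_i_t1_treated, y_i_t0_untreated)       # what we can compute

# The estimate equals the true ITE plus the counterfactual trend term.
assert estimate == true_ite + counterfactual_trend
print(true_ite, counterfactual_trend, estimate)
```

With a positive untreated drift, as in the vocational-training scenario, the within estimate exceeds the true effect by exactly that drift.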

2.2 Within Estimators For Time-Specific ATT

Under the Rubin potential outcomes framework, suppose we have only two periods of panel data, and only the treatment group has its treatment changed from 0 to a at $t_1$, while the control group remains at 0. Then within-individual differences yield a closed form only for the ATT. The ATU requires an observation of, or a substitute for, the outcome of the untreated group in the treated state, $Y_{j,t_1}(1)$, and no such observation exists in a within-group difference. The ATE is a weighted average of the ATT and the ATU. Since the ATU is unidentifiable under the classical Rubin model, the ATE cannot be expressed in closed form either.

If we observe that the treatment states of multiple individuals in a set of units (or treatment groups) change over the same time window, the ATT to be estimated can be formulated as follows:

$$ATT_t = \mathbb{E}[Y_{i,t_1}(1) - Y_{i,t_1}(0) \mid D_{i,t_1} = 1]$$

We can apply the above difference method to all eligible units $i$ and take the average to obtain what is called an intra-group (within-group) estimator:

$$\widehat{ATT}_t^{within} = \frac{1}{N} \sum_{i \in G} (Y_{i,t_1}(1) - Y_{i,t_0}(0))$$

Then we can decompose this estimator into the following two parts:

$$\widehat{ATT}_t^{within} = \underbrace{\frac{1}{N} \sum_{i \in G} (Y_{i,t_1}(1) - Y_{i,t_1}(0))}_{\text{True ATT}} + \underbrace{\frac{1}{N} \sum_{i \in G} (Y_{i,t_1}(0) - Y_{i,t_0}(0))}_{\text{Bias}}$$

Therefore, the consistency of the within-group estimator depends on whether the second term (the counterfactual trend term) is zero. Note that, unlike the identification requirement for the ITE, the counterfactual trend term does not need to be zero for every unit in order for the estimator to consistently identify the ATT. It is enough that the unit-level biases cancel out on average, i.e., that the so-called zero mean bias assumption (no systematic trend bias) holds:

$$\mathbb{E}[Y_{i,t_1}(0) - Y_{i,t_0}(0) \mid D_{i,t_1} = 1] = 0$$
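The following sketch illustrates why the group-level requirement is weaker than the unit-level one. It uses a hypothetical treated group of four units with invented values, and the helper `mean` and all variable names are my own. The unit-level counterfactual trends are nonzero but average to zero, so every unit-level within estimate is off while their average still matches the true ATT.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical treated group G (made-up numbers).
y_t0_untreated = [5, 8, 6, 9]     # Y_{i,t0}(0): observed
y_t1_untreated = [7, 6, 7, 8]     # Y_{i,t1}(0): counterfactual, known only by construction
y_t1_treated = [10, 9, 11, 10]    # Y_{i,t1}(1): observed

true_ite = [a - b for a, b in zip(y_t1_treated, y_t1_untreated)]
trend = [a - b for a, b in zip(y_t1_untreated, y_t0_untreated)]
within = [a - b for a, b in zip(y_t1_treated, y_t0_untreated)]

true_att = mean(true_ite)
att_within = mean(within)

print("unit-level truth:   ", true_ite)
print("unit-level estimate:", within)
print("mean trend:", mean(trend))
print("true ATT:", true_att, "within ATT:", att_within)
```

If the trends were shifted so that their mean is no longer zero, `att_within` would differ from `true_att` by exactly that mean.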

2.3 Between Estimators for Time-Specific ITE

Assume unit $i$ receives treatment at time $t_1$ ($D_{i,t_1} = 1$) and unit $j$ does not receive treatment at the same time ($D_{j,t_1} = 0$). The time-indexed estimator can then be formulated as $\widehat{ITE}_{i,t_1} = Y_{i,t_1}(1) - Y_{j,t_1}(0)$, which has the same form as the between-individual estimator of the ITE for the treated, $\widehat{ITE}_{i} = Y_i(1) - Y_j(0)$.

$$\widehat{ITE}_{i,t_1}^{\text{between}} = Y_{i,t_1}(1) - Y_{j,t_1}(0) = \underbrace{Y_{i,t_1}(1) - Y_{i,t_1}(0)}_{ITE_{i,t_1}} + \underbrace{Y_{i,t_1}(0) - Y_{j,t_1}(0)}_{\text{bias term}}$$

The condition $Y_{i,t_1}(0) - Y_{j,t_1}(0) = 0$ is the so-called Assumption of Comparable Counterfactuals: if the potential outcomes of $i$ and $j$ in the untreated state have the same value, then the difference in observed outcomes can be used as a consistent estimator of the ITE.

2.4 Between Estimators for Time-Specific ATE

Now we consider comparing the outcomes of different units at the same time point $t_1$, i.e., constructing a between estimator of the treatment effect. For the ATT, each treated unit $i \in G$ is compared with an untreated unit $j$:

$$\widehat{ATT}_t^{between} = \frac{1}{N} \sum_{i \in G} (Y_{i,t_1}(1) - Y_{j,t_1}(0))$$

Bias decomposition:

$$\widehat{ATT}_t^{\text{between}} = \underbrace{\frac{1}{N} \sum_{i \in G} \left( Y_{i,t_1}(1) - Y_{i,t_1}(0) \right)}_{\text{True ATT}} + \underbrace{\frac{1}{N} \sum_{i \in G} \left( Y_{i,t_1}(0) - Y_{j,t_1}(0) \right)}_{\text{Bias}}$$

The assumption $\mathbb{E}[Y_{i,t_1}(0) \mid D_i = 1] = \mathbb{E}[Y_{j,t_1}(0) \mid D_j = 0]$ is called mean ignorability for the untreated potential outcome.

Estimator for ATU, where each untreated unit $j \in G'$ is compared with a treated unit $i$:

$$\widehat{ATU}_t^{\text{between}} = \frac{1}{N} \sum_{j\in G'} \left( Y_{i,t_1}(1) - Y_{j,t_1}(0) \right)$$

Bias decomposition:

$$\widehat{ATU}_t^{\text{between}} = \underbrace{\frac{1}{N} \sum_{j \in G'} \left( Y_{j,t_1}(1) - Y_{j,t_1}(0) \right)}_{\text{True ATU}} + \underbrace{\frac{1}{N} \sum_{j\in G'} \left( Y_{i,t_1}(1) - Y_{j,t_1}(1) \right)}_{\text{Bias}}$$

The assumption $\mathbb{E}[Y_{j,t_1}(1) \mid D_j = 0] = \mathbb{E}[Y_{i,t_1}(1) \mid D_i = 1]$ is called mean ignorability for the treated potential outcome.

Both of the above estimators contain a true-effect term and a bias term. The bias term stems from substituting the observed outcomes of another group for the counterfactual outcomes. For the ATT and ATU estimators to be consistent, we need to assume that the two groups have the same average outcome in the relevant counterfactual state.
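As a rough numerical illustration of the two bias terms, the sketch below uses group means in place of unit-by-unit pairing, which is a simplifying assumption of mine, along with invented data and hypothetical names (`treated`, `control`, `mean`). The same observed contrast is read once as an ATT estimate and once as an ATU estimate, and the difference between the two readings comes only from which counterfactual is being substituted.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical potential outcomes at t1 (made-up numbers).
treated = {"y1": [12, 14], "y0": [8, 10]}   # G:  y1 observed, y0 counterfactual
control = {"y1": [9, 11], "y0": [6, 8]}     # G': y0 observed, y1 counterfactual

# The only contrast available from observed data.
between = mean(treated["y1"]) - mean(control["y0"])

# Targets, computable here only because both potential outcomes were invented.
true_att = mean([a - b for a, b in zip(treated["y1"], treated["y0"])])
true_atu = mean([a - b for a, b in zip(control["y1"], control["y0"])])

# Bias when the control group's Y(0) stands in for the treated group's Y(0).
att_bias = mean(treated["y0"]) - mean(control["y0"])
# Bias when the treated group's Y(1) stands in for the control group's Y(1).
atu_bias = mean(treated["y1"]) - mean(control["y1"])

assert between == true_att + att_bias
assert between == true_atu + atu_bias
print(between, true_att, att_bias, true_atu, atu_bias)
```

Setting the two groups' `y0` means equal removes the ATT bias, and setting their `y1` means equal removes the ATU bias, which mirrors the two mean ignorability assumptions.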

For the ATE estimator and its bias decomposition, you can derive them yourself :).

2.5 Difference-in-Differences Estimators for Time-Specific ITE

In causal inference, Difference-in-Differences (DID) is a classical identification strategy, suitable for identifying treatment effects under time indexes in panel data and repeated cross-sectional data. The basic idea is that, by comparing the changes in the treatment group and the control group before and after treatment, common trends and time-invariant individual heterogeneity can be eliminated.

Consider the following setting: unit $i \in G$ belongs to the treatment group and is treated at time point $t_1$ ($D_{i,t_1} = 1$, $D_{i,t_0} = 0$); unit $j \in G'$ belongs to the control group and is untreated from time point $t_0$ to time point $t_1$ ($D_{j,t_1} = 0$, $D_{j,t_0} = 0$). We can observe the following four outcomes:

| Time | Treatment Group | Control Group |
| --- | --- | --- |
| $t_0$ | $Y_{i,t_0}(0)$ | $Y_{j,t_0}(0)$ |
| $t_1$ | $Y_{i,t_1}(1)$ | $Y_{j,t_1}(0)$ |

At the individual level, we can construct the following estimator to identify the time-specific treatment effect of unit $i$:

$$\widehat{ITE}_i^{\text{DID}} = \left( Y_{i,t_1} - Y_{i,t_0} \right) - \left( Y_{j,t_1} - Y_{j,t_0} \right)$$

Bias decomposition:

$$\widehat{ITE}_i^{\text{DID}} = \underbrace{Y_{i,t_1}(1) - Y_{i,t_1}(0)}_{\text{True ITE}} + \underbrace{\left( Y_{i,t_1}(0) - Y_{j,t_1}(0) \right) - \left( Y_{i,t_0}(0) - Y_{j,t_0}(0) \right)}_{\text{Bias term}}$$

The former term is the true treatment effect that we want to identify, and the latter term is the "counterfactual trend bias". This bias is not directly observable, because $Y_{i,t_1}(0)$ is never observed for the treated unit. To make the bias term zero, the following assumption must be met:

$$\left( Y_{i,t_1}(0) - Y_{j,t_1}(0) \right) = \left( Y_{i,t_0}(0) - Y_{j,t_0}(0) \right)$$

This means that, in the untreated state, the difference between the treated unit and the untreated unit is the same at the two time points $t_0$ and $t_1$. This is called the Parallel Trends Assumption.
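A small numerical sketch may help here. The function name `did` and all values are hypothetical, and the treated unit's untreated outcome at $t_1$ is known only because it was invented. The first call uses a control unit whose change matches the treated unit's counterfactual change, and the second uses a control unit whose change does not.

```python
def did(y_i_t1, y_i_t0, y_j_t1, y_j_t0):
    """Unit-level DID: change in the treated unit minus change in the control unit."""
    return (y_i_t1 - y_i_t0) - (y_j_t1 - y_j_t0)


# Hypothetical treated unit i (made-up numbers).
y_i_t0, y_i_t1 = 20, 30
y_i_t1_untreated = 24                    # counterfactual, known only by construction
true_ite = y_i_t1 - y_i_t1_untreated

# Control unit whose change (+4) equals i's untreated change (+4): parallel trends hold.
estimate_parallel = did(y_i_t1, y_i_t0, 19, 15)

# Control unit whose change (+2) differs from i's untreated change (+4).
estimate_nonparallel = did(y_i_t1, y_i_t0, 17, 15)
trend_gap = (y_i_t1_untreated - y_i_t0) - (17 - 15)

assert estimate_parallel == true_ite
assert estimate_nonparallel == true_ite + trend_gap
print(true_ite, estimate_parallel, estimate_nonparallel, trend_gap)
```

Note that the control unit starts at a different level (15 versus 20) in both cases. Only a difference in changes, not in levels, moves the estimate away from the true effect.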

2.6 Difference-in-Differences Estimators for Time-Specific ATT

In a 2×2 DID, unless additional assumptions are imposed, the ATU and the ATE cannot be identified in closed form. In the following chapters I will introduce solutions obtained through iterative methods.

$$ATT_t^{\text{true}} = \mathbb{E}[Y_{i,t_1}(1) - Y_{i,t_1}(0) \mid i \in G]$$

$$\widehat{ATT}_t^{\text{DID}} = \frac{1}{N} \sum_{i \in G} \left[ \left( Y_{i,t_1} - Y_{i,t_0} \right) - \left( Y_{j,t_1} - Y_{j,t_0} \right) \right]$$

$$\widehat{ATT}_t^{\text{DID}} = \underbrace{ \frac{1}{N} \sum_{i \in G} \left( Y_{i,t_1}(1) - Y_{i,t_1}(0) \right) }_{\text{True ATT}} + \underbrace{ \left[ \frac{1}{N} \sum_{i \in G} \left( Y_{i,t_1}(0) - Y_{j,t_1}(0) \right) - \frac{1}{N} \sum_{i \in G} \left( Y_{i,t_0}(0) - Y_{j,t_0}(0) \right) \right] }_{\text{Bias Term}}$$
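In practice the group-level DID is usually computed from group-by-period means of observed outcomes. The sketch below does this for a tiny hypothetical panel in plain Python. The record layout, the group labels `"G"` and `"Gp"` (standing in for $G'$), the helper `group_mean`, and all numbers are my own assumptions, and group means replace unit-by-unit pairing.

```python
# Hypothetical 2x2 panel: (unit, group, period, observed outcome); made-up numbers.
panel = [
    ("a", "G", 0, 10), ("a", "G", 1, 17),
    ("b", "G", 0, 12), ("b", "G", 1, 18),
    ("c", "Gp", 0, 8), ("c", "Gp", 1, 11),
    ("d", "Gp", 0, 6), ("d", "Gp", 1, 9),
]


def group_mean(rows, group, period):
    """Mean observed outcome for one group in one period."""
    values = [y for _, g, t, y in rows if g == group and t == period]
    return sum(values) / len(values)


change_treated = group_mean(panel, "G", 1) - group_mean(panel, "G", 0)
change_control = group_mean(panel, "Gp", 1) - group_mean(panel, "Gp", 0)
att_did = change_treated - change_control

print(change_treated, change_control, att_did)
```

Whether `att_did` equals the true ATT cannot be checked from `panel` alone. That depends on the parallel trends assumption, which concerns unobserved untreated outcomes of group `"G"` in period 1.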

$$ATE_t^{\text{true}} = \mathbb{E}[Y_{i,t_1}(1) - Y_{i,t_1}(0)], \quad i \in G \cup G'$$

$$ATU_t^{\text{true}} = \mathbb{E}[Y_{j,t_1}(1) - Y_{j,t_1}(0)], \quad j \in G'$$

$$\widehat{ATE}_t^{\text{DID}} = (\bar Y_{G,t_1} - \bar Y_{G,t_0}) - (\bar Y_{G',t_1} - \bar Y_{G',t_0})$$

Write $N = N_G + N_{G'}$, and let $ATT_t$ and $ATU_t$ denote the average effects over the units in $G$ and $G'$ respectively. Then

$$\begin{aligned} \widehat{ATE}_t^{\text{DID}} = {} & \underbrace{\frac{1}{N}\sum_{i\in G\cup G'} \left(Y_{i,t_1}(1) - Y_{i,t_1}(0)\right)}_{\text{True ATE}} \\ & + \underbrace{\frac{N_{G'}}{N}\left( ATT_t - ATU_t \right) + \left[ \frac{1}{N_G}\sum_{i\in G} \left( Y_{i,t_1}(0) - Y_{i,t_0}(0) \right) - \frac{1}{N_{G'}}\sum_{j\in G'} \left( Y_{j,t_1}(0) - Y_{j,t_0}(0) \right) \right]}_{\text{Bias}} \end{aligned}$$

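Because no unit in $G'$ is treated in either period, the only observed contrast is again the difference of group changes. Read as an estimator of the ATU, it is

$$\widehat{ATU}_t^{\text{DID}} = (\bar Y_{G,t_1} - \bar Y_{G,t_0}) - (\bar Y_{G',t_1} - \bar Y_{G',t_0})$$

$$\begin{aligned} \widehat{ATU}_t^{\text{DID}} = {} & \underbrace{\frac{1}{N_{G'}}\sum_{j\in G'} \left( Y_{j,t_1}(1) - Y_{j,t_1}(0) \right)}_{\text{True ATU}} \\ & + \underbrace{\left[ \frac{1}{N_G}\sum_{i\in G} \left( Y_{i,t_1}(1) - Y_{i,t_1}(0) \right) - \frac{1}{N_{G'}}\sum_{j\in G'} \left( Y_{j,t_1}(1) - Y_{j,t_1}(0) \right) \right]}_{\text{Bias: effect heterogeneity}} \\ & + \underbrace{\left[ \frac{1}{N_G}\sum_{i\in G} \left( Y_{i,t_1}(0) - Y_{i,t_0}(0) \right) - \frac{1}{N_{G'}}\sum_{j\in G'} \left( Y_{j,t_1}(0) - Y_{j,t_0}(0) \right) \right]}_{\text{Bias: Cross-group trend gap}} \end{aligned}$$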
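To see why the ATE and ATU are not recovered by a 2×2 DID without further assumptions, consider a hypothetical simulation in which parallel trends hold exactly but the treatment effect differs across groups. All potential outcomes below are invented, the names are my own, and the two groups are given equal size for simplicity.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical potential outcomes (made-up numbers); untreated trend is +3 in both groups.
g_t0_y0, g_t1_y0, g_t1_y1 = [10, 12], [13, 15], [17, 19]   # treated group G, effect 4
gp_t0_y0, gp_t1_y0, gp_t1_y1 = [6, 8], [9, 11], [11, 13]   # control group G', effect 2

# Observed data: G shows Y(1) at t1, G' shows Y(0) at t1, both show Y(0) at t0.
did = (mean(g_t1_y1) - mean(g_t0_y0)) - (mean(gp_t1_y0) - mean(gp_t0_y0))

true_att = mean([a - b for a, b in zip(g_t1_y1, g_t1_y0)])
true_atu = mean([a - b for a, b in zip(gp_t1_y1, gp_t1_y0)])
n_g, n_gp = len(g_t0_y0), len(gp_t0_y0)
true_ate = (n_g * true_att + n_gp * true_atu) / (n_g + n_gp)

print("DID:", did, "ATT:", true_att, "ATU:", true_atu, "ATE:", true_ate)
```

In this constructed case the DID coincides with the ATT, while its gaps to the ATE and the ATU are driven entirely by the difference between the two groups' effects.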
