Chapter 2: Introduction to Causal Effect over Time

draft

1 Time-Indexed ITE and ATE

In chapter one, I introduced Rubin's Potential Outcomes Framework, a general framework for causal inference (Rubin & Holland, 1986). A causal effect is defined as the difference between two potential outcomes $Y_i(1)$ and $Y_i(0)$, i.e., $ITE_i=Y_i(1)-Y_i(0)$, which is essentially a unit-treatment structure with no explicit constraints on time. The framework itself does not depend on a specific data structure, but in practice it is often applied to cross-sectional data, where the treatment state is fixed in time and the outcome is observed at only one point in time.

Before going further, I want to clarify the difference between the definition and the estimation of the individual treatment effect. Causal effects are defined via counterfactuals, whereas estimation strategies approximate them using observed outcomes under assumptions (Holland, 1986; Angrist & Pischke, 2009; Imbens & Rubin, 2015).

When $ITE_i=Y_i(1)-Y_i(0)$ is extended to data with a time structure, it takes the form

ITE_{i} = ITE_{i,t_1} = Y_{i,t_1}(1) - Y_{i,t_1}(0)

In this form, the time index $t_1$ does not merely refer to the observed time points in the data, but should be understood as "the theoretical event time at which treatment occurs".

To connect the theoretical potential outcomes model to empirical data, we define the observed outcome as:

Y_{i, t_1}^{\text{obs}} = D_{i,t_1} \cdot Y_{i,t_1}(1) + (1 - D_{i,t_1}) \cdot Y_{i,t_1}(0)

Similarly, the time-specific ATE is defined as:

ATE_{t_1}= \mathbb{E}[Y_{i,t_1}(1) - Y_{i,t_1}(0)] = \mathbb{E}[Y_{i,t_1}(1)] - \mathbb{E}[Y_{i,t_1}(0)]
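As a minimal sketch of the observed-outcome equation and the time-specific ATE above, the following Python snippet uses made-up potential outcomes for four hypothetical units. The unit labels, the numbers, and the function name `observed_outcome` are illustrative assumptions. In real data only one potential outcome per unit is observed, so the ITE and ATE computed here are available only because the example is simulated.

```python
from statistics import mean

# Hypothetical potential outcomes at t_1, stored as (Y(1), Y(0)) per unit.
potential = {
    "a": (12.0, 10.0),
    "b": (9.0, 8.0),
    "c": (15.0, 11.0),
    "d": (7.0, 7.0),
}
# Hypothetical treatment indicator D_{i,t_1}.
treated = {"a": 1, "b": 0, "c": 1, "d": 0}


def observed_outcome(y1, y0, d):
    """Y_obs = D * Y(1) + (1 - D) * Y(0)."""
    return d * y1 + (1 - d) * y0


y_obs = {u: observed_outcome(y1, y0, treated[u]) for u, (y1, y0) in potential.items()}
ite = {u: y1 - y0 for u, (y1, y0) in potential.items()}

# ATE as the mean of the ITEs, and as the difference of the two means.
ate = mean(list(ite.values()))
ate_from_means = mean(y1 for y1, _ in potential.values()) - mean(
    y0 for _, y0 in potential.values()
)

print(y_obs)
print(ate, ate_from_means)
```

Both ways of computing the ATE agree, which mirrors the linearity of expectation used in the definition.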

2 Estimation of Time-Specific ITE and ATE Using Observed Outcomes

Because the individual treatment effect $ITE_i=Y_i(1)-Y_i(0)$ is unobservable, we substitute a reference individual's observed outcome for the unobservable outcome of the target individual, e.g., $\widehat{ITE}_{i}=Y_i(1)-Y_j(0)$ or $\widehat{ITE}_{i}=Y_j(1)-Y_i(0)$. Yet this substitution presumes that individuals $i$ and $j$ are comparable in all relevant respects, an assumption that is rarely tenable in practice. The estimate is thus vulnerable to systematic error arising from unobserved heterogeneity. Panel data structures provide a powerful alternative for identifying individual causal effects: by tracking changes in the same unit at different points in time, we can approximate its counterfactual state under certain conditions, which brings us closer to identifying the ITE.

In this chapter, we will not repeat the general logic of cross-sectional analysis, but focus on the time-indexed panel data structure. The distinctive feature of this section, I think, is that it distinguishes between ITE estimators and ATE estimators, rather than relying only on the classical distinction between within-group estimators, between-group estimators, and DID estimators.

2.1 Within Estimators For Time-Specific ITE

Suppose a unit $i$ receives a treatment at time point $t_1$, at which $D_{i, t_1}=1$, and is observed before treatment at time point $t_0$, at which it is untreated, $D_{i, t_0}=0$. The ITE to be estimated for the treated individual can be formulated as $ITE_{i,t_1}= Y_{i,t_1}(1) - Y_{i,t_1}(0)$.

We can utilize the observed outcomes of this unit in the two periods to construct the following difference:

\widehat{ITE}_{i,t_1}^{within} = Y_{i,t_1}(1) - Y_{i,t_0}(0)

Note that $t_0<t_1$. Adding and subtracting $Y_{i,t_1}(0)$ gives:

\widehat{ITE}_{i,t_1}^{within} = Y_{i,t_1}(1)- Y_{i,t_1}(0)+Y_{i,t_1}(0)- Y_{i,t_0}(0)=ITE_{i,t_1}+Y_{i,t_1}(0)-Y_{i,t_0}(0)

The final term, $Y_{i,t_1}(0)-Y_{i,t_0}(0)$, is the so-called counterfactual trend term: it describes how the outcome variable would have evolved naturally between $t_0$ and $t_1$ if unit $i$ had never been treated. If the counterfactual trend term is not zero, the estimator systematically deviates from the true treatment effect. To make the effect identifiable, we need the counterfactual stationarity assumption (No Time-Varying Confounding): in the untreated state, the outcome of unit $i$ does not change between $t_0$ and $t_1$, i.e., $Y_{i,t_1}(0)-Y_{i,t_0}(0)=0$. When this assumption holds, the within-individual difference can be regarded as a consistent estimator of $ITE_{i,t_1}$. Conversely, if the unit is affected by other time-varying factors (such as aging, external shocks, or seasonal trends), the estimator will be biased.

For example, consider the effect of attending university on initial wage. If, by a lucky chance, the individual would have taken part in vocational training after failing to get into university, then $Y_{i,t_1}(0)-Y_{i,t_0}(0)>0$ and we may overestimate the ITE; if the individual would have given up on himself and drifted through life because of a family crisis, then $Y_{i,t_1}(0)-Y_{i,t_0}(0)<0$ and we may underestimate the ITE. Here I use external shocks to illustrate how the internal trend can be shifted.
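The within estimator and its counterfactual trend term can be sketched numerically. The wages below are invented, and the counterfactual wage `y_t1_0` is known only because this is a toy example; the function name `within_ite` is a hypothetical helper, not part of any library.

```python
def within_ite(y_t1_treated, y_t0_untreated):
    """Within estimator: observed post-treatment outcome minus observed pre-treatment outcome."""
    return y_t1_treated - y_t0_untreated


# Hypothetical unit i.
y_t0_0 = 20.0  # observed at t_0, untreated
y_t1_1 = 30.0  # observed at t_1, treated
y_t1_0 = 24.0  # counterfactual at t_1 (never observed in practice)

estimate = within_ite(y_t1_1, y_t0_0)
true_ite = y_t1_1 - y_t1_0
trend = y_t1_0 - y_t0_0  # counterfactual trend term

print(estimate, true_ite, trend)
```

Here the estimate exceeds the true ITE by exactly the counterfactual trend, so the within estimator would be correct only if `trend` were zero.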

2.2 Within Estimators For Time-Specific ATT

Under the Rubin potential outcomes framework, suppose we have only two periods of panel data, and only the treatment group is exposed to a change in treatment from 0 to 1 at $t_1$, while the control group remains at 0. Then, with within-individual differences, only the ATT can be expressed in closed form: the ATU requires an observation of, or a substitute for, the outcome of the untreated group in the treated state, $Y_{j,t_1}(1)$, and no such observation exists in a within-group difference. The ATE is a weighted average of the ATT and the ATU; since the ATU is unidentifiable under the classical Rubin model, the ATE cannot be expressed in closed form either.

If we observe that the treatment states of multiple individuals in a set of units (or treatment groups) change over the same time window, the ATT to be estimated can be formulated as follows:

ATT_t = \mathbb{E}[Y_{i,t_1}(1) - Y_{i,t_1}(0) \mid D_{i,t_1} = 1],

We can apply the above difference method to all eligible units $i$ and take the average to obtain what is called a within-group (intra-group) estimator:

\widehat{ATT}_t^{within} = \frac{1}{N} \sum_{i \in G} (Y_{i,t_1}(1) - Y_{i,t_0}(0))

Then we can decompose this estimator into the following two parts:

\widehat{ATT}_t^{within} = \underbrace{\frac{1}{N} \sum_{i \in G} (Y_{i,t_1}(1) - Y_{i,t_1}(0))}_{\text{True ATT}} + \underbrace{\frac{1}{N} \sum_{i \in G} (Y_{i,t_1}(0) - Y_{i,t_0}(0))}_{\text{Bias}}.

Therefore, the consistency of the within-group estimator depends on whether the second term (the counterfactual trend term) is zero. Note that, unlike the identification requirement for the ITE, the counterfactual trend term does not need to be zero for every unit for the estimator to consistently identify the ATT. It is enough that the biases of these units cancel each other out on average, i.e., that the so-called zero mean bias assumption (no systematic trend bias) holds:

\mathbb{E}[Y_{i,t_1}(0) - Y_{i,t_0}(0) \mid D_{i,t_1} = 1]=0
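To illustrate why the averaged estimator needs only a zero mean trend rather than a zero trend for every unit, here is a small sketch with three hypothetical treated units. All numbers are made up, and the counterfactual values $Y_{i,t_1}(0)$ are included only so that the true effects can be compared with the estimates.

```python
from statistics import mean

# Hypothetical treated units in G: (Y_t0(0), Y_t1(0), Y_t1(1)).
# Y_t1(0) is counterfactual and is known here only because the data are made up.
units = [
    (10.0, 12.0, 15.0),  # untreated trend +2
    (20.0, 18.0, 23.0),  # untreated trend -2
    (30.0, 30.0, 34.0),  # untreated trend  0
]

within_diffs = [y_t1_1 - y_t0_0 for y_t0_0, _, y_t1_1 in units]
true_ites = [y_t1_1 - y_t1_0 for _, y_t1_0, y_t1_1 in units]
trends = [y_t1_0 - y_t0_0 for y_t0_0, y_t1_0, _ in units]

att_within = mean(within_diffs)  # estimator built from observed outcomes
att_true = mean(true_ites)       # estimand
mean_trend = mean(trends)        # average counterfactual trend (bias term)

print(within_diffs, true_ites)
print(att_within, att_true, mean_trend)
```

The unit-level within differences miss the individual effects for the first two units, yet their average matches the true ATT because the individual trends cancel.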

2.3 Between Estimators for Time-Specific ITE

Assume unit $i$ receives treatment at time $t_1$ ($D_{i,t_1} = 1$) and unit $j$ does not receive treatment at the same time ($D_{j,t_1}=0$). The time-indexed estimator can then be formulated as $\widehat{ITE}_{i,t_1}=Y_{i,t_1}(1)-Y_{j,t_1}(0)$, which mirrors the between-individual estimator of the ITE for the treated, $\widehat{ITE}_{i}=Y_i(1)-Y_j(0)$.

\widehat{ITE}_{i,t_1}^{\text{between}} = Y_{i,t_1}(1) - Y_{j,t_1}(0) = \underbrace{Y_{i,t_1}(1) - Y_{i,t_1}(0)}_{ITE_{i,t_1}} + \underbrace{Y_{i,t_1}(0) - Y_{j,t_1}(0)}_{\text{bias term}}

The assumption $Y_{i,t_1}(0)-Y_{j,t_1}(0)=0$ is the so-called Assumption of Comparable Counterfactuals: if the potential outcomes of $i$ and $j$ in the untreated state have the same value, then the difference in observed outcomes can be used as a consistent estimator of the ITE.

2.4 Between Estimators for Time-Specific ATE

Now we consider comparing the outcomes of different units at the same time point $t_1$, i.e., constructing a between estimator of the treatment effect. The estimator for the ATT, with $j$ denoting an untreated comparison unit, is:

\widehat{ATT}_t^{between} = \frac{1}{N} \sum_{i \in G} (Y_{i,t_1}(1) - Y_{j,t_1}(0))

Bias decomposition:

\widehat{ATT}_t^{\text{between}} = \underbrace{\frac{1}{N} \sum_{i \in G} \left( Y_{i,t_1}(1) - Y_{i,t_1}(0) \right)}_{\text{True ATT}} + \underbrace{\frac{1}{N} \sum_{i \in G} \left( Y_{i,t_1}(0) - Y_{j,t_1}(0) \right)}_{\text{Bias}}

The assumption $\mathbb{E}[Y_{i,t_1}(0) \mid D_i = 1] = \mathbb{E}[Y_{j,t_1}(0) \mid D_j = 0]$ is called mean ignorability for the untreated potential outcome.

Estimator for ATU:

\widehat{ATU}_t^{\text{between}} = \frac{1}{N} \sum_{j\in G'} \left( Y_{i,t_1}(1) - Y_{j,t_1}(0) \right)

Bias decomposition:

\widehat{ATU}_t^{\text{between}} = \underbrace{\frac{1}{N} \sum_{j \in G'} \left( Y_{j,t_1}(1) - Y_{j,t_1}(0) \right)}_{\text{True ATU}} + \underbrace{\frac{1}{N} \sum_{j\in G'} \left( Y_{i,t_1}(1) - Y_{j,t_1}(1) \right)}_{\text{Bias}}

The assumption $\mathbb{E}[Y_{j,t_1}(1) \mid D_j = 0] = \mathbb{E}[Y_{i,t_1}(1) \mid D_i = 1]$ is called mean ignorability for the treated potential outcome.

Both of the above estimators contain a true-effect term and a bias term. The bias term stems from substituting the observed outcomes of another group for the counterfactual outcomes. For the ATT and ATU estimators to be consistent, we need to assume that the two groups have the same average outcome in the relevant counterfactual state.

The estimator for the ATE and its bias decomposition are left for you to derive yourself :).
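The following sketch shows, on invented numbers, how one and the same difference in observed group means splits into a true effect plus a bias term, once relative to the ATT and once relative to the ATU. It assumes the between estimator is computed from group means; the group contents and the helper name `between_estimate` are hypothetical.

```python
from statistics import mean

# Hypothetical potential outcomes at t_1, stored as (Y(1), Y(0)) per unit.
treated_group = [(15.0, 12.0), (23.0, 18.0)]  # G: only Y(1) is observed
control_group = [(13.0, 10.0), (17.0, 16.0)]  # G': only Y(0) is observed


def between_estimate(treated, control):
    """Difference between the observed group means at t_1."""
    return mean(y1 for y1, _ in treated) - mean(y0 for _, y0 in control)


estimate = between_estimate(treated_group, control_group)

# Relative to the ATT, the bias is the gap in untreated potential outcomes.
att_true = mean(y1 - y0 for y1, y0 in treated_group)
gap_untreated = mean(y0 for _, y0 in treated_group) - mean(y0 for _, y0 in control_group)

# Relative to the ATU, the bias is the gap in treated potential outcomes.
atu_true = mean(y1 - y0 for y1, y0 in control_group)
gap_treated = mean(y1 for y1, _ in treated_group) - mean(y1 for y1, _ in control_group)

print(estimate, att_true + gap_untreated, atu_true + gap_treated)
```

In this toy data neither mean ignorability assumption holds, so the estimate differs from both the true ATT and the true ATU.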

2.5 Difference-in-Differences Estimators for Time-Specific ITE

In causal inference, Difference-in-Differences (DID) is a classical identification strategy, suitable for identifying treatment effects under time indexes in panel data and repeated cross-sectional data. The basic idea is that, by comparing the changes in the treatment group and the control group before and after treatment, common trends and time-invariant individual heterogeneity can be eliminated.

Consider the following setting: unit $i \in G$ belongs to the treatment group and is treated at time point $t_1$ ($D_{i,t_1}=1, D_{i,t_0}=0$); unit $j \in G'$ belongs to the control group and remains untreated from $t_0$ to $t_1$ ($D_{j,t_1}=0, D_{j,t_0}=0$). We can observe the following four outcomes:

| Time | Treatment Group | Control Group |
| --- | --- | --- |
| $t_0$ | $Y_{i,t_0}(0)$ | $Y_{j,t_0}(0)$ |
| $t_1$ | $Y_{i,t_1}(1)$ | $Y_{j,t_1}(0)$ |

At the individual level, we can construct the following estimator to identify the time-specific treatment effect of unit $i$:

\widehat{ITE}_i^{\text{DID}} =\left( Y_{i,t_1} - Y_{i,t_0} \right) - \left( Y_{j,t_1} - Y_{j,t_0} \right)

Bias decomposition:

\widehat{ITE}_i^{\text{DID}} = \underbrace{Y_{i,t_1}(1) - Y_{i,t_1}(0)}_{\text{True ITE}} + \underbrace{\left( Y_{i,t_1}(0) - Y_{j,t_1}(0) \right) - \left( Y_{i,t_0}(0) - Y_{j,t_0}(0) \right)}_{\text{Bias term}}

The first term is the true treatment effect that we want to identify, and the second term is the "counterfactual trend bias". This bias is not directly observable, because $Y_{i,t_1}(0)$ is counterfactual for the treated unit. For the bias term to be zero, the following assumption must hold:

\left( Y_{i,t_1}(0) - Y_{j,t_1}(0) \right) = \left( Y_{i,t_0}(0) - Y_{j,t_0}(0) \right)

This means that, in the untreated state, the difference between the treated and the untreated unit is the same at the two time points $t_0$ and $t_1$. This assumption is called the Parallel Trends Assumption.
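A small numerical sketch may help to see when the individual-level DID recovers the true ITE. The values are invented, the counterfactual `yi_t1_0` is available only because this is a toy example, and `did_ite` is a hypothetical helper name.

```python
def did_ite(yi_t1, yi_t0, yj_t1, yj_t0):
    """Individual-level DID: change in unit i minus change in unit j."""
    return (yi_t1 - yi_t0) - (yj_t1 - yj_t0)


# Hypothetical treated unit i; its untreated trend is +4.
yi_t0_0, yi_t1_1, yi_t1_0 = 20.0, 30.0, 24.0
true_ite = yi_t1_1 - yi_t1_0

# Case 1: control unit j shares the untreated trend of i (+4).
parallel = did_ite(yi_t1_1, yi_t0_0, 19.0, 15.0)

# Case 2: control unit j has a flatter untreated trend (+2).
non_parallel = did_ite(yi_t1_1, yi_t0_0, 17.0, 15.0)

print(parallel - true_ite)      # estimation error under parallel trends
print(non_parallel - true_ite)  # estimation error when trends differ
```

Note that the level difference between $i$ and $j$ at $t_0$ (20 versus 15) does no harm; only the difference in untreated trends enters the error.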

2.6 Difference-in-Differences Estimators for Time-Specific ATT

In a 2×2 DID, unless additional, stronger assumptions are imposed, the ATU and ATE cannot be identified in closed form. In the following chapters I will introduce a solution obtained through iterative methods.

ATT_t^{\text{true}} = \mathbb{E}[Y_{i,t_1}(1) - Y_{i,t_1}(0) \mid i \in G]

\widehat{ATT}_t^{\text{DID}} = \frac{1}{N} \sum_{i \in G} \left[ \left( Y_{i,t_1} - Y_{i,t_0} \right) - \left( Y_{j,t_1} - Y_{j,t_0} \right) \right]

\widehat{ATT}_t^{\text{DID}} = \underbrace{ \frac{1}{N} \sum_{i \in G} \left( Y_{i,t_1}(1) - Y_{i,t_1}(0) \right) }_{\text{True ATT}} + \underbrace{ \left[ \frac{1}{N} \sum_{i \in G} \left( Y_{i,t_1}(0) - Y_{j,t_1}(0) \right)-\frac{1}{N} \sum_{i \in G} \left( Y_{i,t_0}(0) - Y_{j,t_0}(0) \right) \right] }_{\text{Bias Term}}
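In applied work the DID estimate of the ATT is usually computed from the four observed cell means. The sketch below does this for a tiny, invented long-format panel. The tuple layout, the group labels, and the helper names `cell_mean` and `did_att` are assumptions made for illustration, and equal weighting of units within each group is assumed.

```python
from statistics import mean

# Hypothetical long-format panel: (unit, group, period, observed outcome).
panel = [
    ("i1", "G", "t0", 20.0), ("i1", "G", "t1", 30.0),
    ("i2", "G", "t0", 22.0), ("i2", "G", "t1", 29.0),
    ("j1", "G'", "t0", 15.0), ("j1", "G'", "t1", 19.0),
    ("j2", "G'", "t0", 17.0), ("j2", "G'", "t1", 20.0),
]


def cell_mean(rows, group, period):
    """Mean observed outcome for one group in one period."""
    return mean(y for _, g, p, y in rows if g == group and p == period)


def did_att(rows, treated="G", control="G'", pre="t0", post="t1"):
    """2x2 DID: change in the treated group minus change in the control group."""
    change_treated = cell_mean(rows, treated, post) - cell_mean(rows, treated, pre)
    change_control = cell_mean(rows, control, post) - cell_mean(rows, control, pre)
    return change_treated - change_control


print(did_att(panel))
```

Only observed outcomes enter this computation; whether the result equals the true ATT depends on the parallel trends assumption, which the data alone cannot confirm in a two-period design.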

ATE_t^{\text{true}} = \mathbb{E}[Y_{i,t_1}(1) - Y_{i,t_1}(0)], \quad i \in G \cup G'

ATU_t^{\text{true}} = \mathbb{E}[Y_{j,t_1}(1) - Y_{j,t_1}(0)], \quad j \in G'

\widehat{ATE}_t^{\text{DID}} = (\bar Y_{G,t_1} - \bar Y_{G,t_0}) - (\bar Y_{G',t_1} - \bar Y_{G',t_0})

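\widehat{ATE}_t^{\text{DID}} = \underbrace{\frac{1}{N}\sum_{i\in G\cup G'} \left(Y_{i,t_1}(1) - Y_{i,t_1}(0)\right)}_{\text{True ATE}} + \underbrace{\frac{N_{G'}}{N}\left(ATT_t - ATU_t\right) + \left[\frac{1}{N_G}\sum_{i\in G} \left( Y_{i,t_1}(0) - Y_{i,t_0}(0) \right) - \frac{1}{N_{G'}}\sum_{j\in G'} \left( Y_{j,t_1}(0) - Y_{j,t_0}(0) \right) \right]}_{\text{Bias}}

Here $N_G$ and $N_{G'}$ are the sizes of the two groups, $N = N_G + N_{G'}$, and $ATT_t$ and $ATU_t$ denote the sample average effects in $G$ and $G'$.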

\widehat{ATU}_t^{\text{DID}} = (\bar Y_{G,t_1} - \bar Y_{G,t_0}) - (\bar Y_{G',t_1} - \bar Y_{G',t_0})

\widehat{ATU}_t^{\text{DID}} = \underbrace{\frac{1}{N_{G'}}\sum_{j\in G'} (Y_{j,t_1}(1) - Y_{j,t_1}(0))}_{\text{True ATU}} + \underbrace{ \left[ \bar Y_{G,t_1}(1) - \frac{1}{N_{G'}} \sum_{j\in G'} Y_{j,t_1}(1) \right] - \left[ \bar Y_{G,t_0}(0) - \frac{1}{N_{G'}} \sum_{j\in G'} Y_{j,t_0}(0) \right]}_{\text{Bias: cross-group gap}}


Last updated 10 months ago
