class: center, middle, inverse, title-slide

.title[
# L11: Time Varying Treatment Part 2
]
.author[
### Jean Morrison
]
.institute[
### University of Michigan
]
.date[
### 2023-02-22 (updated: 2023-03-05)
]

---
`\(\newcommand{\ci}{\perp\!\!\!\perp}\)`

## Lecture Outline

1. Iterated Conditional Expectation Estimators
1. G-Estimation for Time-Varying Treatments
1. Repeated Outcome Measures

---
## 1. Iterated Conditional Expectation Estimators

---
## Iterated Conditional Expectation (ICE)

- Another way to express the G-formula is as a nested expectation.

- Recall, for two time-points the G-formula is
`$$E[Y(\bar{a})] = \int_{l_0, l_1} E[Y \vert \bar{A}_1 = \bar{a}, L_0 = l_0, L_1 = l_1] dF_1(l_1 \vert A_0 = a_0, L_0 = l_0) dF_0(l_0)$$`

- Previously, we showed that we could derive the G-formula from sequential exchangeability as an expression of iterated expectations.

$$
`\begin{split}
&Y(a_0, a_1) \ci A_1 \vert A_0 = a_0, L_0, L_1\\
&Y(a_0, a_1) \ci A_0 \vert L_0
\end{split}`
$$

---
## Iterated Conditional Expectation (ICE)

$$
`\begin{split}
&Y(a_0, a_1) \ci A_1 \vert A_0 = a_0, L_0, L_1\\
&Y(a_0, a_1) \ci A_0 \vert L_0
\end{split}`
$$

- From the second relation, we determine that
$$E[Y(\bar{a})] = \color{purple}{E_{L_0}\bigg[ } E[Y(a_1) \vert A_0 = a_0, L_0] \color{purple}{\bigg]} $$

- From the first relation, we have
`$$E[Y(a_1) \vert A_0 = a_0, L_0] = \color{blue}{E_{L_1 \vert A_0 = a_0, L_0}\Big[} \color{green}{E\big[\color{red}{Y} \vert \bar{A}_1 = \bar{a}, L_0, L_1\big] } \color{blue}{\Big \vert A_0 = a_0, L_0\Big]}$$`

- Putting these parts together,
`$$E[Y(\bar{a})] = \color{purple}{E_{L_0}\bigg[ }\color{blue}{E_{L_1 \vert A_0 = a_0, L_0}\Big[} \color{green}{E\big[\color{red}{Y} \vert \bar{A}_1 = \bar{a}, L_0, L_1\big] } \color{blue}{\Big \vert A_0 = a_0, L_0\Big]} \color{purple}{\bigg]}$$`

---
## ICE Estimation

- The idea of ICE estimation is to fit models for each expectation iteratively.

- The first step is to define `\(\hat{Q}_3 = Y\)` and plug this into the G-formula expression.
`$$E[Y(\bar{a})] = \color{purple}{E_{L_0}\bigg[ }\color{blue}{E_{L_1 \vert A_0 = a_0, L_0}\Big[} \color{green}{E\big[\color{red}{\hat{Q}_3} \vert \bar{A}_1 = \bar{a}, L_0, L_1\big] } \color{blue}{\Big \vert A_0 = a_0, L_0\Big]} \color{purple}{\bigg]}$$`

- Next, define a model for `\(\hat{Q}_2(L_1, L_0) = \hat{E}[\hat{Q}_3 \vert \bar{A}_1 = \bar{a}, \bar{L}_1]\)`. For example,
`$$E[\hat{Q}_3 \vert \bar{L}_{1}, \bar{A}_{1} = \bar{a}] = \theta_{0, 1} + \theta_{1,1} L_{1} + \theta_{2,1}L_0$$`

- We can fit this as a regular regression, fit only among those with `\(\bar{A}_1 = \bar{a}\)`.

- Think of `\(\hat{Q}_2\)` as an estimate of `\(E[Y(a_1) \vert A_0 = a_0, L_0, L_1]\)`.

---
## ICE Estimation

- Now we plug `\(\hat{Q}_2(L_1, L_0)\)` into the expression
`$$E[Y(a_0, a_1)] = \color{purple}{E_{L_0}\bigg[ }\color{blue}{E_{L_1 \vert A_0 = a_0, L_0}\Big[} \color{green}{\hat{Q}_2(L_1, L_0)} \color{blue}{\Big \vert A_0 = a_0, L_0\Big]} \color{purple}{\bigg]}$$`

- Now we estimate the blue part. This will be a regression model with `\(\hat{Q}_2\)` as the outcome and `\(L_0\)` as the predictor, fit amongst those with `\(A_0 = a_0\)`.

- To find the outcome at this stage, we compute `\(\hat{Q}_{2,i}\)` for each person with `\(A_{0,i} = a_0\)` by plugging `\(L_{1,i}\)`, `\(L_{0,i}\)` into the fitted model from the previous stage.
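- Below is a minimal R sketch of the full two time-point procedure (the `\(\hat{Q}_1\)` stage is detailed on the next slide). The data frame `df` with columns `L0, A0, L1, A1, Y` and the regime `\(a_0 = a_1 = 1\)` are hypothetical.

```r
# Hypothetical data frame df with columns L0, A0, L1, A1, Y;
# target regime: "always treated" (a0 = a1 = 1).

# Stage 1: model E[Y | A0 = 1, A1 = 1, L0, L1], fit only among
# those who followed the regime through time 1.
fit_q2 <- lm(Y ~ L1 + L0, data = subset(df, A0 == 1 & A1 == 1))

# Fitted values Q2-hat for everyone with A0 = 1.
d0 <- subset(df, A0 == 1)
d0$Q2hat <- predict(fit_q2, newdata = d0)

# Stage 2: regress Q2-hat on L0, again only among those with A0 = 1.
fit_q1 <- lm(Q2hat ~ L0, data = d0)

# Standardize: average the Q1-hat fitted values over the whole sample.
mean(predict(fit_q1, newdata = df))
```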
---
## ICE Estimation

- Use these fitted values to fit a model for `\(\hat{Q}_1(L_0) = E[\hat{Q}_{2} \vert A_0 = a_0, L_0]\)`, for example
`$$\hat{Q}_1(L_0) = E[\hat{Q}_2 \vert L_{0}, A_{0} = a_0] = \theta_{0, 0} + \theta_{1,0} L_{0}$$`

- Fit this model using only those with `\(A_0 = a_0\)`.

- Plugging this into the formula, we have
`$$E[Y(a_0, a_1)] = \color{purple}{E_{L_0}\bigg[ }\color{blue}{\hat{Q}_1(L_0)} \color{purple}{\bigg]}$$`

- Now we compute `\(\hat{Q}_{1,i}\)` for each person in the study by plugging `\(L_{0,i}\)` into the fitted model.

- Finally, we estimate `\(\hat{E}[Y(\bar{a})] = \frac{1}{N}\sum_{i =1}^N\hat{Q}_{1,i}\)` using everyone in the sample.

---
## ICE Estimation General Procedure

- Define
`$$Q_{K+2} = Y\\ Q_{K+1} = E[Y \vert \bar{A}_K = \bar{a}_K, \bar{L}_{K}]\\ \vdots\\ Q_m = E[Q_{m+1} \vert \bar{A}_{m-1} = \bar{a}_{m-1}, \bar{L}_{m-1} ]\\ \vdots\\ Q_1 = E[Q_2 \vert A_0 = a_0, L_0]\\ Q_0 = E[Q_1]$$`

- Propose models for `\(E[Q_{m+1} \vert \bar{A}_{m-1} = \bar{a}_{m-1}, \bar{L}_{m-1}]\)`.

- At each stage, fit the next model using fitted values from the previous model as the outcome. Fit each model only among those with `\(\bar{A}_{m-1} = \bar{a}_{m-1}\)`.

- When we get all the way down to 0, `\(Q_0 = E[Q_1]\)`, the sample average of the `\(\hat{Q}_{1,i}\)`, is an estimator of `\(E[Y(\bar{a})]\)`.

---
## ICE Estimation

- The ICE estimator is valid if the outcome regression is correct at each time point.

- An advantage over the parametric G-formula approach is that we do not have to estimate the joint distribution of `\(L\)`.

---
## Doubly Robust Estimator

- We can turn the ICE estimator into a doubly robust estimator by adding a special covariate based on the propensity score to each regression.

- Below is a variation of the Scharfstein et al "special covariate" regression strategy for estimating `\(Y(a)\)` that we saw previously:

1. Estimate `\(P[A = a \vert L]\)` and compute `\(W^{A}_i = \frac{1}{\hat{P}[A = A_i \vert L_i]}\)`

2. Propose an outcome model `\(E[Y \vert A = a, L] = m(A = a, L; \theta)\)`

3. Fit the model `\(E[Y_i \vert A_i, L_i] = m(A_i, L_i; \theta) + \phi W^{A}_i\)`, only among those with `\(A_i = a\)`.

4. Compute the standardized estimate
`$$\hat{E}[Y(a)] = \frac{1}{N}\sum_{i=1}^N \left(m(a, L_i; \hat{\theta}) + \hat{\phi}\frac{1}{\hat{P}[A = a \vert L_i]} \right)$$`

---
## Doubly Robust Estimator for Time Varying Treatment

- We will use the same strategy in combination with the ICE estimation strategy.

- Just as in IP weighting, we need to fit a model for `\(f(A_k \vert \bar{A}_{k-1}, \bar{L}_k)\)` at each time point.

- For each time point, compute
`$$W^{\bar{A}_m}_i = \prod_{k=0}^m \frac{1}{\hat{f}(A_k = A_{k,i}\vert \bar{A}_{k-1,i}, \bar{L}_{k,i})}$$`
and
`$$W^{\bar{A}_m, a_m }_i = \frac{W^{\bar{A}_{m-1}}_i}{\hat{f}(A_m = a_m \vert \bar{A}_{m-1,i}, \bar{L}_{m,i})}$$`

---
## Doubly Robust Estimator for Time Varying Treatment

- In each regression step, we add `\(\hat{W}^{\bar{A}_m}\)` as a covariate in the regression.

- In the estimation step, we replace `\(A_m\)` with `\(a_m\)` *and* `\(W^{\bar{A}_m}\)` with `\(W^{\bar{A}_m, a_m}\)`.
We will use "always treated" so `\(A_m = 1\)` --> <!-- `$$W^{\bar{A}_m, a_m = 1} = W^{\bar{A}_{m-1}} \frac{1}{f(A_m = 1 \vert \bar{A}_{m-1}, \bar{L}_m)}$$` --> <!-- --- --> <!-- # Doubly Robust Estimator for Time Varying Treatment --> <!-- 2a. Start at time `\(K\)` and fit an outcome model for `\(E[Y \vert \bar{A}_{K}, \bar{L}_{K}]\)` including `\(\hat{W}^{\bar{A}_K}\)` as a covariate. For example, --> <!-- `$$E[ Y \vert \bar{A}_K, \bar{L}_K] = \theta_{0, K} + \theta_{1,K}cum(\bar{A}_K) + \theta_{2,K} L_K + \theta_{3,K} \hat{W}^{\bar{A}_K}$$` --> <!-- Define `\(\hat{T}_{K+1} = Y\)`. --> <!-- 2b. Estimate `\(\hat{T}_{K}\)` under the treatment strategy in which `\(A_{K}\)` was set to 1. --> <!-- + To do this, we replace `\(A_K\)` in our data with 1 and `\(\hat{W}^{\bar{A}_K}\)` with `\(\hat{W}^{\bar{A}_{K-1}, a_K = 1}\)` --> <!-- + And then take the average of the predicted values with the new data. --> <!-- --- --> <!-- # Doubly Robust Estimator for Time Varying Treatment --> <!-- 2c. Proceed backwards to the next time point. Fit the same model, except now `\(\hat{T}_{K}\)` rather than `\(Y\)` is the outcome. --> <!-- `$$E[ \hat{T}_{K} \vert \bar{A}_{K-1}, \bar{L}_{K-1}] = \theta_{0, K} + \theta_{1,K-1}cum(\bar{A}_{K-1}) + \theta_{2,K-1} L_{K-1} + \theta_{3,K-1} \hat{W}^{\bar{A}_{K-1}}$$` --> <!-- 2d. Estimate `\(\hat{T}_{K-1}\)` as the fitted values from this model, replacing `\(A_{K-1}\)` with the intervention value `\(a_{K-1} = 1\)` and `\(\hat{W}^{\bar{A}_{K-1}}\)` with `\(\hat{W}^{\bar{A}_{K-2}, a_{K-1} = 1}\)`. --> <!-- --- --> <!-- # Doubly Robust Estimator for Time Varying Treatment --> <!-- 2e. Keep going all the way back to time point 0. --> <!-- - When we get all the way back to time 0, the average `\(\frac{1}{N}\sum_{i = 1}^N \hat{T}_{0,i}\)` is a doubly robust estimator of `\(Y(\bar{a})\)`. --> --- ## `\(K+1\)` Robustness Our estimator is valid if: 1) The `\(f(A_k \vert \bar{A}_{k-1}, \bar{L}_k)\)` is correct at all times OR 2) The outcome model is correct at all time points. OR 3) The treatment model is correct for times 0 to `\(k\)` and the outcome model is correct for times `\(k+1\)` to `\(K\)` for any `\(k < K\)` --- ## Other Multiply Robust Estimators - Just as in the point treatment case, there are other multiply robust estimators for time-varying treatments. - For example, there is a version of the AIPW estimator `\(\hat{\Delta}_{DR}\)` for multiple time-points. + There is also an extension of this estimator that gives `\(2^{K+1}\)` robustness, meaning that only one of the treatment or outcome model must be correct at each time point. + See Technical Point 21.4 in HR for more details. --- ## G-Estimation For Time-Varying Exposures --- ## G-Estimation - Starting with the two time-point model, we use the same strategy that we used in L8, by writing a model for a difference in expected counterfactuals. - This time we need one model for each time-point. For example, we could use $$ `\begin{split} &E[Y(a_0, 0) - Y(0, 0) &\vert A_0 = a_0, L_0= l_0] = \gamma_0(a_0; \beta_0) = \beta_0 a_0\\ &E[Y(a_0, a_1) - Y(a_0, 0) &\vert L_1(a_0) = l_1, L_0 = l_0, A_0 = a_0, A_1(a_0) = a_1] \\ &&= \gamma_1(a_0, a_1, l_1, l_0; \beta_1) \\ & &= a_1(\beta_{1,1} + \beta_{1,2}l_1 + \beta_{1,3}a_0 + \beta_{1,4}a_0l_1) \end{split}` $$ - The model above could be for our previous example which had no `\(L_0\)`. 
---
## G-Estimation

- By consistency,
$$
`\begin{split}
E[Y(a_0, a_1) - Y(a_0, 0) &\vert L_1(a_0) = l_1, L_0= l_0, A_0 = a_0, A_1(a_0) = a_1] =\\
E[Y(a_0, a_1) - Y(a_0, 0) &\vert L_1 = l_1, L_0= l_0, A_0 = a_0, A_1 = a_1]
\end{split}`
$$

- By sequential exchangeability,
$$
`\begin{split}
&E[Y(a_0, 0) - Y(0, 0) &\vert A_0 = a_0, L_0= l_0] = E[Y(a_0, 0) - Y(0, 0) \vert L_0= l_0]\\
&E[Y(a_0, a_1) - Y(a_0, 0) &\vert L_1 = l_1, L_0= l_0, A_0 = a_0, A_1 = a_1] =\\
&& E[Y(a_0, a_1) - Y(a_0, 0) \vert L_1 = l_1, L_0= l_0, A_0 = a_0]
\end{split}`
$$

---
## G-Estimation

- Combining these results with our previous model, we now have
$$
`\begin{split}
&E[Y(a_0, 0) - Y(0, 0) &\vert L_0= l_0] = \gamma_0(a_0; \beta_0) = \beta_0 a_0\\
&E[Y(a_0, a_1) - Y(a_0, 0) &\vert L_1 = l_1, L_0= l_0, A_0 = a_0] \\
&&= \gamma_1(a_0, a_1, l_1, l_0; \beta_1) \\
& &= a_1(\beta_{1,1} + \beta_{1,2}l_1 + \beta_{1,3}a_0 + \beta_{1,4}a_0l_1)
\end{split}`
$$

- Now we need to link this model to the data. We will use consistency!

- By consistency, `\(Y_i(a_0, a_1) = Y_i\)` if `\(A_0 = a_0\)` and `\(A_1 = a_1\)`, so
$$
`\begin{split}
&E[Y_i(A_{0,i}, 0) \vert L_{1,i}, L_{0,i}, A_{0, i}= a_0, A_{1,i}] = Y_i - \gamma_1(\bar{A}_{1,i}, \bar{L}_{1,i}; \beta_1)\\
&E[Y_i(0, 0) \vert L_{0,i}, A_{0, i}] = E[Y_i(A_{0,i}, 0) \vert L_{0,i}, A_{0, i}] - \gamma_0(A_{0,i}, L_{0,i}; \beta_0)\\
\end{split}`
$$

- Note the top statement applies only to units with `\(A_{0,i} = a_0\)`.

- These are structural **nested** mean models because the left side of the top equation appears in the bottom equation.

---
## G-Estimation

- We can now define two "H" functions
$$
`\begin{split}
H_{1,i}(\psi^{\dagger}) = Y_i - \gamma_1(\bar{A}_{1,i}, \bar{L}_{1,i}; \psi^{\dagger})\\
H_{0,i}(\psi^{\dagger}) = H_{1,i}(\psi^\dagger) - \gamma_0(A_{0,i}, L_{0,i}; \psi^{\dagger})
\end{split}`
$$

- Here `\(\psi^\dagger\)` is a combined vector of the parameters in both equations.

- Just as before, our goal is to find a value of `\(\psi^\dagger\)` that makes sequential exchangeability true.

---
## G-Estimation

- In the time-varying case, we will need models for `\(E[A_k \vert \bar{A}_{k-1}, \bar{L}_k]\)` for each time point.

- We want to find values of `\(\psi^\dagger\)` such that `\(H_{k}\)` and `\(A_k\)` are independent conditional on `\(\bar{A}_{k-1}\)` and `\(\bar{L}_k\)`.

- We have two options:
  + Fit a model for `\(E[A_{k} \vert \bar{A}_{k-1}, \bar{L}_k, H_k]\)` including `\(H_k\)` as one or more linear covariates, and find a value of `\(\psi\)` such that all coefficients for `\(H_k\)` are 0.
  + Solve estimating equations. If `\(\psi^\dagger\)` is one dimensional, we have
`$$\sum_{k = 0}^K \sum_{i = 1}^N (A_{k,i} - \hat{E}[A_{k,i} \vert \bar{A}_{k-1,i}, \bar{L}_{k,i} ])H_{k,i}(\psi^\dagger) = 0$$`

---
## Considerations

- Remember, in G-estimation we are working with structural nested mean models, **not** marginal structural models; these models are conditional on covariates.

- This means we need to get the effect modification model right.

- In the time-varying case, we also need to think about effect modification by past treatment.

- We could quickly acquire many coefficients, so the grid search method becomes infeasible; we will have to solve the estimating equations instead.

---
## Estimation

- Once we have estimated all the parameters of the structural nested mean models, we need to estimate `\(E[Y(\bar{a})]\)`.
- `\(E[H_0(\psi^\dagger)] = E[Y(\bar{0})]\)` at the true value of `\(\psi^\dagger\)`, so we can estimate `\(E[Y(\bar{0})]\)` as the sample average of `\(H_{0, i}(\hat{\psi})\)`.

- For other treatment strategies, we can plug into our nested models.

- However, if the `\(\gamma\)` functions include covariates, we will need to simulate these covariates the same way we did for the parametric G-formula method.

---
## Estimation

- For the two time-point case, our two structural nested mean models are
$$
`\begin{split}
&E[Y(A_{0}, 0) &\vert L_{1}, L_{0}, A_{0}= a_0, A_{1} = a_1] = \\
&& E[Y(a_0, a_1) \vert L_1, L_0, A_0 = a_0, A_1 = a_1] - \gamma_1(\bar{A}_1, \bar{L}_{1}; \beta_1)\\
&E[Y(0, 0) &\vert L_{0}, A_{0}] = E[Y(A_{0}, 0) \vert L_{0}, A_{0}] - \gamma_0(A_{0}, L_{0}; \beta_0)\\
\end{split}`
$$

- From the second equation we see that
$$
`\begin{split}
E[Y(a_0, 0)] &= E[ E[Y(0, 0) \vert L_0, A_0] + \gamma_0(a_0, L_0; \beta_0)]\\
&= E[Y(0, 0)] + E[\gamma_0(a_0, L_0; \beta_0)]
\end{split}`
$$

- So we could calculate the expected counterfactual value of `\(Y\)` for any treatment at the first time point if we could calculate `\(E[\gamma_0(a_0, L_0; \hat{\psi})]\)`, with `\(\hat{\psi}\)` being the G-estimation parameter solution.

---
## Estimation

- Extrapolating this reasoning, we find that
`$$E[Y(\bar{a})] = \frac{1}{N}\sum_{i = 1}^N \hat{H}_{0,i} + \sum_{k = 0}^{K} E[\gamma_k(\bar{a}_k, \bar{L}_k; \hat{\psi})]$$`

- To compute `\(E[\gamma_k(\bar{a}_k, \bar{L}_k; \hat{\psi})]\)`, we will need to use Monte Carlo simulation, like we did in G-computation.
  + Unless the `\(\gamma\)` functions don't depend on `\(L\)`, in which case we don't need to simulate covariates and can calculate the estimate by standardization.

---
## Estimation

- Step 1: Fit models for the distribution of covariates at each time, `\(f(l_k \vert \bar{a}_{k-1}, \bar{l}_{k-1})\)`.

- Step 2: Simulate synthetic data under the intervention of interest (just like in G-computation).

- Step 3: Estimate `\(E[\gamma_k(\bar{a}_k, \bar{L}_k; \hat{\psi})]\)` as the sample average in the synthetic data.

- Step 4: Plug these estimates into the formula from the previous slide to obtain
`$$E[Y(\bar{a})] = \frac{1}{N}\sum_{i = 1}^N \hat{H}_{0,i} + \sum_{k = 0}^{K} \hat{E}[\gamma_k(\bar{a}_k, \bar{L}_k; \hat{\psi})]$$`

---
## Repeated Outcome Measurements

---
## Repeated Outcome Measurements

- Until now, we have considered:
  1. Point interventions and single outcomes.
  1. Point interventions and survival outcomes.
  1. Time-varying interventions and single outcomes.

- Now we will consider the case when the outcome of interest is a value that varies over time. The treatment may also vary over time.

---
## Cole et al (2007)

- Cole et al are interested in the effect of highly active antiretroviral therapy (HAART), a treatment for HIV, on viral load.

- They use two longitudinal HIV/AIDS studies, the Multicenter AIDS Cohort Study (all men) and the Women's Interagency HIV Study (all women).

- Participants of both studies complete questionnaires and have blood work taken every 6 months.

- HAART became available in April 1996, so the study population is 918 individuals who were alive, HIV positive, and not using antiretroviral therapy at that time.

- Baseline visit: First visit after April 1996 or first visit after April 1996 with complete data (whichever is later).

---
## Challenges in Cole et al (2007)

- The technology used to measure HIV viral load changed over time and differed between study cohorts.
  + Each technology has a different lower limit of detection, i.e.
viral load is "left censored" at a different value for different measurements. - The exposure of interest is the cumulative time on HAART, a continuous value. + Because patients are only measured every 6 months, we don't know exactly how much time a patient has spent on HAART. + Authors will approximate cumulative exposure as time since HAART initiation. + Will deal with having a continuous outcome by creating categories supported by exploratory analysis. --- ## Censoring in Cole et al (2007) - The authors are interested in comparing HAART with no antiretroviral therapy but there are non-HAART forms of ART that some participants used. - Additionally, some people dropped out of the study or died. - Cole et al treat death, dropping out, and initiation of non-HAART therapy all as censoring. - This means that the target parameter is `\(E[Y_{i,j}(Cum_{i,j}= c, C_{ij} = 0)]\)`, i.e. the counterfactual expected viral load at time `\(j\)` if everyone had had a cumulative HAART exposure of `\(c\)` at time `\(j\)` and nobody had been censored. --- ## Exercise - Consider a generic version of the Cole et al study with three time points (0, 1, and 2) and no censoring. - There are three treatment variables `\(A_0\)`, `\(A_1\)`, `\(A_2\)`. - Confounders are measured at each time `\(L_0\)`, `\(L_1\)`, and `\(L_2\)`. These could be affected by past treatment. - An outcome is measured at each time `\(Y_0\)`, `\(Y_1\)`, `\(Y_2\)`. The outcome could be affected by past treatment and past confounders. - Confounders `\(L_k\)` could be affecte by treatment `\(A_{k-1}\)`. - Draw a DAG for this scenario. --- ## Exercise <center> <img src="img/9_timevarying3.png" width="80%" /> </center> - Here is a simplified solution leaving out arrows from `\(L_0\)` to `\(Y_1\)` and `\(Y_2\)` and from `\(L_1\)` to `\(Y_2\)`. --- ## Analyzing the Repeated Outcomes Data - If our goal is to estimate `\(E[Y_k(\bar{A}_k)]\)`, one thing we could do is treat each outcome separately. - For each `\(k\)`, consider a study that ends at time `\(k\)` in which we are only interested in the outcome `\(Y_k\)`. - All previous outcomes are treated simply as confounders. - We can use our previous machinery (e.g. IP weighting, parametric G-formula, or structural nested models) to estiamte `\(E[Y_k(\bar{A}_k)]\)` --- ## Combining Information Across Time-Points - In some cases, we can combine information across time-points. - This is possible when we can use the same marginal structural model for every time point. - We might potentially allow effect modification. - We can then fit a **common** marginal structural model across all time points using time-varying weights. --- ## Combining Information Across Time-Points - For example, suppose that only treatment at the two most recent time-points affects `\(Y_k\)`. - We could propose the MSM $$ `\begin{split} E[Y_{k}(\bar{A}_k)] = &\beta_{0,j} + \beta_1 A_k + \beta_2 A_{k-1} \\ = & \sum_{j = 1}^K \beta_{0,j}1_{t_j = j} + \beta_1 A_k + \beta_2 A_{k-1} \end{split}` $$ - In this MSM, we have a time-specific intercept but the causal differences are shared. This means we are assuming that the effect of changing from a treatment course with the last two time points untreated to a treatment course with the final time point treated is the same at every time point. - Because our MSM includes the last two treatments, we can only fit this model for `\(k \geq 1\)`. We could either fit a separate model for `\(Y_0\)` or ignore this outcome. 
---
## Combining Information Across Time-Points

- If we had a study that included only two time-points and one measured outcome at time 1, `\(Y_1\)`, we would estimate the parameters of the MSM using IP weighting for time-varying treatments (L10).
`$$E[Y_{1}(\bar{a}_1)] = \beta_{0,1} + \beta_1 a_1 + \beta_2 a_0$$`

- We would first propose and fit models for `\(f(A_k \vert \bar{A}_{k-1}, \bar{L}_{k})\)` and `\(f(A_k \vert \bar{A}_{k-1})\)` for `\(k = 0\)` and `\(k = 1\)`.

- Then we would calculate weights
`$$W_{i,1} = \prod_{k = 0}^{k = 1} \frac{f(A_k = A_{k, i} \vert \bar{A}_{k-1} = \bar{A}_{k-1, i})}{f(A_k = A_{k,i} \vert \bar{A}_{k-1} = \bar{A}_{k-1, i}, \bar{L}_{k} = \bar{L}_{k,i})}$$`

- Finally, we would fit the MSM by fitting a linear regression weighted by the IP weights.

---
## Combining Information Across Time-Points

- If we would like to combine all time-points together to estimate `\(\beta_1\)` and `\(\beta_2\)`, we use the same procedure.

- The idea is to think of our data as `\(K\)` shorter studies that all include the same people.

- For each person-time combination, we compute the time-varying stabilized weight
`$$W_{i,j} = \prod_{k = 0}^{k = j} \frac{f(A_k = A_{k, i} \vert \bar{A}_{k-1} = \bar{A}_{k-1, i})}{f(A_k = A_{k,i} \vert \bar{A}_{k-1} = \bar{A}_{k-1, i}, \bar{L}_{k} = \bar{L}_{k,i})}$$`

---
## Combining Information Across Time-Points

- We then fit the MSM using all person-time observations, weighted by the time-varying weights.

- We need to use robust standard errors for clustered data (e.g. fit the model with GEE).
  + Any working covariance matrix will give valid estimates.

---
## Back to Cole et al (2007)

- Cole et al use exactly this strategy to analyze the viral load data.

- They fit the MSM
`$$E[Y_{ij}(\bar{a}) \vert V_{i,0}] = \beta_{0, j} + \beta_1^{\top} V_{i,0} + \beta_2^\top g(Cum_{i,j})$$`

- `\(V_{i,0}\)` are baseline CD4 count and viral load categories.

- They include these variables in the MSM because they want to use them in the numerator of the stabilized weights. Since the MSM is conditional on `\(V_{i,0}\)`, they are assuming no effect modification by `\(V_{i,0}\)`.

- `\(g(Cum_{i,j})\)` is a four-level categorical variable categorizing the total time since beginning HAART treatment as `\(0\)`, `\(>0-1\)`, `\(>1-3\)` and `\(>3-9\)` years.

- There are three counterfactual contrasts given by `\(\beta_2\)` comparing each category to no ART.

---
## Censoring in Cole et al (2007)

- Because there is censoring, Cole et al also need to calculate censoring weights,
`$$W_{ij}^{C} = \prod_{k = 1}^{j + 1}\frac{P(C_{ik} = 0 \vert \bar{C}_{i,k-1} = 0, \bar{A}_{ik}, V_{i,0})}{P(C_{ik} = 0 \vert \bar{C}_{i,k-1} = 0, \bar{A}_{ik}, \bar{L}_{ik})}$$`

- The total weights are the product of the confounding weights and the censoring weights.

- The analysis now also relies on the assumption of non-informative censoring. That is, people with high viral loads were not more (or less) likely to be censored than other people with the same covariate and treatment history.

---
## Limit of Detection in Viral Load Measurements

- The final tricky piece of the Cole et al analysis is left censoring of viral load measurements.

- They deal with this by assuming that `\(Y_{ij}\)` is (marginally) normally distributed with mean `\(\mu_{ij}\)` determined by the MSM.

- This allows them to write a likelihood for the observed viral loads as a function of the person-time specific limit of detection and the parameters of the MSM.
- They can then estimate the parameters by maximizing the weighted likelihood, with the weights being the time-varying weights we've described previously.

---
## Cole et al Results

<center>
<img src="img/9_cole_results.png" width="80%" />
</center>

---
## Standard Analysis

- Cole et al also compute estimates using the more common regression approach.

- In this model, all weights are set to 1 and baseline and time-varying covariates are added to the model.

- As we saw in L10, this approach can result in selection bias.

- The results of this analysis are strongly attenuated toward the null.

---
## Sensitivity Analysis

- This paper is a nice example of using sensitivity analyses to explore data.

- They consider effect modification by sex and by baseline CD4 count.

- They consider an MSM that also includes a linear effect for time on treatment.

- They consider multiple other covariate models.

- The discussion is a great example of clearly explaining the assumptions required for valid inference.

---
## Time to Event Outcomes

- Previously, we discussed analysis of time-to-event outcomes with a single point treatment.

- With a time-varying treatment, we can treat these outcomes in the same way that we treat other repeated outcome measurements.

- In the case of a survival outcome, we have repeated measurements of death by time `\(k\)`, `\(D_k\)`.

- The only difference between survival and non-survival outcomes is the issue of right censoring. We can deal with this using the same hazards "trick" we used in L8.

---
## Time to Event Outcomes

- We need a marginal structural model for the counterfactual hazard at time `\(k\)`.
  + We could either use the Cox proportional hazards model or the logistic approximation.

- If we are using the Cox model, time-varying weights are used to adjust each individual's contribution to the "at risk" set at time `\(k\)`.

- The logistic model is fit using pooled data.

- Good examples of this are in Hernán et al (2001) and Hernán et al (2008) (see Related Reading page).
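- As a rough illustration, here is a minimal R sketch of the weighted pooled logistic approach. The person-time data frame `pt`, with event indicator `D`, cumulative treatment `cumA`, and time-varying weight `sw`, is hypothetical.

```r
# Hypothetical person-time data pt: one row per person-interval while
# at risk, with event indicator D (1 if death in that interval),
# cumulative treatment cumA, and time-varying IP weight sw.

# Weighted pooled logistic regression, which approximates a marginal
# structural Cox model when the per-interval hazard is small.
# (quasibinomial avoids warnings from non-integer weights; standard
# errors should still use a robust/clustered variance, as earlier.)
fit <- glm(D ~ factor(time) + cumA, family = quasibinomial(),
           data = pt, weights = sw)

# exp(coef) approximates the causal hazard ratio per unit of cumA.
exp(coef(fit)["cumA"])
```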