class: center, middle, inverse, title-slide

# L11: Time Varying Treatment Part 2
### Jean Morrison
### University of Michigan
### 2023-02-22 (updated: 2024-03-13)

---
`\(\newcommand{\ci}{\perp\!\!\!\perp}\)`

## Lecture Outline

1. G-Estimation for Time-Varying Treatments
1. Iterated Conditional Expectation Estimators
1. Repeated Outcome Measures

---
## G-Estimation For Time-Varying Exposures

---
## G-Estimation

- Recall from L8 that in G-estimation for a single time point, we start by writing a semiparametric structural mean model for the contrast of interest,
`$$E[Y(a) \vert A = a, L=l]-E[Y(0)\vert A = a, L=l] = \gamma(a, l; \beta)$$`

- We then define the `\(H\)` function, `\(H_i(x) = Y_i - \gamma(A_i, L_i; x)\)`.

- Finally, we find the value of `\(x\)` that makes `\(H(x)\)` orthogonal to `\(A \vert L\)` by either solving an estimating equation or doing a grid search.

---
## G-Estimation

- Starting with the two time-point model, we will use the same strategy as in L8, but we will write two semiparametric structural mean models.

$$
`\begin{split}
&E[Y(a_0, 0) - Y(0, 0) &\vert A_0 = a_0, L_0= l_0] = \gamma_0(a_0, l_0; \beta_0)\\
&E[Y(a_0, a_1) - Y(a_0, 0) &\vert L_1(a_0) = l_1, L_0 = l_0, A_0 = a_0, A_1(a_0) = a_1] \\
&&= \gamma_1(a_0, a_1, l_1, l_0; \beta_1) \\
\end{split}`
$$

- For example, if we had no `\(L_0\)`, we could use

$$
`\begin{split}
\gamma_0(a_0; \beta_0) = \beta_0 a_0\\
\gamma_1(a_0, a_1, l_1, l_0; \beta_1) = a_1(\beta_{1,1} + \beta_{1,2}l_1 + \beta_{1,3}a_0 + \beta_{1,4}a_0l_1)
\end{split}`
$$

- By consistency, we can re-write the second equation as

$$
`\begin{split}
&E[Y(a_0, a_1) - Y(a_0, 0) &\vert L_1 = l_1, L_0 = l_0, A_0 = a_0, A_1 = a_1] \\
&&= \gamma_1(a_0, a_1, l_1, l_0; \beta_1) \\
\end{split}`
$$

---
## G-Estimation

- From `\(Y(a_0, a_1) \ci A_0 \vert L_0\)` we have

$$
E[Y(a_0, 0) - Y(0, 0) \vert A_0 = a_0, L_0= l_0] = E[Y(a_0, 0) - Y(0, 0) \vert L_0= l_0]
$$

- From `\(Y(a_0, a_1) \ci A_1 \vert A_0 = a_0, L_0, L_1\)`, we have

$$
`\begin{split}
&E[Y(a_0, a_1) - Y(a_0, 0) &\vert L_1 = l_1, L_0= l_0, A_0 = a_0, A_1 = a_1] =\\
&& E[Y(a_0, a_1) - Y(a_0, 0) \vert L_1 = l_1, L_0= l_0, A_0 = a_0]
\end{split}`
$$

---
## G-Estimation

- Combining these results with our previous model we now have

$$
`\begin{split}
&E[Y(a_0, 0) - Y(0, 0) &\vert L_0= l_0] = \gamma_0(a_0, l_0; \beta_0) \\
&E[Y(a_0, a_1) - Y(a_0, 0) &\vert L_1 = l_1, L_0= l_0, A_0 = a_0] \\
&&= \gamma_1(a_0, a_1, l_1, l_0; \beta_1)
\end{split}`
$$

- Now we need to link this model to the data. We will use consistency!

- By consistency, `\(Y_i(a_0, a_1) = Y_i\)` if `\(A_{0,i} = a_0\)` and `\(A_{1,i} = a_1\)` so

$$
`\begin{split}
&E[Y_i(A_{0,i}, 0) \vert L_{1,i}, L_{0,i}, A_{0, i}= a_0, A_{1,i}] = Y_i - \gamma_1(\bar{A}_{i,1}, \bar{L}_{1,i}; \beta_1)\\
&E[Y_i(0, 0) \vert L_{0,i}, A_{0, i}] = E[Y_i(A_{0,i}, 0) \vert L_{0,i}, A_{0, i}] - \gamma_0(A_{0,i}, L_{0,i}; \beta_0)\\
\end{split}`
$$

- Note the top statement applies only to units with `\(A_{0,i} = a_0\)`.

- These are structural **nested** mean models because the left side of the top equation appears in the bottom equation.

---
## G-Estimation

- We can now define two "H" functions

$$
`\begin{split}
H_{1,i}(x) = Y_i - \gamma_1(\bar{A}_{1,i}, \bar{L}_{1,i}; x)\\
H_{0,i}(x) = H_{1,i}(x) - \gamma_0(A_{0,i}, L_{0,i}; x)
\end{split}`
$$

- Here `\(x\)` is a combined vector of parameters in both equations.

- Just as before, our goal is to find a value of `\(x\)` that makes the mean independence implied by sequential exchangeability hold.
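---
## G-Estimation

- To make the `\(H\)` functions concrete, here is a minimal R sketch for the two time-point case. It assumes the "no `\(L_0\)`" example parameterization above and a hypothetical data frame `dd` with columns `A0`, `A1`, `L1`, and `Y`; the parameter vector `x` stacks `\(\beta_0\)` and `\(\beta_1\)`.

```r
## Minimal sketch (hypothetical data frame dd), using the example above:
##   gamma_0(a0; x)          = x[1]*a0
##   gamma_1(a0, a1, l1; x)  = a1*(x[2] + x[3]*l1 + x[4]*a0 + x[5]*a0*l1)
H1 <- function(x, dd){
  dd$Y - dd$A1 * (x[2] + x[3]*dd$L1 + x[4]*dd$A0 + x[5]*dd$A0*dd$L1)
}
H0 <- function(x, dd){
  H1(x, dd) - x[1]*dd$A0
}
```

- At the true parameter value, `\(H_{1,i}(x)\)` has the same conditional mean as `\(Y_i(A_{0,i}, 0)\)` and `\(H_{0,i}(x)\)` the same conditional mean as `\(Y_i(0,0)\)`.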
---
## G-Estimation

- We want to find the value of `\(x\)` such that `\(H_{k}(x)\)` and `\(A_k\)` are mean independent conditional on `\(\bar{A}_{k-1}\)` and `\(\bar{L}_k\)`.

- This means that we need models for `\(E[A_k \vert \bar{A}_{k-1}, \bar{L}_k]\)` for each time point.

- We have two options:
  - Fit a model for `\(E[A_{k} \vert \bar{A}_{k-1}, \bar{L}_k, H_k(x)]\)` including `\(H_k\)` as one or more linear covariates and find a value of `\(x\)` such that all coefficients for `\(H_k(x)\)` are 0 (a sketch of this option appears on a later slide).
  - Solve estimating equations.

---
## Considerations

- Remember, in G-estimation we are working with structural nested mean models, **not** marginal structural models; these models are conditional on covariates.

- This means we need to get the effect modification model right.

- In the time-varying case, we also need to think about effect modification by past treatment.

- We can quickly accumulate many coefficients, so the grid search method becomes infeasible and we have to use the estimating equations.

---
## Estimation

- Once we have estimated all the parameters of the structural nested mean models, we need to estimate `\(E[Y(\bar{a})]\)`.

- `\(E[H_0(x^\star)] = E[Y(\bar{0})]\)` at the true value of `\(x^\star\)`, so we can estimate `\(E[Y(\bar{0})]\)` as the sample average of `\(H_{0, i}(\hat{x})\)`.

- For other values, we can plug into our nested models.

- However, if the `\(\gamma\)` functions include covariates, we will need to simulate these covariates the same way we did for the parametric G-formula method.

---
## Estimation

- Suppose we want to estimate `\(E[Y(a_0, a_1)]\)`.

- We know that we can estimate `\(E[Y(0, 0)]\)` as `\(\frac{1}{N}\sum_{i = 1}^N H_{0, i}(\hat{x})\)`.

- From our first equation, we know that
`$$E[Y(a_0, 0) \vert L_0 = l_0] = E[Y(0, 0) \vert L_0 = l_0] + \gamma_0(a_0, l_0; \beta)$$`
so we can find `\(\hat{E}[Y(a_0, 0)]\)` as

$$
`\begin{split}
\hat{E}[Y(a_{0}, 0) ] &= \hat{E}_{L_0}[Y(a_0, 0) \vert L_{0} = l_0] \\
&= \hat{E}_{L_0}[Y(0, 0)] + E_{L_0}[\gamma_0(a_0, l_0; \hat{x})]\\
&= \frac{1}{N}\sum_{i = 1}^N H_{0, i}(\hat{x}) + \frac{1}{N}\sum_{i = 1}^N \gamma_0(a_0, l_{0, i}; \hat{x})
\end{split}`
$$

---
## Estimation

- From the second equation we know that

$$
`\begin{split}
E[Y(a_0, a_1) &\vert L_1 = l_1, L_0 = l_0, A_0 = a_0] =\\
&E[Y(a_0, 0) \vert L_0 = l_0, L_1 = l_1, A_0 = a_0] + \gamma_1(a_0, a_1, l_0, l_1; \hat{x})
\end{split}`
$$

- So we can estimate `\(E[Y(a_0, a_1)]\)` as

$$
`\begin{split}
\hat{E}[Y(a_{0}, a_1) ] &= \hat{E}_{\bar{L}_1}[Y(a_0, a_1) \vert \bar{L}_1, A_{0}= a_0] \\
&= \hat{E}[Y(a_0, 0)] + \frac{1}{N}\sum_{i = 1}^N \gamma_1(\bar{a}_1, l_{0, i}, l_{1,i}(a_0); \hat{x})
\end{split}`
$$

- In the last step, we need to average over the counterfactual distribution of `\(L_1\)`, so we need to simulate from an estimated distribution of `\(L_1 \vert L_0, A_0\)`.

---
## General Estimation Procedure

- Suppose we want to estimate `\(E[Y(\bar{a})]\)`.

- Step 1: Estimate `\(E[Y(\bar{0})]\)` as the sample average of `\(H_{0, i}(\hat{x})\)`.

- Step 2: Fit parametric models for `\(f(l_k \vert \bar{a}_{k-1}, \bar{l}_{k-1})\)`.

- Step 3: Simulate a covariate history for everyone under the target treatment history (just like G-computation).

- Step 4: Compute `\(\hat{E}[Y(\bar{a})] = \hat{E}[Y(\bar{0})] + \sum_{k = 0}^K \frac{1}{N}\sum_{i = 1}^{N}\gamma_k(\bar{a}_k, \bar{l}_{i,k}(\bar{a}_{k-1}); \hat{x})\)`

- Just as in G-computation, we don't have to limit our simulated data size to the original data size. We can resample many times.
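---
## G-Estimation

- As a rough illustration of the first option above, the sketch below adds `\(H_k(x)\)` terms to each propensity model and searches for the `\(x\)` that drives their coefficients to zero. It reuses the hypothetical `H0`/`H1` functions and data frame `dd` from the earlier sketch (no `\(L_0\)`); this is only a sketch, and software packages solve the corresponding estimating equations instead.

```r
## Sum of squared coefficients on the H terms in the propensity models;
## this is (approximately) zero at the G-estimate of x.
hcoef_ss <- function(x, dd){
  d1 <- dd
  d1$h1  <- H1(x, dd)
  d1$hL  <- d1$h1 * d1$L1
  d1$hA  <- d1$h1 * d1$A0
  d1$hAL <- d1$h1 * d1$A0 * d1$L1
  d0 <- dd
  d0$h0 <- H0(x, dd)
  ## one H term per parameter of gamma_1, plus one for gamma_0
  p1 <- glm(A1 ~ A0 + L1 + h1 + hL + hA + hAL, data = d1, family = binomial)
  p0 <- glm(A0 ~ h0, data = d0, family = binomial)
  sum(c(coef(p1)[c("h1", "hL", "hA", "hAL")], coef(p0)["h0"])^2)
}
## A literal grid is impractical with five parameters, so minimize numerically:
## xhat <- optim(rep(0, 5), hcoef_ss, dd = dd)$par
```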
---
## Iterated Conditional Expectation Estimators

---
## Iterated Conditional Expectation (ICE)

- Recall that we can express the G-formula as a nested series of expectations.

- For two time-points the G-formula is
`$$E[Y(\bar{a})] = \int_{l_0, l_1} E[Y \vert \bar{A}_1 = \bar{a}, L_0 = l_0, L_1 = l_1] dF_1(l_1 \vert A_0 = a_0, L_0 = l_0) dF_0(l_0)$$`

- Previously, we showed that we could derive the G-formula from sequential exchangeability as an expression of iterated expectations.

$$
`\begin{split}
&Y(a_0, a_1) \ci A_1 \vert A_0 = a_0, L_0, L_1\\
&Y(a_0, a_1) \ci A_0 \vert L_0
\end{split}`
$$

---
## Iterated Conditional Expectation (ICE)

$$
`\begin{split}
&Y(a_0, a_1) \ci A_1 \vert A_0 = a_0, L_0, L_1\\
&Y(a_0, a_1) \ci A_0 \vert L_0
\end{split}`
$$

- From the second relation, we determine that
$$E[Y(\bar{a})] = \color{purple}{E_{L_0}\bigg[ } E[Y(a_1) \vert A_0 = a_0, L_0] \color{purple}{\bigg]} $$

- From the first relation, we have
`$$E[Y(a_1) \vert A_0 = a_0, L_0] = \color{blue}{E_{L_1 \vert A_0 = a_0, L_0}\Big[} \color{green}{E\big[\color{red}{Y} \vert \bar{A}_1 = \bar{a}, L_0, L_1\big] } \color{blue}{\Big \vert A_0 = a_0, L_0\Big]}$$`

- Putting these parts together,
`$$E[Y(\bar{a})] = \color{purple}{E_{L_0}\bigg[ }\color{blue}{E_{L_1 \vert A_0 = a_0, L_0}\Big[} \color{green}{E\big[\color{red}{Y} \vert \bar{A}_1 = \bar{a}, L_0, L_1\big] } \color{blue}{\Big \vert A_0 = a_0, L_0\Big]} \color{purple}{\bigg]}$$`

---
## ICE Estimation

- The idea of ICE estimation is to fit models for each expectation iteratively.

- The first step is to define `\(\hat{Q}_3 = Y\)` and plug this into the G-formula expression.
`$$E[Y(\bar{a})] = \color{purple}{E_{L_0}\bigg[ }\color{blue}{E_{L_1 \vert A_0 = a_0, L_0}\Big[} \color{green}{E\big[\color{red}{\hat{Q}_3} \vert \bar{A}_1 = \bar{a}, L_0, L_1\big] } \color{blue}{\Big \vert A_0 = a_0, L_0\Big]} \color{purple}{\bigg]}$$`

- Next, define a model for `\(\hat{Q}_2(L_1, L_0) = \hat{E}[\hat{Q}_3 \vert \bar{A}_1 = \bar{a}, \bar{L}_1]\)`. For example,
`$$E[\hat{Q}_3 \vert \bar{L}_{1}, \bar{A}_{1} = \bar{a}] = \theta_{0, 1} + \theta_{1,1} L_{1} + \theta_{2,1}L_0$$`

- We can fit this with a regular regression using only those with `\(\bar{A}_1 = \bar{a}\)`.

- Think of `\(\hat{Q}_2\)` as an estimate of `\(E[Y(a_1) \vert A_0 = a_0, L_0, L_1]\)`.

---
## ICE Estimation

- Now we plug `\(\hat{Q}_2(L_1, L_0)\)` into the expression
`$$E[Y(a_0, a_1)] = \color{purple}{E_{L_0}\bigg[ }\color{blue}{E_{L_1 \vert A_0 = a_0, L_0}\Big[} \color{green}{\hat{Q}_2(L_0, L_1)} \color{blue}{\Big \vert A_0 = a_0, L_0\Big]} \color{purple}{\bigg]}$$`

- Now we estimate the blue part. This will be a regression model with `\(\hat{Q}_2\)` as the outcome and `\(L_0\)` as the predictor, fit amongst those with `\(A_0 = a_0\)`.

- To construct the outcome for this stage, we compute `\(\hat{Q}_{2,i}\)` for each person with `\(A_{0,i} = a_0\)` by plugging `\(L_{1,i}\)`, `\(L_{0,i}\)` into the fitted model from the previous stage.

---
## ICE Estimation

- Use these fitted values to fit a model for `\(\hat{Q}_1(L_0) = E[\hat{Q}_{2} \vert A_0 = a_0, L_0]\)`, for example
`$$\hat{Q}_1(L_0) = E[\hat{Q}_2 \vert L_{0}, A_{0} = a_0] = \theta_{0, 0} + \theta_{1,0} L_{0}$$`

- Fit this model using only those with `\(A_0 = a_0\)`.

- Plugging this into the formula, we have
`$$E[Y(a_0, a_1)] = \color{purple}{E_{L_0}\bigg[ }\color{blue}{\hat{Q}_1(L_0)} \color{purple}{\bigg]}$$`

- Now we compute `\(\hat{Q}_{1,i}\)` for each person in the study by plugging `\(L_{0,i}\)` into the fitted model.
- Finally we estimate `\(\hat{E}[Y(\bar{a})] = \frac{1}{N}\sum_{i =1}^N\hat{Q}_{1,i}\)` using everyone in the sample. --- ## ICE Estimation General Procedure - Define `$$Q_{K+2} = Y\\\ Q_{K+1} = E[Y \vert \bar{A}_K = \bar{a}_K, \bar{L}_{K}]\\\ \vdots\\\ Q_m = E[Q_{m+1} \vert \bar{A}_m = \bar{a}_m, \bar{L}_{m-1} ]\\\ \vdots\\\ Q_0 = E[Q_1 \vert A_0 = a_0, L_0]$$` - Propose models for `\(E[Q_{m+1} \vert \bar{L}_{m-1}, \bar{A}_{m} = \bar{a}_m]\)`. - At each stage, fit the next model using fitted values from the previous model as the outcome. Fit each model only among those with `\(\bar{A}_m = \bar{a}_m\)`. - When we get all the way down to 0, `\(Q_0 = E[Q_1 \vert A_0 = a_0, L_0]\)` is an estimator of `\(Y(\bar{a})\)`. --- ## ICE Estimation - The ICE estimator will be valid if the outcome regression was correct at each time point. - An advantage over the parametric G-formula approach is that we didn't have to estimate the joint distribution of `\(L\)`. --- ## Doubly Robust Estimator - We can turn the ICE estimator into a doubly robust estimator by adding a special covariate based on the propensity score to each regression. - Below is a variation of the Scharfstein et al "special covariate" regression strategy for estimating `\(Y(a)\)` that we saw previously: 1. Estimate `\(P[A = a \vert L]\)` and compute `\(W^{A}_i = \frac{1}{\hat{P}[A = A_i \vert L]}\)` 2. Propose an outcome model `\(E[Y \vert A = a, L] = m(A = a, L; \theta)\)` 3. Fit the model `\(E[Y_i \vert A_i, L_i] = m(A_i, L_i; \theta) + \phi W^{A}_i\)`, only among those with `\(A_i = a\)`. 4. Compute the standardized estimate `$$E[Y(a)] = \frac{1}{N}\sum_{i=1}^N \left(m(a, L_i; \hat{\theta}) + \hat{\phi}\frac{1}{\hat{P}[A = a \vert L_i]} \right)$$` --- ## Doubly Robust Estimator for Time Varying Treatment - We will use the same strategy in combination with the ICE estimation strategy. - Just as in IP weighting, we need to fit a model for `\(f(A_k \vert \bar{A}_{k-1}, \bar{L}_k)\)` at each time point. - For each time point compute `$$W^{\bar{A}_m}_i = \prod_{k=0}^m \frac{1}{\hat{f}(A_k = A_{k,i}\vert \bar{A}_{k-1,i}, \bar{L}_{k,i})}$$` and `$$W^{\bar{A}_m, a_m }_i = \frac{W^{\bar{A}_{m-1}}_i}{f(A_m = a_m \vert \bar{A}_{i,m-1}, \bar{L}_{i,m})}$$` --- ## Doubly Robust Estimator for Time Varying Treatment - In each regression step, we add `\(\hat{W}^{\bar{A}_m}\)` as a covariate in the regression. - In the estimation step, we replace `\(A_m\)` with `\(a_m\)` *and* `\(W^{\bar{A}_m}\)` with `\(W^{\bar{A}_m, a_m = 1}\)`. <!-- --- --> <!-- # Doubly Robust Estimator for Time Varying Treatment --> <!-- - We will use a similar procedure with some modifications. --> <!-- 1a. Estimate time-varying weights (one per each person-time point) --> <!-- 1b. Compute the modified time-varying weight corresponding to the treatment strategy. We will use "always treated" so `\(A_m = 1\)` --> <!-- `$$W^{\bar{A}_m, a_m = 1} = W^{\bar{A}_{m-1}} \frac{1}{f(A_m = 1 \vert \bar{A}_{m-1}, \bar{L}_m)}$$` --> <!-- --- --> <!-- # Doubly Robust Estimator for Time Varying Treatment --> <!-- 2a. Start at time `\(K\)` and fit an outcome model for `\(E[Y \vert \bar{A}_{K}, \bar{L}_{K}]\)` including `\(\hat{W}^{\bar{A}_K}\)` as a covariate. For example, --> <!-- `$$E[ Y \vert \bar{A}_K, \bar{L}_K] = \theta_{0, K} + \theta_{1,K}cum(\bar{A}_K) + \theta_{2,K} L_K + \theta_{3,K} \hat{W}^{\bar{A}_K}$$` --> <!-- Define `\(\hat{T}_{K+1} = Y\)`. --> <!-- 2b. Estimate `\(\hat{T}_{K}\)` under the treatment strategy in which `\(A_{K}\)` was set to 1. 
--> <!-- + To do this, we replace `\(A_K\)` in our data with 1 and `\(\hat{W}^{\bar{A}_K}\)` with `\(\hat{W}^{\bar{A}_{K-1}, a_K = 1}\)` --> <!-- + And then take the average of the predicted values with the new data. --> <!-- --- --> <!-- # Doubly Robust Estimator for Time Varying Treatment --> <!-- 2c. Proceed backwards to the next time point. Fit the same model, except now `\(\hat{T}_{K}\)` rather than `\(Y\)` is the outcome. --> <!-- `$$E[ \hat{T}_{K} \vert \bar{A}_{K-1}, \bar{L}_{K-1}] = \theta_{0, K} + \theta_{1,K-1}cum(\bar{A}_{K-1}) + \theta_{2,K-1} L_{K-1} + \theta_{3,K-1} \hat{W}^{\bar{A}_{K-1}}$$` --> <!-- 2d. Estimate `\(\hat{T}_{K-1}\)` as the fitted values from this model, replacing `\(A_{K-1}\)` with the intervention value `\(a_{K-1} = 1\)` and `\(\hat{W}^{\bar{A}_{K-1}}\)` with `\(\hat{W}^{\bar{A}_{K-2}, a_{K-1} = 1}\)`. --> <!-- --- --> <!-- # Doubly Robust Estimator for Time Varying Treatment --> <!-- 2e. Keep going all the way back to time point 0. --> <!-- - When we get all the way back to time 0, the average `\(\frac{1}{N}\sum_{i = 1}^N \hat{T}_{0,i}\)` is a doubly robust estimator of `\(Y(\bar{a})\)`. --> --- ## `\(K+1\)` Robustness Our estimator is valid if: 1) The `\(f(A_k \vert \bar{A}_{k-1}, \bar{L}_k)\)` is correct at all times OR 2) The outcome model is correct at all time points. OR 3) The treatment model is correct for times 0 to `\(k\)` and the outcome model is correct for times `\(k+1\)` to `\(K\)` for any `\(k < K\)` --- ## Other Multiply Robust Estimators - Just as in the point treatment case, there are other multiply robust estimators for time-varying treatments. - For example, there is a version of the AIPW estimator `\(\hat{\Delta}_{DR}\)` for multiple time-points. + There is also an extension of this estimator that gives `\(2^{K+1}\)` robustness, meaning that only one of the treatment or outcome model must be correct at each time point. + See Technical Point 21.4 in HR for more details. --- ## Implementing Time-Varying Methods --- ## Simulated Data - I simulated data for three time-points - Treatment is binary, there is one continuous time-varying covariate, `\(L\)`, and one time-fixed covariate `\(Z\)`. 
- One end of study continuous outcome, `\(Y\)` - `\(A_k\)` depends only on `\(L_k\)` and `\(A_{k-1}\)` - `\(L_k\)` depends on `\(A_{k-1}\)`, `\(L_{k-1}\)`, and `\(Z\)` - `\(Y\)` is normal, mean is linear in `\(A_k\)`, `\(L_k\)`, and `\(A_k L_k\)` - We will focus on trying to estimate `\(E[Y(0, 0, 0)]\)`, `\(E[Y(0, 1, 1)]\)` and `\(E[Y(1, 1, 1)]\)` --- ## Data Summary <table> <thead> <tr> <th style="text-align:right;"> A0 </th> <th style="text-align:right;"> A1 </th> <th style="text-align:right;"> A2 </th> <th style="text-align:right;"> MeanY </th> <th style="text-align:right;"> N </th> <th style="text-align:right;"> MeanCounterfactualY </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> -0.06 </td> <td style="text-align:right;"> 624 </td> <td style="text-align:right;"> -0.23 </td> </tr> <tr> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.10 </td> <td style="text-align:right;"> 718 </td> <td style="text-align:right;"> 0.08 </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> -0.41 </td> <td style="text-align:right;"> 1079 </td> <td style="text-align:right;"> -0.10 </td> </tr> </tbody> </table> --- ## IP Weighting - For IP weighting we need a propensity model for each time point. ```r ## Fitting separate models f0 <- glm(A.0 ~ L.0, family = "binomial", data = dat3) f1 <- glm(A.1 ~ A.0 + L.1, family = "binomial", data = dat3) f2 <- glm(A.2 ~ A.1 + L.2, family = "binomial", data = dat3) ## For numerator of stabilized weights s0 <- glm(A.0 ~ 1, family = "binomial", data = dat3) s1 <- glm(A.1 ~ A.0 , family = "binomial", data = dat3) s2 <- glm(A.2 ~ A.1, family = "binomial", data = dat3) ``` - We could also have estimated these in a combined model in long format data. + We would need a column for `\(A_{k-1}\)` and we might want an interaction with time. --- ## IP Weighting - Next we compute the weights: ```r dat3$pA.0 <- predict(f0, type = "response") dat3$pA.1 <- predict(f1, type = "response") dat3$pA.2 <- predict(f2, type = "response") dat3$sA.0 <- predict(s0, type = "response") dat3$sA.1 <- predict(s1, type = "response") dat3$sA.2 <- predict(s2, type = "response") dat3 <- dat3 %>% mutate(W.0 = case_when(A.0 == 1 ~ sA.0/pA.0, TRUE ~ (1-sA.0)/(1-pA.0)), W.1 = case_when(A.1 == 1 ~ sA.1/pA.1, TRUE ~ (1-sA.1)/(1-pA.1)), W.2 = case_when(A.2 == 1 ~ sA.2/pA.2, TRUE ~ (1-sA.2)/(1-pA.2)), W = W.0*W.1*W.2, A = paste0(A.0, "-", A.1, "-", A.2)) ``` --- ## IP Weighting - Finally, we fit a marginal structural model and retrieve the fitted values for the treatment courses of interest. ```r library(sandwich) ## data for extracting fitted values newdat <- data.frame(A.0 = c(0, 0, 1), A.1 = c(0,1, 1), A.2 = c(0,1, 1)) get_est <- function(msm){ car::Predict(msm, newdata = newdat, se.fit = TRUE, vcov. 
= vcovHC(msm, type = "HC0")) } ## fit saturated marginal structural model f_msm1 <- lm(Y ~ A.0*A.1*A.2, weights = W, data = dat3) ipw_est1 <- get_est(f_msm1) ## fit parametric but correct marginal structural model f_msm2 <- lm(Y ~ A.0 + A.1 + A.2, weights = W, data = dat3) ipw_est2 <- get_est(f_msm2) ## fit parametric and incorrect marginal structural model f_msm3 <- lm(Y ~ I(A.0 + A.1 + A.2), weights = W, data = dat3) ipw_est3 <- get_est(f_msm3) ``` --- ## IP Weighting <table> <thead> <tr> <th style="text-align:left;"> A </th> <th style="text-align:right;"> Truth </th> <th style="text-align:left;"> IPWEst1 </th> <th style="text-align:left;"> IPWEst2 </th> <th style="text-align:left;"> IPWEst3 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 0-0-0 </td> <td style="text-align:right;"> -0.23 </td> <td style="text-align:left;"> -0.23 (-0.28, -0.19) </td> <td style="text-align:left;"> -0.18 (-0.22, -0.13) </td> <td style="text-align:left;"> -0.16 (-0.2, -0.11) </td> </tr> <tr> <td style="text-align:left;"> 0-1-1 </td> <td style="text-align:right;"> 0.08 </td> <td style="text-align:left;"> 0.08 (0.02, 0.14) </td> <td style="text-align:left;"> 0.08 (0.03, 0.12) </td> <td style="text-align:left;"> -0.09 (-0.12, -0.06) </td> </tr> <tr> <td style="text-align:left;"> 1-1-1 </td> <td style="text-align:right;"> -0.10 </td> <td style="text-align:left;"> -0.12 (-0.18, -0.05) </td> <td style="text-align:left;"> -0.06 (-0.11, -0.02) </td> <td style="text-align:left;"> -0.06 (-0.1, -0.01) </td> </tr> </tbody> </table> --- ## Parametric G-Formula - I used the `gfoRmula` R package. First we specify arguments. ```r library(gfoRmula) id <- 'id' time_points <- 3 time_name <- 'time' covnames <- c('L', 'A') #time-varying covtypes <- c('normal', 'binary') outcome_name <- 'Y' outcome_type <- "continuous_eof" histories <- c(lagged) #can also do lagged average histvars <- list(c('L', 'A')) covparams <- list(covmodels = c(L ~ lag1_A + lag1_L, A ~ lag1_A + L)) ymodel <- Y ~ A + lag1_A + lag2_A + L + lag1_L + lag2_L + A*L + lag1_A*lag1_L + lag2_A*lag2_L + Z intvars <- list('A', 'A', 'A') ``` --- ## Parametric G-Formula - The gformula will perform boostrapping for us. This can take a bit of time. ```r interventions <- list(list(c(static, c(0, 0, 0))), list(c(static, c(0, 1, 1))), list(c(static, c(1, 1, 1)))) int_descript <- c('Never treat', 'Treat last 2', 'Always treat') nsimul <- 10000 # number of data points to simulate gform_result <- gformula(obs_data = dat3_long, id = id, time_points = time_points, time_name = time_name, covnames = covnames, outcome_name = outcome_name, outcome_type = outcome_type, covtypes = covtypes, covparams = covparams, ymodel = ymodel, intvars = intvars, interventions = interventions, int_descript = int_descript, histories = histories, histvars = histvars, basecovs = c('Z'), nsimul = nsimul, seed = 1234, #, nsamples= 500, #bootstrap samples ci_method = "normal") ``` --- ## Parametric G-Formula ```r gform_result ``` ``` ## PREDICTED RISK UNDER MULTIPLE INTERVENTIONS ## ## Intervention Description ## 0 Natural course ## 1 Never treat ## 2 Treat last 2 ## 3 Always treat ## ## Sample size = 5000, Monte Carlo sample size = 10000 ## Number of bootstrap samples = 500 ## Reference intervention = natural course (0) ## ## ## k Interv. 
NP mean g-form mean Mean SE Mean lower 95% CI ## 2 0 -0.1395253 -0.14256824 0.01265358 -0.16736880 ## 2 1 NA -0.23747779 0.01190050 -0.26080235 ## 2 2 NA 0.08631253 0.01425984 0.05836376 ## 2 3 NA -0.11377756 0.01673769 -0.14658282 ## Mean upper 95% CI Mean ratio MR SE MR lower 95% CI MR upper 95% CI ## -0.11776769 1.0000000 0.00000000 -0.1425682 -0.14256824 ## -0.21415324 1.6657131 0.11596470 -0.4647644 -0.01019116 ## 0.11426129 -0.6054120 0.15205057 -0.2117011 0.38432618 ## -0.08097229 0.7980568 0.06517407 -0.2415164 0.01396128 ## Mean difference MD SE MD lower 95% CI MD upper 95% CI % Intervened On ## 0.00000000 0.000000000 -0.14256824 -0.14256824 0.00 ## -0.09490955 0.009954584 -0.25698842 -0.21796717 87.44 ## 0.22888077 0.008794431 0.06907576 0.10354929 85.49 ## 0.02879069 0.008262831 -0.12997241 -0.09758271 78.54 ## Aver % Intervened On ## 0.00000 ## 50.95333 ## 46.52667 ## 40.52333 ``` --- ## G-Formula <table> <thead> <tr> <th style="text-align:left;"> A </th> <th style="text-align:right;"> Truth </th> <th style="text-align:left;"> IPWEst1 </th> <th style="text-align:left;"> GForm </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 0-0-0 </td> <td style="text-align:right;"> -0.23 </td> <td style="text-align:left;"> -0.23 (-0.28, -0.19) </td> <td style="text-align:left;"> -0.24 (-0.26, -0.21) </td> </tr> <tr> <td style="text-align:left;"> 0-1-1 </td> <td style="text-align:right;"> 0.08 </td> <td style="text-align:left;"> 0.08 (0.02, 0.14) </td> <td style="text-align:left;"> 0.09 (0.06, 0.11) </td> </tr> <tr> <td style="text-align:left;"> 1-1-1 </td> <td style="text-align:right;"> -0.10 </td> <td style="text-align:left;"> -0.12 (-0.18, -0.05) </td> <td style="text-align:left;"> -0.11 (-0.15, -0.08) </td> </tr> </tbody> </table> --- ## G-Estimation - I used the `gesttools` package. This package has some limitations. - I believe it can only have one effect modification variable. It may not be able to handle effect modification by past treatments. - It does not calculate `\(E[Y(a)]\)` for you. Recall - this step requires simulating covariates. --- ## G-Estimation ```r library(gesttools) # time must start from 1 dat3_long$time <- dat3_long$time+ 1 dat3_gest <- FormatData( data = data.frame(dat3_long), idvar = "id", timevar = "time", An = "A", varying = c("A", "L"), GenerateHistory = TRUE, GenerateHistoryMax = 2 ) idvar <- "id" timevar <- "time" Yn <- "Y" An <- "A" Cn <- NA # generally a good idea to add time, our model doesn't require it propensitymodel <- c("A~L+Lag1A") ``` --- ## G-Estimation - Outcome models specifies the model for `\(E[Y(\bar{a}_{1:i}, 0) \vert \bar{l}_i]\)`. - `type = 4` allows a time-varying causal effect and effect modification. ```r outcomemodels <- list("Y~A + L + A*L", "Y~A + L + A*L", "Y~A + L + A*L") gest_result <- gestSingle(dat3_gest, idvar, timevar, Yn, An, Cn, outcomemodels, propensitymodel, censoringmodel = NULL, type = 4, EfmVar= c("L"), cutoff = NA ) gest_result$psi ``` ``` ## t=1.A t=1.A:L t=2.A t=2.A:L t=3.A t=3.A:L ## -0.02420145 -0.18602402 0.13135301 -0.19205099 0.32934700 -0.29831340 ``` --- ## G-Estimation - To calculate `\(E[Y(\bar{a})]\)`, we are on our own. 
```r
## gamma functions
g0 <- function(A.0, L.0){
  gest_result$psi[1]*A.0 + gest_result$psi[2]*A.0*L.0
}
g1 <- function(A.1, L.1){
  gest_result$psi[3]*A.1 + gest_result$psi[4]*A.1*L.1
}
g2 <- function(A.2, L.2){
  gest_result$psi[5]*A.2 + gest_result$psi[6]*A.2*L.2
}
## Calculate H0 for everyone
dat3$H2 <- with(dat3, Y - g2(A.2, L.2))
dat3$H1 <- with(dat3, H2 - g1(A.1, L.1))
dat3$H0 <- with(dat3, H1 - g0(A.0, L.0))
mean(dat3$H0)
```

```
## [1] -0.2475632
```

---
## G-Estimation

- To estimate `\(E[Y(0, 1, 1)]\)` and `\(E[Y(1, 1, 1)]\)`, we need to estimate the distribution of `\(L_1\)` and `\(L_2\)`.

```r
## now we need distribution models for L1 and L2
fl1 <- lm(L.1 ~ A.0 + L.0, data = dat3)
rl1 <- function(nsamp, A.0, L.0){
  c <- fl1$coefficients
  rnorm(n = nsamp, mean = c[1] + c[2]*A.0 + c[3]*L.0, sd = sd(fl1$residuals))
}
fl2 <- lm(L.2 ~ A.1 + L.1, data = dat3)
rl2 <- function(nsamp, A.1, L.1){
  c <- fl2$coefficients
  rnorm(n = nsamp, mean = c[1] + c[2]*A.1 + c[3]*L.1, sd = sd(fl2$residuals))
}
```

---
## G-Estimation

```r
gest_mean <- function(a0, a1, a2, nrep = 10){
  sim_dat <- purrr::map_dfr(1:nrep, function(i){
    simdat3 <- dat3
    n <- nrow(simdat3)
    simdat3$A.0 <- a0
    simdat3$L.1 <- with(simdat3, rl1(nsamp = n, A.0 = A.0, L.0 = L.0))
    simdat3$A.1 <- a1
    simdat3$L.2 <- with(simdat3, rl2(nsamp = n, A.1 = A.1, L.1 = L.1))
    simdat3$A.2 <- a2
    simdat3$Y <- with(simdat3, H0 + g0(A.0, L.0) + g1(A.1, L.1) + g2(A.2, L.2))
    return(simdat3)
  })
  return(mean(sim_dat$Y))
}
```

---
## G-Estimation

```r
gest_mean(0, 1, 1, nrep = 10)
```

```
## [1] 0.1006678
```

```r
gest_mean(1, 1, 1, nrep = 10)
```

```
## [1] -0.108126
```

---
## G-Estimation

- I bootstrapped the standard error (code hidden but you can find it here).

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> A </th>
   <th style="text-align:right;"> Truth </th>
   <th style="text-align:left;"> IPWEst1 </th>
   <th style="text-align:left;"> GForm </th>
   <th style="text-align:left;"> GEst </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 0-0-0 </td>
   <td style="text-align:right;"> -0.23 </td>
   <td style="text-align:left;"> -0.23 (-0.28, -0.19) </td>
   <td style="text-align:left;"> -0.24 (-0.26, -0.21) </td>
   <td style="text-align:left;"> -0.25 (-0.27, -0.22) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0-1-1 </td>
   <td style="text-align:right;"> 0.08 </td>
   <td style="text-align:left;"> 0.08 (0.02, 0.14) </td>
   <td style="text-align:left;"> 0.09 (0.06, 0.11) </td>
   <td style="text-align:left;"> 0.1 (0.07, 0.13) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 1-1-1 </td>
   <td style="text-align:right;"> -0.10 </td>
   <td style="text-align:left;"> -0.12 (-0.18, -0.05) </td>
   <td style="text-align:left;"> -0.11 (-0.15, -0.08) </td>
   <td style="text-align:left;"> -0.11 (-0.14, -0.08) </td>
  </tr>
</tbody>
</table>

---
## ICE Estimation

- This one I implemented by hand, but it was fairly straightforward.

- There are two ways to do this.

- The **stratified** method is the method described in the book.
  + At stage `\(k\)`, fit the model only amongst those with the desired treatment history up to time `\(k\)`.
  + This means we only need to get the Y-covariate relationship right.
  + This will be non-parametric with respect to the Y-A relationship.

- The **non-stratified** method fully specifies the outcome model.
  + At each stage, we fit the model in the full data.
  + We have to get the outcome model correct.
---
## ICE Estimation

```r
fit_ice_stratified <- function(a0, a1, a2, dat){
  ## Recall Q4 = Y
  Q3mod <- glm(Y ~ Z + L.0 + L.1 + L.2, data = dat,
               subset = which(dat$A.2 == a2 & dat$A.1 == a1 & dat$A.0 == a0))
  dat$Q3 <- predict(Q3mod, newdata = dat, type = "response")
  Q2mod <- glm(Q3 ~ Z + L.0 + L.1, data = dat,
               subset = which(dat$A.1 == a1 & dat$A.0 == a0))
  dat$Q2 <- predict(Q2mod, newdata = dat, type = "response")
  Q1mod <- glm(Q2 ~ Z + L.0, data = dat,
               subset = which(dat$A.0 == a0))
  dat$Q1 <- predict(Q1mod, newdata = dat, type = "response")
  return(mean(dat$Q1))
}
```

---
## ICE Estimation

```r
fit_ice_nonstratified <- function(a0, a1, a2, dat){
  dat.copy <- dat
  dat.copy$A.0 <- a0
  dat.copy$A.1 <- a1
  dat.copy$A.2 <- a2
  Q3mod <- glm(Y ~ Z + A.0 + A.1 + A.2 + A.0*L.0 + A.1*L.1 + A.2*L.2, data = dat)
  dat$Q3 <- predict(Q3mod, newdata = dat.copy, type = "response")
  Q2mod <- glm(Q3 ~ Z + A.0 + A.1 + A.0*L.0 + A.1*L.1, data = dat)
  dat$Q2 <- predict(Q2mod, newdata = dat.copy, type = "response")
  Q1mod <- glm(Q2 ~ Z + A.0 + A.0*L.0, data = dat)
  dat$Q1 <- predict(Q1mod, newdata = dat.copy, type = "response")
  return(mean(dat$Q1))
}
```

---
## ICE Estimation

```r
c(fit_ice_stratified(0, 0, 0, dat3),
  fit_ice_stratified(0, 1, 1, dat3),
  fit_ice_stratified(1, 1, 1, dat3))
```

```
## [1] -0.23663412 0.07712922 -0.10809774
```

```r
c(fit_ice_nonstratified(0, 0, 0, dat3),
  fit_ice_nonstratified(0, 1, 1, dat3),
  fit_ice_nonstratified(1, 1, 1, dat3))
```

```
## [1] -0.23755063 0.07933465 -0.11213721
```

---
## All Results Combined

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> A </th>
   <th style="text-align:right;"> Truth </th>
   <th style="text-align:left;"> IPWEst1 </th>
   <th style="text-align:left;"> GForm </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 0-0-0 </td>
   <td style="text-align:right;"> -0.23 </td>
   <td style="text-align:left;"> -0.23 (-0.28, -0.19) </td>
   <td style="text-align:left;"> -0.24 (-0.26, -0.21) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0-1-1 </td>
   <td style="text-align:right;"> 0.08 </td>
   <td style="text-align:left;"> 0.08 (0.02, 0.14) </td>
   <td style="text-align:left;"> 0.09 (0.06, 0.11) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 1-1-1 </td>
   <td style="text-align:right;"> -0.10 </td>
   <td style="text-align:left;"> -0.12 (-0.18, -0.05) </td>
   <td style="text-align:left;"> -0.11 (-0.15, -0.08) </td>
  </tr>
</tbody>
</table>

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> GEst </th>
   <th style="text-align:left;"> ICE_S </th>
   <th style="text-align:left;"> ICE_NS </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> -0.25 (-0.27, -0.22) </td>
   <td style="text-align:left;"> -0.24 (-0.29, -0.18) </td>
   <td style="text-align:left;"> -0.24 (-0.26, -0.22) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0.1 (0.07, 0.13) </td>
   <td style="text-align:left;"> 0.08 (0.02, 0.13) </td>
   <td style="text-align:left;"> 0.08 (0.05, 0.1) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> -0.11 (-0.14, -0.08) </td>
   <td style="text-align:left;"> -0.11 (-0.16, -0.06) </td>
   <td style="text-align:left;"> -0.11 (-0.14, -0.08) </td>
  </tr>
</tbody>
</table>

---
## Repeated Outcome Measurements

---
## Repeated Outcome Measurements

- Until now, we have considered:
  1. Point interventions and single outcomes.
  1. Point interventions and survival outcomes.
  1. Time-varying interventions and single outcomes.

- Now we will consider the case when the outcome of interest is a value that varies over time. The treatment may also vary over time.
---
## Cole et al (2007)

- Cole et al are interested in the effect of highly active antiretroviral therapy (HAART), a treatment for HIV, on viral load.

- They use two longitudinal AIDS and HIV studies, the Multicenter AIDS Cohort Study (all men) and the Women's Interagency HIV Study (all women).

- Participants of both studies complete questionnaires and have blood work taken every 6 months.

- HAART became available in April 1996, so the study population is 918 individuals who were alive, HIV positive, and not using antiretroviral therapy at this time.

- Baseline visit: First visit after April 1996 or first visit after April 1996 with complete data (whichever is later).

---
## Challenges in Cole et al (2007)

- The technology used to measure HIV viral load changed over time and differed between study cohorts.
  + Each technology has a different lower limit of detection, i.e. viral load is "left censored" at a different value for different measurements.

- The exposure of interest is the cumulative time on HAART, a continuous value.
  + Because patients are only measured every 6 months, we don't know exactly how much time a patient has spent on HAART.
  + The authors approximate cumulative exposure as time since HAART initiation.
  + They deal with having a continuous exposure by creating categories supported by exploratory analysis.

---
## Censoring in Cole et al (2007)

- The authors are interested in comparing HAART with no antiretroviral therapy, but there are non-HAART forms of ART that some participants used.

- Additionally, some people dropped out of the study or died.

- Cole et al treat death, dropping out, and initiation of non-HAART therapy all as censoring.

- This means that the target parameter is `\(E[Y_{i,j}(Cum_{i,j}= c, C_{ij} = 0)]\)`, i.e. the counterfactual expected viral load at time `\(j\)` if everyone had had a cumulative HAART exposure of `\(c\)` at time `\(j\)` and nobody had been censored.

---
## Exercise

- Consider a generic version of the Cole et al study with three time points (0, 1, and 2) and no censoring.

- There are three treatment variables `\(A_0\)`, `\(A_1\)`, `\(A_2\)`.

- Confounders are measured at each time `\(L_0\)`, `\(L_1\)`, and `\(L_2\)`. These could be affected by past treatment.

- An outcome is measured at each time `\(Y_0\)`, `\(Y_1\)`, `\(Y_2\)`. The outcome could be affected by past treatment and past confounders.

- Confounders `\(L_k\)` could be affected by treatment `\(A_{k-1}\)`.

- Draw a DAG for this scenario.

---
## Exercise

<center>
<img src="img/9_timevarying3.png" width="80%" />
</center>

- Here is a simplified solution leaving out arrows from `\(L_0\)` to `\(Y_1\)` and `\(Y_2\)` and from `\(L_1\)` to `\(Y_2\)`.

---
## Analyzing the Repeated Outcomes Data

- If our goal is to estimate `\(E[Y_k(\bar{a}_k)]\)`, one thing we could do is treat each outcome separately.

- For each `\(k\)`, consider a study that ends at time `\(k\)` in which we are only interested in the outcome `\(Y_k\)`.

- All previous outcomes are treated simply as confounders.

- We can use our previous machinery (e.g. IP weighting, parametric G-formula, or structural nested models) to estimate `\(E[Y_k(\bar{a}_k)]\)`.

---
## Combining Information Across Time-Points

- In some cases, we can combine information across time-points.

- This is possible when we can use the same marginal structural model for every time point.

- We might potentially allow effect modification.

- We can then fit a **common** marginal structural model across all time points using time-varying weights.
---
## Combining Information Across Time-Points

- For example, suppose that only treatment at the two most recent time-points affects `\(Y_k\)`.

- We could propose the MSM

$$
`\begin{split}
E[Y_{k}(\bar{a}_k)] = &\beta_{0,k} + \beta_1 a_k + \beta_2 a_{k-1} \\
= & \sum_{j = 1}^K \beta_{0,j}1_{k = j} + \beta_1 a_k + \beta_2 a_{k-1}
\end{split}`
$$

- In this MSM, we have a time-specific intercept but the causal differences are shared. This means we are assuming that the effect of changing from a treatment course with the last two time points untreated to a treatment course with the final time point treated is the same at every time point.

- Because our MSM includes the last two treatments, we can only fit this model for `\(k \geq 1\)`. We could either fit a separate model for `\(Y_0\)` or ignore this outcome.

---
## Combining Information Across Time-Points

- If we had a study that only included two time-points and one measured outcome at time 1, `\(Y_1\)`, we would estimate the parameters of the MSM using IP weighting for time-varying treatments (L10).
`$$E[Y_{1}(\bar{a}_1)] = \beta_{0,1} + \beta_1 a_1 + \beta_2 a_0$$`

- We would first propose and fit models for `\(f(A_k \vert \bar{A}_{k-1}, \bar{L}_{k})\)` and `\(f(A_k \vert \bar{A}_{k-1})\)` for `\(k = 0\)` and `\(k=1\)`.

- Then we would calculate weights
`$$W_{i,1} = \prod_{k = 0}^{k = 1} \frac{f(A_k = A_{k, i} \vert \bar{A}_{k-1} = \bar{A}_{k-1, i})}{f(A_k = A_{k,i} \vert \bar{A}_{k-1} = \bar{A}_{k-1, i}, \bar{L}_{k} = \bar{L}_{k,i})}$$`

- Finally we would fit the MSM by fitting a linear regression weighted by the IP weights.

---
## Combining Information Across Time-Points

- If we would like to combine all time-points together to estimate `\(\beta_1\)` and `\(\beta_2\)`, we use the same procedure.

- The idea is to think of our data as `\(K\)` shorter studies that all include the same people.

- For each person-time combination, we compute the time-varying stabilized weight
`$$W_{i,j} = \prod_{k = 0}^{k = j} \frac{f(A_k = A_{k, i} \vert \bar{A}_{k-1} = \bar{A}_{k-1, i})}{f(A_k = A_{k,i} \vert \bar{A}_{k-1} = \bar{A}_{k-1, i}, \bar{L}_{k} = \bar{L}_{k,i})}$$`

---
## Combining Information Across Time-Points

- We then fit the MSM using all person-time observations weighted by the time-varying weights (a sketch follows the Cole et al MSM below).

- We need to use robust standard errors for clustered data (e.g. fit the model with GEE).
  + Any working covariance matrix will give valid estimates.

---
## Back to Cole et al (2007)

- Cole et al use exactly this strategy to analyze the viral load data.

- They fit the MSM
`$$E[Y_{ij}(\bar{a}) \vert V_{i,0}] = \beta_{0, j} + \beta_1^{\top} V_{i,0} + \beta_2^\top g(Cum_{i,j})$$`

- `\(V_{i,0}\)` are baseline CD4 count and viral load categories.

- They include these variables in the MSM because they want to use them in the numerator of the stabilized weights. Since the MSM is conditional on `\(V_{i,0}\)` but includes no interactions with it, they are assuming no effect modification by `\(V_{i,0}\)`.

- `\(g(Cum_{i,j})\)` is a four-level categorical variable categorizing the total time since beginning HAART treatment as `\(0\)`, `\(>0-1\)`, `\(>1-3\)` and `\(>3-9\)` years.

- There are three counterfactual contrasts given by `\(\beta_2\)` comparing each category to no ART.
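---
## Fitting the Pooled, Weighted MSM

- Below is a minimal sketch of fitting a pooled, weighted MSM of this kind with GEE. It assumes a hypothetical long-format data frame `dd_long` with one row per person-visit containing an `id` column, the outcome `Y`, the visit index `time`, current and lagged treatment `A` and `A_lag`, and precomputed time-varying stabilized weights `sw`; `geepack` is one way to get the cluster-robust standard errors described above.

```r
library(geepack)

## Pooled MSM: time-specific intercepts, shared treatment effects,
## time-varying stabilized weights, clustering on person.
## Any working correlation structure gives valid point estimates;
## the robust (sandwich) SEs account for the repeated measures.
msm_fit <- geeglm(Y ~ factor(time) + A + A_lag,
                  id = id, data = dd_long, weights = sw,
                  family = gaussian, corstr = "independence")
summary(msm_fit)
```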
---
## Censoring in Cole et al (2007)

- Because there is censoring, Cole et al also need to calculate censoring weights,
`$$W_{ij}^{C} = \prod_{k = 1}^{j + 1}\frac{P(C_{ik} = 0 \vert \bar{C}_{i,k-1} = 0, \bar{A}_{ik}, V_{i,0})}{P(C_{ik} = 0 \vert \bar{C}_{i,k-1} = 0, \bar{A}_{ik}, \bar{L}_{ik})}$$`

- The total weights are the product of the confounding weights and the censoring weights.

- The analysis now also relies on the assumption of non-informative censoring. That is, people with high viral loads were not more (or less) likely to be censored than other people with the same covariate and treatment history.

---
## Limit of Detection in Viral Load Measurements

- The final tricky piece of the Cole et al analysis is left censoring of viral load measurements.

- They deal with this by assuming that `\(Y_{ij}\)` is (marginally) normally distributed with mean `\(\mu_{ij}\)` determined by the MSM.

- This allows them to write a likelihood for the observed viral loads as a function of the person-time specific limit of detection and the parameters of the MSM.

- They can then estimate the parameters by maximizing the weighted likelihood, with weights being the time-varying weights we've described previously.

---
## Cole et al Results

<center>
<img src="img/9_cole_results.png" width="80%" />
</center>

---
## Standard Analysis

- Cole et al also compute estimates using the more common regression approach.

- In this model all weights are set to 1 and baseline and time-varying covariates are added to the model.

- As we saw in L10, this approach can result in selection bias.

- The results of this analysis are strongly attenuated toward the null.

---
## Sensitivity Analysis

- This paper is a nice example of using sensitivity analyses to explore data.

- They consider effect modification by sex and by baseline CD4 count.

- They consider an MSM that also includes a linear effect for time on treatment.

- They consider multiple other covariate models.

- The discussion is a great example of clearly explaining the assumptions required for valid inference.

---
## Time to Event Outcomes

- Previously, we discussed analysis of time-to-event outcomes with a single point treatment.

- With a time-varying treatment, we can treat these outcomes in the same way that we treat other repeated outcome measurements.

- In the case of a survival outcome, we have repeated measurements of death by time `\(k\)`, `\(D_k\)`.

- The only difference between survival and non-survival outcomes is the issue of right censoring. We can deal with this using the same hazards "trick" we used in L8.

---
## Time to Event Outcomes

- We need a marginal structural model for the counterfactual hazard at time `\(k\)`.
  + We could use either the Cox model or the pooled logistic approximation.

- If we are using the Cox model, time-varying weights are used to adjust each individual's contribution to the "at risk" set at time `\(k\)`.

- The logistic model is fit using pooled person-time data (a sketch follows on the next slide).

- Good examples of this are in Hernán et al (2001) and Hernán et al (2008) (see Related Reading page).
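---
## Time to Event Outcomes

- Below is a minimal sketch of the pooled logistic approach, assuming a hypothetical person-time data frame `pt_dat` with one row per person-interval while the person is still at risk, an event indicator `D`, an `id` column, the interval index `time`, cumulative treatment `cumA`, and time-varying weights `sw`. It mirrors the weighted regression plus sandwich-variance approach used for the MSMs earlier; it is a sketch, not the exact specification used in the cited papers.

```r
library(sandwich)
library(lmtest)

## Weighted pooled logistic model for the discrete-time hazard.
## quasibinomial() avoids warnings from non-integer IP weights.
haz_fit <- glm(D ~ factor(time) + cumA, family = quasibinomial(),
               weights = sw, data = pt_dat)

## Cluster-robust SEs to account for repeated rows per person.
coeftest(haz_fit, vcov. = vcovCL(haz_fit, cluster = ~ id))
```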