class: center, middle, inverse, title-slide .title[ # L13: Instrumental Variable Analysis Part 2 ] .author[ ### Jean Morrison ] .institute[ ### University of Michigan ] .date[ ### Lecture on 2023-03-20 (updated: 2023-03-21) ] --- `\(\newcommand{\ci}{\perp\!\!\!\perp}\)` ## Lecture Outline 1. Structural Equation Motivation for IVW 1. Distribution of Single IV Estimator 1. Multiple IV Estimators 1. Mendelian Randomization --- # 1. Structural Equation Motivation for IVW --- ## Structural Equation Model Approach - Consider the set of linear structural models: `$$A_i(z) = \beta_{A0} + \beta_{AZ} z + \epsilon_{A,i} \\\ Y_i(a) = \beta_{Y0} + \gamma a + \epsilon_{Y,i}$$` - `\(\gamma\)` is the causal effect we want to identify. - In this model there is both the effect of `\(Z\)` on `\(A\)` and the effect of `\(A\)` on `\(Y\)` are homogeneous (the same for everyone). - `\(\epsilon_{A,i}\)` and `\(\epsilon_{Y,i}\)` are mean zero deviations, which may depend on other variables. - If there is confounding between `\(A\)` and `\(Y\)`, then `\(\epsilon_A\)` and `\(\epsilon_Y\)` are correlated. - Using consistency, we can substitute `\(A_i\)` for `\(A_i(z)\)` and `\(Z_i\)` for `\(z\)` in both equations. --- ## IV Assumptions Impose Constraints `$$A_i = \beta_{A0} + \beta_{AZ} Z_i + \epsilon_{A,i} \\\ Y_i = \beta_{Y0} + \gamma A_i + \epsilon_{Y,i}$$` - The relevance assumption means that `\(\beta_{AZ} \neq 0\)`. - The exclusion restriction requires that `\(Y\)` is independent of `\(Z\)` given `\(A\)`. This is satisfied by conditions: - `\(Z_i\)` is not in the second equation and - `\(Cov(Z, \epsilon_{Y}) = 0\)` - Exchangeability requires that there is no confounding between `\(Z\)` and `\(Y\)`. - This is also satisfied by `\(Cov(Z, \epsilon_{Y}) = 0\)` - We need an additional condition to identify `\(\gamma\)`: `\(Cov(Z_i, \epsilon_{A, i}) = 0\)`. + This is equivalent to assuming that the true association between `\(Z\)` and `\(A\)` is linear. --- ## Structural Equation Model Approach Starting with our system of structural equations: `$$A_i = \beta_{A0} + \beta_{AZ} Z_i + \epsilon_{A,i} \\\ Y_i = \beta_{Y0} + \gamma A_i + \epsilon_{Y,i}$$` Plug the first equation into the second. `$$Y_i = \beta_{Y0} + \gamma \left(\beta_{A0} + \beta_{AZ} Z_i + \epsilon_{A,i} \right) + \epsilon_{Y,i}\\\ = \beta_{Y0}^\prime + \gamma \beta_{AZ} Z_i + \epsilon_{Y,i}^\prime$$` --- ## Structural Equation Model Approach `$$A_i = \beta_{A0} + \beta_{AZ} Z_i + \epsilon_{A,i} \\\ Y_i = \beta_{Y0}^\prime + \gamma \beta_{AZ} Z_i + \epsilon_{Y,i}^\prime$$` This result suggests two estimation strategies: -- 1. Two stage least squares: - Regress `\(Z\)` on `\(A\)` to obtain `\(\hat{\beta}_{AZ}\)`. - Regress `\(Y\)` on `\(\hat{\beta}_{AZ}Z\)` to estimate `\(\gamma\)`. -- 2. Ratio estimator: - Regress `\(A\)` on `\(Z\)` to obtain `\(\hat{\beta}_{AZ}\)`. - Regress `\(Y\)` on `\(Z\)` to obtain `\(\hat{\beta}_{YZ}\)`, an estimate of `\(\gamma\beta_{AZ}\)`. - Estimate `\(\gamma\)` by `\(\hat{\beta}_{YZ}/\hat{\beta}_{AZ}\)` - We will show that these estimates are identical, and for binary `\(Z\)` and `\(A\)`, equal to the version of `\(\beta_{IV}\)` we have already seen. --- ## Two Stage Least Squares - Suppose we have `\(N\)` observations of `\((Z, A, Y)\)`. - Let `\(\mathbf{Z}\)`, `\(\mathbf{A}\)`, and `\(\mathbf{Y}\)` be `\(N\times 1\)` vectors. - For simplicity, assume that `\(\mathbf{A}\)` and `\(\mathbf{Y}\)` are centered (mean 0). - Then in the first stage we obtain `$$\hat{\beta}_{AZ} = (\mathbf{Z}^\top \mathbf{Z})^{-1}\mathbf{Z}^\top\mathbf{A}$$` - In the second stage we regress `\(\hat{\beta}_{AZ}\mathbf{Z}\)` on `\(Y\)`. `$$\hat{\gamma} = \hat{\beta}_{2SLS} = (\hat{\beta}_{AZ}^2\mathbf{Z}^\top \mathbf{Z})^{-1}\hat{\beta}_{AZ}\mathbf{Z}^\top\mathbf{Y}\\\ \frac{(\mathbf{Z}^\top \mathbf{Z})^{-1}\mathbf{Z}^\top\mathbf{Y}}{\hat{\beta}_{AZ}} = \frac{\hat{\beta}_{YZ}}{\hat{\beta}_{AZ}}$$` --- ## Two Stage Least Squares - We can see easily that the 2SLS estimator is equal to the ratio estimator. - If `\(Z\)` is binary then the OLS estimate of `\(\hat{\beta}_{AZ}\)` is `\(E[A \vert Z =1]-E[A \vert Z = 0]\)` and `\(\hat{\beta}_{YZ}\)` is `\(E[Y \vert Z = 1] - E[Y \vert Z = 0]\)`. - So `\(\hat{\beta}_{IV}\)` introduced previous is a special case of the ratio estimator. --- ## Two Sample IV - Both the 2SLS framework and the ratio estimator framework suggest that we don't need to measure `\(Z\)`, `\(A\)` and `\(Y\)` in the same sample. - We could instead have two samples, + Sample 1: data for `\(Z\)` and `\(A\)` and + Sample 2: data for `\(Z\)` and `\(Y\)`. - In 2SLS, we conduct the first stage regression of `\(A\)` on `\(Z\)` in Sample 1. - We then use `\(\hat{\beta}_{AZ}\)` to compute `\(\hat{\beta}_{AZ}Z\)` in Sample 2. We then regress `\(Y\)` on this new variable. + `\(\hat{\beta}_{AZ}Z\)` is like an imputed unconfounded version of `\(A\)`. - Using the ratio framework, we use Sample 1 to estimate `\(\hat{\beta}_{AZ}\)` and `\(\hat{\beta}_{YZ}\)` and then compute the ratio. - These two strategies give identical results. --- ## Two Sample IV - Being able to estimate the effect of `\(A\)` on `\(Y\)` without observing `\(A\)` and `\(Y\)` in the same data set is extremely powerful. - It makes it possible to address causal questions that would be otherwise impossible to study. Examples: - `\(A\)` and `\(Y\)` might occur very far apart in time making it impractical to measure them in the same units. + E.g Do exposures during pregnancy increase risks of late in life diseases. - One of `\(A\)` or `\(Y\)` might be challenging to measure while the other is easy. We can have a larger sample size for one stage than the other. - Data have already been collected for either `\(Z\)` and `\(A\)` or `\(Z\)` and `\(Y\)` in a sample that cannot be recontacted. --- ## Adjusting for `\(Z-Y\)` Confounding - Our instrument does not have to perfectly satisfy the exchangeability condition, as long as we have measured any variables confounding `\(Z\)` and `\(A\)`. - Hernan and Robins suggest that g-methods can be used to estimate the causal effect of `\(Z\)` on `\(A\)` accounting for confounders. - Probably the most common strategy is to simply add confounders to the regression of `\(A\)` on `\(Z\)`. - In the education example, age is a potential confounder between quarter of birth and wages. + Men born in the first quarter are older than their peers who started school the same year. + Age is also associated with earnings. + Angrist and Kruger adjust for age and age squared in the first stage of the 2SLS regression. --- # 2. Distribution of Single IV Estimator --- ## Distribution of the IV Estimator - The standard IV estimator for a single IV is a ratio of random variables. `$$\hat{\gamma} = \hat{\beta}_{IV} = \frac{\hat{\beta}_{YZ}}{\hat{\beta}_{AZ}}$$` - If `\(\hat{\beta}_{YZ}\)` and `\(\hat{\beta}_{AZ}\)` are estimated in different samples then the numerator and denominator are independent. Otherwise, they are dependent. - Inconveniently, none of the moments of `\(\hat{\beta}_{IV}\)` exist and it is not normally distributed. + This occurs because there is a finite chance that `\(\hat{\beta}_{AZ}\)` is close to zero. - However, for instruments with large effects on `\(A\)`, the portion of the distribution of `\(\hat{\beta}_{AZ}\)` close to zero is very small and a normal approximation to `\(\hat{\beta}_{IV}\)` works reasonably well. --- ## Distribution of the IV Estimator - Write `\(\hat{\beta}_{AZ} = \mu + \sigma_{AZ} T_A\)` where `\(T_A\)` has mean 0 and variance 1, and `\(\sigma^2_{AZ} = Var(\hat{\beta}_{AZ})\)`. - If all of our IV assumptions hold the `\(\hat{\beta}_{YZ} = \gamma \mu + \sigma_{YZ} T_y\)`, with `\(\sigma^2_{YZ} = Var(\hat{\beta}_{YZ})\)`. - This means that, `$$\hat{\beta}_{IV} = \frac{\gamma \mu + \sigma_{YZ} T_Y}{\mu + \sigma_{AZ} T_A} = \gamma \frac{1}{1 + \sigma_{AZ} T_A/\mu} + \frac{\sigma_{YZ} T_Y}{\mu + \sigma_{AZ} T_A}$$` --- ## Distribution of the IV Estimator `$$\hat{\beta}_{IV} = \frac{\gamma \mu + \sigma_{YZ} T_Y}{\mu + \sigma_{AZ} T_A} = \gamma \frac{1}{1 + \sigma_{AZ} T_A/\mu} + \frac{\sigma_{YZ} T_Y}{\mu + \sigma_{AZ} T_A}$$` - If `\(T_{YZ}\)` and `\(T_{AZ}\)` are independent, then the expectation of the last term is 0. This occurs in two sample IVA. - If `\(T_A\)` is normally distributed, then `\(E[1/(1 + \sigma_{AZ} T_A/\mu)]\)` does not have a defined expectation. - However, we can see that if `\(\mu/\sigma_{AZ}\)` is large, then `\(\frac{1}{1 + \sigma_{AZ} T_A/\mu} \approx 1\)`. - If we consider the behavior of `\(\hat{\beta}_{IV}\)` asymptotically as the sample size used to estimate `\(\hat{\beta}_{ZX}\)` increases, `\(\hat{\beta}_{IV} \to 1\)`. --- ## Distribution of the IV Estimator - We can get estimates of the mean and variance of `\(\hat{\beta}_{IV}\)` from a first order Taylor expansion around `\(\gamma\)`. - Recall that `\(\hat{\beta}_{IV}\)` does not actually have either first or second moments! - Define `\(f(\hat{\beta}_{YZ}, \hat{\beta}_{AZ}) = \hat{\beta}_{YZ}/\hat{\beta}_{AZ} = \hat{\beta}_{IV}\)`. Then for any two-dimensional point `\(\boldsymbol{\theta}\)`, `$$f(\hat{\beta}_{YZ}, \hat{\beta}_{AZ}) = f(\boldsymbol{\theta}) + \nabla f(\boldsymbol{\theta})^\top \begin{pmatrix} \hat{\beta}_{YZ} - \theta_1 \\ \hat{\beta}_{AZ} - \theta_2 \end{pmatrix} + \text{Rem}$$` --- ## Distribution of the IV Estimator - If we plug in `\(\boldsymbol{\theta} = (E[\hat{\beta}_{YZ}], E[\hat{\beta}_{AZ}]) = (\gamma \mu , \mu)\)` we obtain $$ `\begin{split} E[f(\hat{\beta}_{YZ}, \hat{\beta}_{AZ})] \approx & \gamma\\ Var(f(\hat{\beta}_{YZ}, \hat{\beta}_{AZ})) \approx & \gamma^2 \left(\frac{\sigma^2_{AZ}}{ \mu^2} + \frac{\sigma^2_{YZ}}{\gamma^2 \mu^2} - 2 \frac{Cov(\hat{\beta}_{YZ}, \hat{\beta}_{AZ})}{\gamma \mu ^2} \right) \\ \approx & \hat{\gamma}^2 \left(\frac{\hat{\sigma}^2_{AZ}}{ \hat{\beta}_{AZ}^2} + \frac{\hat{\sigma}^2_{YZ}}{\hat{\beta}_{YZ}^2} - 2 \frac{Cov(\hat{\beta}_{YZ}, \hat{\beta}_{AZ})}{\hat{\beta}_{AZ}\hat{\beta}_{YZ}} \right) \end{split}` $$ - In the last step we replace all unknowns with estimates. --- ## Distribution of the IV Estimator - To see how good the normal approximation is, we simulate some data. - Simulate `\(\hat{\beta}_{YZ} \sim N(\gamma \mu_x, 1)\)` and `\(\hat{\beta}_{AZ} \sim N(\mu_x, 1)\)`, independent. - Calculate `\((\hat{\beta}_{IV} - \gamma)/\sqrt{\hat{V}}\)` using the formula from the previous slide to estimate the variance of `\(\hat{\beta}_{IV}\)` - Compare these values to a standard normal distribution. - We consider values of `\(\gamma = 0\)` or `\(\gamma = 1\)` and `\(\mu = 1, 2, 5, 10\)`. --- ## Distribution of the IV Estimator <center> <img src="13_iva_part2_files/figure-html/unnamed-chunk-2-1.png" width="95%" style="display: block; margin: auto;" /> </center> --- ## Distribution of the IV Estimator - The approximation is better when `\(\mu\)` is larger. - When `\(\mu\)` is small and `\(\gamma \neq 0\)`, `\(\hat{\beta}_{IV}- \gamma\)` is, on average, lower than expected under the normal approximation. We are underestimating the causal effect. - This is called **weak instrument bias** because it occurs when the effect of the instrument on the exposure is small compared to the estimation error of `\(\hat{\beta}_{AZ}\)`. --- # 3. Multiple IV Estimators --- ## Multiple Instruments - Both the 2SLS estimation strategy and the ratio strategy extend to multiple instruments. - Let `\(\mathbf{Z}_i\)` be a `\(k\)`-vector of instruments. - For 2SLS, we extend our set of linear models. `$$A_i = \beta_{A0} + \boldsymbol{\beta}_{AZ}^\top \mathbf{Z}_i + \epsilon_{A,i} \\\ Y_i = \beta_{Y0} + \gamma A_i + \epsilon_{Y,i}$$` - In the first stage, we estimate `\(\hat{\boldsymbol{\beta}}_{AZ}\)`. - In the second stage regress `\(Y\)` on `\(\hat{A} = \hat{\boldsymbol{\beta}}_{AZ}^\top Z\)`. - Angrist and Kruger (1991) also examine the problem using quarter of birth interacted with year-of-birth. + This gives multiple instruments could affect total years of education differently. --- ## Inverse Variance Weighted Regression - The extension of the ratio estimator to multiple instruments is called **inverse variance weighted regression** or IVW regression. - For each instrument, we can construct a ratio estimate `$$\hat{\beta}_{IV,1} = \frac{\hat{\beta}_{YZ_1}}{\hat\beta_{AZ_1}}, \dots, \hat{\beta}_{IV,K} = \frac{\hat{\beta}_{YZ_K}}{\hat\beta_{AZ_K}}$$` - We then construct an overall estimate as a weighted average `$$\hat{\beta}_{IVW} = \frac{\sum_{k=1}^K w_k \hat{\beta}_{IV,k}}{\sum_{k =1}^K w_k}$$` where `\(w_k\)` is the inverse of the (approximate) variance of `\(\hat{\beta}_{IV,k}\)`. - `\(\hat{\beta}_{IVW}\)` is asymptotically equivalent (but not numerically identical) to the 2SLS estimate. --- ## Inverse Variance Weighted Regression - IVW regression is usually used in the case when only the estimates `\(\hat{\beta}_{YZ_k}\)` and `\(\hat{\beta}_{AZ_k}\)` are available but the individual level data is not. - This is the case in Mendelian randomization which is coming up in the later half of this lecture. - If `\(\hat{\beta}_{YZ_k}\)` and `\(\hat{\beta}_{AZ_k}\)` are obtained from marginal regressions of `\(Y\)` on each instrument separately and `\(A\)` on each instrument separately, then our instruments need to be marginally approximately indpendent. - If we use dependent instruments, we will under-estimate the variance of `\(\hat{\beta}_{IVW}\)`. --- ## IVW Regression - For IVW regression, we need to choose the weights `\(w_j\)`. - The most common approach is to use `$$w_k = \frac{\sigma_{Y,Z_k}^2}{\hat{\beta}_{A,Z_k}^2} \approx Var(\hat{\beta}_{IV,k})$$` - This is the estimate of `\(Var(\hat{\beta}_{IV,k})\)` we would get using our previous formula but assuming that `\(\gamma = 0\)`. - Plugging in these weights gives us `$$\hat{\beta}_{IVW} = \frac{\sum_{k=1}^K \hat{\beta}_{YZ_k}\hat{\beta}_{AZ_k}\sigma^{-2}_{YZk}}{\sum_{k=1}^K \hat{\beta}_{AZ_k}\sigma^{-2}_{YZ_k}}$$` --- ## IVW as Regression of Summary Statistics - The estimator `$$\hat{\beta}_{IVW} = \sum_{k=1}^K \frac{\hat{\beta}_{YZ_k}\hat{\beta}_{A,Z_k}\sigma^{-2}_{Y,Zk}}{\hat{\beta}_{A,Z_k}\sigma^{-2}_{Y,Zk}}$$` is exactly the estimate we would get from regressing `\((\hat{\beta}_{Y,Z_1}, \dots, \hat{\beta}_{Y, Z_K})\)` on `\((\hat{\beta}_{A,Z_1}, \dots, \hat{\beta}_{A, Z_K})\)` with no intercept. - This makes sense: our linear model implies that `$$E[\hat{\beta}_{A, Z_k}] = \beta_{A, Z_k} \qquad\text{and}\qquad E[\hat{\beta}_{Y, Z_k}] = \gamma\beta_{A,Z_k}$$` - Based on these equations, we might estimate `\(\gamma\)` by substituting `\(\hat{\beta}_{A, Z_k}\)` for `\(\beta_{A, Z_k}\)` in the second equation and fitting a regression. --- ## IVW as Regression of Summary Statistics <center> <img src="img/10_ldl_cad.jpg" width="70%" /> </center> Ference et al, Journal of American College of Cardiology (2012) --- ## Expectation of IVW Estimator - With `\(K\)` IVs, the moments of the 2SLS and IVW estimators exist up `\(K-1\)`. - Zhao et al (2019) (and others) calculate an approximation of `\(E\left[\hat{\beta}_{IVW}\right]\)` for independent samples. - For simplicity, assume that `\(\sigma^2_{AZ_k} = \sigma^2_{A}\)` and `\(\hat{\sigma}^2_{YZ_k} = \sigma^2_{Y}\)` are constant over instruments. - Write `\(\hat{\beta}_{YZ_k} = \gamma \mu_k + \epsilon_{Y,k}\)` and `\(\hat{\beta}_{AZ_k} = \mu_k + \epsilon_{A,k}\)`. - With independent samples, `\(\epsilon_{Y,k}\)` and `\(\epsilon_{A,k}\)` are independent. --- ## Expectation of IVW Estimator $$ `\begin{split} E[\hat{\beta}_{IVW}] =& E\left[\frac{\sum_{k=1}^K \hat{\beta}_{Y,Z_k}\hat{\beta}_{A,Z_k}\sigma^{-2}_{Y,Z_k}}{\sum_{k=1}^K \hat{\beta}_{A,Z_k}\sigma^{-2}_{Y,Zk}}\right]\\ =& E\left[\frac{\sum_{k=1}^K \hat{\beta}_{YZ_k}\hat{\beta}_{A,Z_k}}{\sum_{k=1}^K \hat{\beta}_{A,Z_k}}\right] \approx \frac{E\left[\sum_{k=1}^K \hat{\beta}_{YZ_k}\hat{\beta}_{A,Z_k}\right]}{E\left[\sum_{k=1}^K \hat{\beta}_{A,Z_k}\right]}\\ \approx &\frac{\gamma \sum_{k = 1}^K \mu_k^2}{\sum_{k = 1}^K \mu_k^2 + \sum_{k = 1}^K E[\epsilon_{A,k}^2]} = \frac{\gamma}{1 + (p\sigma^2_{A})/\sum_{k = 1}^K \mu_k^2}\\ =& \frac{\gamma}{1 + 1/\kappa}& \end{split}` $$ where `\(\kappa = \frac{1}{p}\sum_{k = 1}^K \frac{\mu_k^2}{\sigma^2_{AZ_k}}\)` represents the average instrument strength. --- ## Expectation of IVW Estimator - Solving the expression from the previous slide, we find that `$$E[\hat{\beta}_{IVW} - \gamma] \approx \frac{-\gamma}{\kappa}$$` - `\(\kappa = \frac{1}{K} \sum_{k = 1}^K \frac{E[\hat{\beta}_{AZ_k}]^2}{\sigma^2_{A,k}}\)` represents average instrument strength. This can be estimated as the average chi-squared statistic minus 1. - Weak instrument bias is pushing the estimate closer to zero. - This is the same pattern we saw in the single instrument case. --- ## Expectation of IVW Estimator - When there is sample overlap, `\(\epsilon_{A,k}\)` and `\(\epsilon_{Z_k}\)` are not independent and there is an additional term in the expression for the expected bias. - In the numerator of the expression for `\(E[\hat{\beta}_{IVW}]\)` we have `$$E\left[\sum_{k=1}^K \hat{\beta}_{YZ_k}\hat{\beta}_{A,Z_k}\right] = \gamma \sum_{k = 1}^K \mu_k^2 + Cov(\epsilon_{A,k},\epsilon_{Z_k})$$` - Rearranging terms and combining with the previous expression we have `$$E[\hat{\beta}_{IVW} - \gamma] \approx \frac{-\gamma + \rho c}{\kappa}$$` - Where `\(\rho = Cor(\epsilon_{A,k},\epsilon_{Z_k})\)` and `\(c = \sigma_{Y}/\sigma_{A}\)`. --- ## Bias of the OLS Estimate - Let `\(\rho_{pop}\)` be the population correlation of `\(A\)` and `\(Y\)`. - The expectation of the OLS estimate regressing `\(A\)` on `\(Y\)` is $$ `\begin{split} &E\left[ \hat{\beta}_{OLS} \right] = \frac{Cov(A, Y)}{Var(A)} = \rho_{pop}\sqrt{\frac{Var(Y)}{Var(A)}} \equiv \rho_{pop} c_{pop}\\ &E\left[ \hat{\beta}_{OLS} -\gamma \right] = \rho_{pop}c_{pop} - \gamma \end{split}` $$ where `\(Var(Y)\)` is the residual variance of `\(Y\)`. --- ## Bias of the IVW Estimator - If each instrument explains a small amount of the variance of `\(A\)` and there is complete sample overlap then `\(\rho \approx \rho_{pop}\)` and `\(c \approx c_{pop}\)`. - So the bias of the IVW estimate is approximately equal to the bias of the OLS estimate divided by `\(\kappa\)`. - Weak instrument bias is in the direction of the unadjusted (confounded) OLS estimate. - We have derived these expressions for the IVW estimator but the same results can be found for the 2SLS estimator. --- ## Weak Instrument Bias - The table below shows simulation results from Burgess and Thompson (2011). - Relative bias is comparing the OLS and the IV estimate. - We can see that as predicted, relative bias depends on `\(1/F\)` (approximately `\(1/\kappa\)` in our notation). <center> <img src="img/10_bandt_tab2.png" width="80%" /> </center> --- ## Weak Instrument Bias in the Education Example - Bound, Jaeger, and Baker (1993 and 1995) point out that after adjusting for age, age squared, and year of birth the F-statistic for the quarter of birth IVs is actually quite small. - A small amount of confounding between years of education and wages or slight violations of the exclusion restriction, could account for the results observed by Angrist and Kruger. --- ## Weak Instrument Bias in the Education Example <center> <img src="img/10_bound_tab1.png" width="90%" /> </center> --- ## Weak Instrument Bias as Regression Dilution - Recall that the IVW estimator is a regression of `\(\boldsymbol{\hat{\beta}}_Y\)` on `\(\boldsymbol{\hat{\beta}}_A\)`. - However, our equation was `\(E[\hat{\beta}_{YZ_k}] = \gamma \beta_{AZ_k}\)`. We substituted the hat value for `\(\beta_{AZ_k}\)`. - Bias arises because `\(\hat{\beta}_{AZ_k}\)` are measured with error. - Measurement error in a predictor will attenuate the estimated coefficient. --- ## Regression Dilution - Uncertainty in `\(\hat{\beta}_{AZ}\)` leads to regression dilution, causal estimate biased towards the null. <center> <img src="img/10_regression_dilution.gif" width="70%" /> </center> Animation from Robert Östling --- ## Bias in the IV Estimate <center> <img src="img/10_bdt_fig1.png" width="115%" /> </center> --- ## Selection Bias - If we test our potential instrument and find that our estimated F statistic is small, we will probably reject it as an instrument. - This means that on average, our estimate of `\(\hat{\beta}_{AZ}\)` will tend to be too extreme (far from 0). - Overestimating the magnitude of `\(\hat{\beta}_{AZ}\)` will lead us to understimate the magnitude of `\(\gamma\)`, biasing results towards the null. --- ## Selection Bias - This problem is bigger in settings where there are many possible instruments to choose from and selection is required. + In these circumstances, three sample IV is sometimes suggested. + We use one sample with measurements of `\(Z\)` and `\(A\)` to select instruments, and a second sample with measurements of `\(Z\)` and `\(A\)` to estimate `\(\hat{\beta}_{AZ}\)`. - Bias due to winner's curse tends to be small relative to weak instrument bias and is smaller for more stringent significance cutoffs. --- ## Bias in IV Estimates Summary - Using 2SLS or IVW in a single sample, bias due to weak instruments will be towards the confounded population correlation. - In estimates from separate samples, weak instrument bias will bias the estimate towards the null. - Selection bias will bias results towards the null but is smaller than weak instrument bias. - This is all bias that occurs *when all of the assumptions are satisfied*. - Violations of exchangeability or the exclusion restriction introduce correlation between `\(Z\)` and `\(\epsilon_{Y}\)`. If this occurs, bias could be in any direction but will most often be similar to the observational bias. --- ## Pros and Cons of Using Multiple Instruments Pros: - We have seen that our estimators do not have finite moments for single instruments. - This makes it appealing to use multiple instruments whenever possible. - Adding instruments will increase the total F statistic, which we have seen will decrease bias. Cons: - The more instruments we include, the more chances we have to vioalte one of the IV assumptions. + Valid inference depends on all instruments being valid. - In more non-parametric settings, using multiple instruments can make interpretation hard. --- ## Multiple Instrument Interpretation - Each of the ratio estimates is an estimate of the complier average causal effect. - However, different instruments have different complier groups. - In Angrist and Kruger, all instruments related to quarter of birth so plausibly, the "complier" group associated with each instrument represent similar populations. - If we are not willing to accept a model in which the effect is homogeneous, or there are no non-compliers, we are now estimating a weighted average of LATEs applying to different sub-groups. - However, if we believe that the sign of the causal effect is the same in all complier groups, we still have a valid test of the strict null and the sign of the estimated effect is meaningful. --- # 4. Mendelian Randomization --- ## Mendelian Randomization - In Mendelian randomization, genetic variants are used as instruments. -- + Genetic variants are fixed at conception. They can't be altered by any confounders that occur after that point. + No arrows int `\(Z\)` from environment. -- + Genetic variants can alter traits like height or disease risk by changing proteins, changing protein levels, or regulating expression of other genes. + Relevance can be satisfied -- + If we are willing to assume random mating with respect to the instruments at hand, then an individual's genetic variants are perfect randomizations. + No associations with confounders --- ## Mendelian Randomization - MR is incredibly powerful because it can be applied to any pair of traits that has been studied in genetic association studies. - Using summary statistic methods, MR can be applied even when individual level data are not available. - However, there are major caveats to results obtained using MR. --- ## Estimation Problems in MR - Weak instruments: Most variants explain only a tiny amount of trait variation. + Additionally, we are often trying to identify instruments in the same data we will use to estimate `\(\hat{\beta}_{AZ}\)`. + This can create selection bias. -- - Violations of the exclusion restriction. Some variants causally effect multiple traits. - Also, genetic variants are correlated with each other, so one variant may be correlated with separate causal variants for two different traits. -- - Confounding from population structure and assortative mating. + We generally try to adjust for this in the regression of `\(A\)` on `\(Z\)`. --- ## Interpretation Problems in MR - Complier groups are unknown and hard to define. We generally don't know the mechanism of most variants. - We generally assume that everyone is a complier for all instruments. - The exposure can be ill-defined. - We don't know *when* a variant affects a trait so we cannot differentiate short and long term exposure. + We may know that genetic changes altering `\(A\)` increase disease risk `\(Y,\)` but does that mean that if we pharmaceutically alter `\(A\)` later in life we can prevent `\(Y\)`? - Variants may be affecting different components of an overly broad exposure, e.g. very large LDLs vs large LDLs. --- ## MR Solutions - Despite its problems, MR has one big resource -- lots and lots of genetic variants. - The exposure trait may have thousands of causal variants, so thousands of potential instruments. - One strategy is to assume that most but not all of the instruments are valid. - We can then examine the distribution of ratio estimates and reject variants that look very different from the rest. - Another option is to use a robust regression rather than OLS for the regression of `\(\boldsymbol{\hat{\beta}}_Y\)` on `\(\boldsymbol{\hat{\beta}}_A\)`. + E.g median or mode regression - There are many many interesting methods trying different variations of this or related strategies. --- ## Accounting for Violations of the Exclusion Restriction - In MR, violations of the exclusion restriction (also called pleiotropy) are the biggest concern. - A simple extension of the SEM we have been working with allows for some violations. `$$A_i = \beta_{A0} + \boldsymbol{\beta}_{AZ}^\top \mathbf{Z}_i + \epsilon_{A,i}$$` `$$Y_i = \beta_{Y0} + \gamma A_i + \boldsymbol{\alpha}^\top Z_i + \epsilon_{Y,i}$$` - The presence of the `\(\boldsymbol{\alpha}^\top Z_i\)` term in the second equation is a violation of the exclusion restriction. - As written, the parameters in this new model are not identifiable. We have to make some restrictions on `\(\boldsymbol{\alpha}\)`. --- ## Egger Regression - Our extended model implies that, if `\(Z_1, \dots, Z_K\)` are independent, `$$E[\hat{\beta}_{YZ,k}] = \alpha_k + \gamma \beta_{AZ,k}$$` - IVW regression, regresses `\(\hat{\beta}_{YZ}\)` on `\(\hat{\beta}_{AZ}\)` with no intercept. - Egger regression extends this strategy to add an intercept, fitting `$$E[\hat{\beta}_Y] = \alpha_0 + \gamma \hat{\beta}_A$$` - This strategy is valid if, either `\(\boldsymbol{\alpha} = \alpha_0 \mathbf{1}_{K}\)` or `\(\sum_{k = 1}^K \alpha_k \beta_{AZ,k} = 0\)` + If we have a large number of instruments and think of `\(\beta_{AZ,k}\)` and `\(\alpha_k\)` as random, we require that `\(Cov(\beta_{AZ}, \alpha) = 0\)`. - This assumption says that effects of instruments not mediated by `\(A\)` are independent of the effects of instruments on `\(A\)`. + This is called the Instrument Strength Independent of the Direct Effect (InSIDE) assumption --- ## Violations of InSIDE - Violations of the InSIDE assumption occur when some instruments affect `\(A\)` *through* a confounder of the exposure and the outcome. <center> <img src="img/10_scatter2.png" width="95%" /> </center> --- ## Median Regression - An alternative strategy proposed by Bowden et al (2016) is to assume that most instruments are valid. - Rather than use the IVW estimator which averages the `\(K\)` ratio estimates, we take the median of the ratio estimates. <center> <img src="img/10_median.png" width="95%" /> </center> --- ## Outlier Robust Regression - There are several variations of this strategy. - Modal regression: use the mode of the ratio estimates. - Outlier detection: use a strategy to identify outliers and discard them. - Robust regression: Use an alternative loss function such as Huber loss to fit the regression. <center> <img src="img/10_mrpresso.png" width="45%" /> </center> --- ## Mixture Models - Finally, another alternative is to assume that there are two or more groups of instruments. - Instruments are grouped by their latent mechanistic relationship to `\(A\)`. - Instruments with the same mechanistic relationship should have similar ratio estimates. - We assume that the largest group of instruments are valid. <center> <img src="img/10_mrpath.png" width="50%" /> </center> --- ## MR Next Directions - MR is a very exciting field with many new developments happening every year. Major fields of development in MR include: - De-biasing strategies: How to reduce weak instrument bias and selection bias. - Improving robustness to violations of the no horizontal pleiotropy assumption. - Multivariable MR: Estimating effects of many exposures simultaneously. - Network MR: Using MR to infer causal networks for many traits. - MR for estimating non-linear relationships. --- ## MR in Action: Alcohol and Stroke - Observational studies show a J-shaped relationship between alcohol consumption and heart disease. - This relationship could be confounded by socioeconomic factors. - Millwood et al. perform MR using data from the China Kadoorie Biobank to estimate the effect of increased alcohol consumption on stroke, intracerebral haemorrhage, and myocardial infarction. - Two genetic variants explain a large proportion of the variation in alcohol consumption among men in the study. - Authors create a 6 lelel instrument based on genotypes and geographical region and use a two stage regression approach. - Only 2% of women in the study report drinking in most weeks. This group can be used as a negative control. --- ## MR in Action: Alcohol and Stroke <center> <img src="img/13_alc_stroke.jpg" width="70%" /> </center> --- ## MR in Action: Alcohol and Stroke <center> <img src="img/13_alc_mi.jpg" width="50%" /> </center> --- ## MR in Action: Alcohol and Stroke - Millwood et al. conclude that the J-shaped relationship seen observationally is a result of socioeconomic confounding and reverse causation. - They find a monotonic dose-response relationship between increased alcohol consumption and increased risk of stroke and intracerebral haemorrhage. - They find no relationship between alcohol consumption and myocardial infarction. - In this study, they are able to test the exclusion restriction by verifying that there is no relationship between the genetic/geographic risk categories and stroke/haemorrhage/MI risk. --- ## MR in Action: Covid-19 and Vitamin D - Early in the Covid-19 pandemic, vitamin D was suggested as a protective factor based on country level and patient level observational data. - Multiple MR studies have subsequently found no relationship between genetically predicted vitamin D levels and Covid-19 infection or hospitalization. - This does not rule out the possibility that acute changes in vitamin D levels near the time of exposure affect infection risk. --- ## MR in Action: Lifecourse MR - Richardson et al. use MR to distinguish effects of childhood and adult BMI on risk of CHD, T2D, and breask cancer. - They use multivariable MR and variants associated with childhood BMI, adult BMI, or both as instruments. - Results suggest no direct effect effect of childhood BMI (i.e. not mediate by adult BMI) on risks of CHD and T2D. - They do find a direct protective effect of higher childhood BMI on risk of breast cancer. --- # 5. Reassessing Assumptions the SEM --- ## Reassessing Assumptions the SEM `$$A_i = \beta_{A0} + \boldsymbol{\beta}_{AZ}^\top \mathbf{Z}_i + \epsilon_{A,i} \\\ Y_i = \beta_{Y0} + \gamma A_i + \epsilon_{Y,i}$$` - We started this section with a very strong model that required complete homogeneity and linearity of effects for the `\(A\)` and `\(Y\)` models. - It turns out that we can relax this model. --- ## Assumptions for the `\(A-Z\)` Relationship `$$A_i = \beta_{A0} + \beta_{AZ}^\top \mathbf{Z}_i + \epsilon_{A,i} \\\ Y_i = \beta_{Y0} + \gamma A_i + \epsilon_{Y,i}$$` - The 2SLS and IVW estimation strategies are valid even if the first equation is not structural. - `\(A\)` and `\(Z\)` do need to be linearly associated. - i.e. `\(Cov(Z, \epsilon_A) = 0\)` --- ## Assumptions for the `\(A-Y\)` Relationship `$$A_i = \beta_{A0} + \beta_{AZ}^\top \mathbf{Z}_i + \epsilon_{A,i} \\\ Y_i = \beta_{Y0} + \gamma A_i + \epsilon_{Y,i}$$` - The second equation does need to be structural, however, it is sufficient that `$$E[Y(a)] = \beta_{Y0} + \gamma a$$` - The requirement that `\(Cov(Z, \epsilon_Y) = 0\)` can't be relaxed. + This assumption ensures that the exclusion restriction and exchangeability hold. --- ## Consequences of Non-Linearity - Non-linearity could occur in either the `\(A-Z\)` relationship or in the `\(Y(a)-a\)` relationship. - If the relationship betwen `\(E[Y \vert Z]\)` is non-linear in `\(Z\)`, the first stage regression is mis-specified. + We will have a violation of the requirement `\(Cov(Z, \epsilon_A) = 0\)`. + A non-linear relationship between `\(Z\)` and `\(A\)` has the same effect as confounding between `\(Z\)` and `\(A\)`. + Our estimate of `\(\hat{A}\)` will be bad. - The good news is we have a chance to correct this. + Using model diagnositcs, we can evaluate the linearity assumption. + We could additional terms, like a quadratic term, to the regression. + Or we could use a flexible method like smoothing splines. --- ## Non-Linear Exposure Outcome Relationship - `\(E[Y(a)] = g(a)\)` is non-linear in `\(a\)`, then the 2SLS estimate does not estimate the ATE. - However, the parameter that we do estimate is not meaningless. - If every IV is binary and has equal effect size, `\(\gamma\)`, then we estimate the local average causal effect `$$LACE(a) \frac{E[Y(a + \gamma) - Y(a)]}{\gamma}$$` - There are IV extensions that allow us to estimate the non-linear relationship between `\(Y(a)\)` and `\(a\)`.