class: center, middle, inverse, title-slide # L4: Selection Bias ### Jean Morrison ### University of Michigan ### 2022-24-10 (updated: 2022-01-26) --- `\(\newcommand{\ci}{\perp\!\!\!\perp}\)` # Example - We are interested in the effect of drug `\(A\)`, an HIV treatment, on disease progression. - We will use CD4 count as a measurement of disease progression. Our outcome `\(Y\)` is 1 if CD4 count falls below a threshold within one year of starting treatment. - Some patients drop out of the study before the one-year mark and we cannot observe their outcome. - Patients may drop out if they are in poor health. This could occur if their disease progresses and their CD4 count drops or if they are experiencing side effects ( `\(L\)` ) from the treatment. - Use a variable `\(C\)` (for censoring) to represent whether a patient drops out of the study before one year ( `\(C = 1\)` ) or not ( `\(C = 0\)`). - With a partner, draw a DAG representing this scenario. --- # HIV Treatment Example <center>
</center> --- # HIV Treatment Example - In this study, if we had been able to measure `\(Y\)` for all patients, could we identify the effect of `\(A\)` on `\(Y\)` -- i.e. would we have `\(Y(a) \ci A\)`? - With some data unobserved, can we identify the effect of `\(A\)` on `\(Y\)`? <center>
</center> -- - No. We have *selection bias*. - `\(C\)` is a collider on a path between `\(Y\)` and `\(L\)`. Conditioning on `\(C\)` induces a correlation between `\(Y\)` and `\(L\)` and therefore a non-causal association between `\(Y\)` and `\(A\)`. --- # HIV Treatment Example - Suppose now that treatment has no side effects. - Treatment can only influence selection *through* its effect on `\(Y\)`. <center>
</center> - Do we still have selection bias? --- # Example Continued - Suppose that `\(A\)` is effective and - Patients with `\(Y = 0\)` are more likely to remain in the study than patients with `\(Y = 1\)`. `$$P[Y = 1 \vert A = 0] = 0.8 \qquad P[Y = 1 \vert A = 1] = 0.1\\\ P[C = 0 \vert Y = 0] = 1 \qquad P[C = 0 \vert Y = 1] = 0.5$$` - With your partner, compute the average causal effect of `\(A\)` on `\(Y\)` and compute `\(E[Y \vert A = 1] - E[Y \vert A = 0]\)` in the sub-population with `\(C =0\)`. - Repeat your calculations assuming that there is no effect of `\(A\)` on `\(Y\)` and `\(P[Y = 1 \vert A =0 ] = P[Y = 1 \vert A =1] = p\)`. <center>
</center> --- # Example Continued - In the scenario with a treatment effect: <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Full Data</div></th> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">\(C = 0\)</div></th> </tr> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> \(A=0\) </th> <th style="text-align:right;"> \(A=1\) </th> <th style="text-align:left;"> </th> <th style="text-align:right;"> \(A=0\) </th> <th style="text-align:right;"> \(A=1\) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> \(Y=0\) </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:right;"> 0.9 </td> <td style="text-align:left;"> </td> <td style="text-align:right;"> 0.33 </td> <td style="text-align:right;"> 0.95 </td> </tr> <tr> <td style="text-align:left;"> \(Y=1\) </td> <td style="text-align:right;"> 0.8 </td> <td style="text-align:right;"> 0.1 </td> <td style="text-align:left;"> </td> <td style="text-align:right;"> 0.67 </td> <td style="text-align:right;"> 0.05 </td> </tr> </tbody> </table> - The causal risk difference is 0.1-0.8 = -0.7. 
- The associational risk difference among those with `\(C = 0\)` is `\(0.05 - 0.67 = -0.62\)`. --- # Example Continued - In the scenario with no effect: <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Full Data</div></th> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">\(C = 0\)</div></th> </tr> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> \(A=0\) </th> <th style="text-align:left;"> \(A=1\) </th> <th style="text-align:left;"> </th> <th style="text-align:left;"> \(A=0\) </th> <th style="text-align:left;"> \(A=1\) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> \(Y=0\) </td> <td style="text-align:left;"> \(1-p\) </td> <td style="text-align:left;"> \(1-p\) </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> \(\frac{2-2p}{2-p}\) </td> <td style="text-align:left;"> \(\frac{2-2p}{2-p}\) </td> </tr> <tr> <td style="text-align:left;"> \(Y=1\) </td> <td style="text-align:left;"> \(p\) </td> <td style="text-align:left;"> \(p\) </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> \(\frac{p}{2-p}\) </td> <td style="text-align:left;"> \(\frac{p}{2-p}\) </td> </tr> </tbody> </table> - The causal risk difference is 0. - The associational risk difference in the observed data is also 0. So there is no selection bias for the measure of association. - However, in the observed data, `\(E[Y \vert A= a, C = 0] = \frac{p}{2-p} \neq p = E[Y(a)]\)` whenever `\(0 < p < 1\)`. 
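
The two calculations above can be checked numerically. The following is a minimal Python sketch, assuming (as in this example) that censoring depends on `\(A\)` only through `\(Y\)`, with `\(P[C = 0 \vert Y = 0] = 1\)` and `\(P[C = 0 \vert Y = 1] = 0.5\)`. The function names are illustrative, and `p = 0.3` is an arbitrary value chosen for the null scenario.

```python
def risk_among_uncensored(p_y1, p_stay_y0=1.0, p_stay_y1=0.5):
    """Return P[Y = 1 | A = a, C = 0] via Bayes' rule.

    p_y1 is P[Y = 1 | A = a]; p_stay_y0 and p_stay_y1 are
    P[C = 0 | Y = 0] and P[C = 0 | Y = 1]. Assumes C depends on A
    only through Y.
    """
    kept_y1 = p_y1 * p_stay_y1          # P[Y = 1, C = 0 | A = a]
    kept_y0 = (1.0 - p_y1) * p_stay_y0  # P[Y = 0, C = 0 | A = a]
    return kept_y1 / (kept_y1 + kept_y0)


def risk_differences(p_y1_a0, p_y1_a1):
    """Return (causal RD, associational RD among C = 0).

    With A randomized and no censoring, the causal RD equals
    P[Y = 1 | A = 1] - P[Y = 1 | A = 0].
    """
    causal = p_y1_a1 - p_y1_a0
    observed = risk_among_uncensored(p_y1_a1) - risk_among_uncensored(p_y1_a0)
    return causal, observed


if __name__ == "__main__":
    # Scenario with a treatment effect.
    print(risk_differences(0.8, 0.1))
    # Null scenario with an arbitrary common risk p = 0.3.
    print(risk_differences(0.3, 0.3))
    # Risk among the uncensored under the null differs from p itself.
    print(risk_among_uncensored(0.3))
```

In the first scenario the unrounded associational risk difference is about -0.61, which matches the -0.62 on the slide up to rounding of the table entries. In the null scenario both risk differences are 0, but the risk among the uncensored is `\(p/(2-p)\)` rather than `\(p\)`.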
--- # Selection Bias Under the Null - In both examples we have selection bias because `\(C\)` is a common effect of both `\(A\)` and `\(Y\)`. - In the first case, this condition is true whether or not `\(A\)` has a non-zero effect on `\(Y\)` (*selection bias under the null*). - In the second case (all of the effect of `\(A\)` on `\(C\)` is mediated by `\(Y\)`), this condition *only* occurs when there is a non-zero causal effect of `\(A\)` on `\(Y\)`. - Selection bias under the null always implies selection bias in non-null settings. - The reverse is not true (as we have seen). --- # Colliding Creates Selection Bias - Conditioning on a variable that is a child of both the outcome and the exposure creates selection bias. - If the effect of the exposure on the selection variable is not entirely mediated by the outcome, we will have selection bias under the null. - Recall that conditioning on the child of a collider will also open a path. <center>
</center> --- # Selection Bias without Colliding - In our previous HIV treatment example, suppose that low CD4 count does not directly cause censoring. - Instead there is a variable `\(U\)` representing health which is a common cause of both `\(Y\)` and `\(C\)`. - We still have selection bias in this case, but `\(C\)` is not a descendant of `\(Y\)`. <center>
</center> --- # Selection Bias Definition - Selection bias occurs when we condition on a variable which is a common effect of two variables. - One must be either the treatment or *associated* with the treatment. - The other must be the outcome or *associated* with the outcome. --- # Selection Could Happen Before the Outcome <center>
</center> --- # Selection Could Happen Before the Exposure <center>
</center> <!-- --- --> <!-- # Extended Backdoor Criterion --> --- # Augmenting DAGs - The DAGs we have been drawing are harboring a hidden counterfactual. - The node `\(Y\)` could be written as `\(Y(C = 0)\)` -- the value of `\(Y\)` that we would observe if nobody was censored. - Our DAGs are missing the value of `\(Y\)` that we actually observe, `\(Y^{obs}\)` which is determined by `\(Y(C = 0)\)` and `\(C\)`. <center>
</center> --- # Adjusting for Selection - Under selection, we are interested in estimating the average of a joint counterfactual `\(E[Y(a, C = 0)]\)` -- the expected value of `\(Y\)` if everyone received treatment `\(a\)` and was not censored. - We can use a two-stage weighting procedure. - First we re-weight the data to look like the world in which nobody was censored. For this we need a set of variables `\(L_1\)` such that `$$Y( C = 0) \ci C \vert L_1$$` - We use a second set of weights to account for any confounding. So we need a set of variables `\(L_2\)` such that `$$Y(a, C = 0) \ci A \vert L_2$$` - We will also need a modification of the positivity condition. --- # Adjusting for Selection - If `\(Y(C = 0) \ci C \vert L_1\)`, we can weight the data by `\(W^C = 1/P[ C = 0 \vert L_1, A]\)`. - For example, if half the participants with `\(L_1 = l\)` and `\(A = a\)` were censored, the remaining individuals will receive weight 2. - If we need to do further adjustment for confounding, we can compute weights `\(W^A = 1/P[A = a \vert L_2]\)`. - Our total weights will be `\(W^C\cdot W^A\)`. --- # IP Weighting Example 1 - Using the example from HR (modified fig 8.3), data follow the DAG below: <center>
</center> - `\(L\)` represents pre-existing heart disease. - `\(A\)` is random assignment to a diet containing wasabi. - `\(Y\)` indicates death by the end of the trial. - Some participants are lost to follow-up ( `\(C = 1\)` ) due either to heart disease or the treatment assignment. - There is no effect of `\(A\)` on `\(Y\)`, though this is unknown to investigators. --- # IP Weighting Example 1 <center> <img src="img/4_comb_ipw1.png" width="80%" /> </center> - We must condition on `\(L\)` to block the path between `\(Y(C = 0)\)` and `\(C\)`. - `\(Y(a, C = 0)\)` is independent of `\(A\)` unconditionally. - Since there is no confounding, we only need to compute `\(P[C = 0 \vert L, A]\)` for all levels of `\(L\)` and `\(A\)`. --- # IP Weighting Example 1 - The table shows `\(P[C = 0 \vert A, L]\)` from the HR example. <center> <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> \(A=0\) </th> <th style="text-align:right;"> \(A=1\) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> \(L=0\) </td> <td style="text-align:right;"> 1.0 </td> <td style="text-align:right;"> 0.5 </td> </tr> <tr> <td style="text-align:left;"> \(L=1\) </td> <td style="text-align:right;"> 0.6 </td> <td style="text-align:right;"> 0.2 </td> </tr> </tbody> </table> </center> - Individuals with `\(A = 0\)` and `\(L = 0\)` get weight 1 because they were never censored. - Individuals with `\(A = 1\)` and `\(L =1\)` get weight 5 because only 20% of this stratum were uncensored. --- # Positivity and Consistency - In order to use IP weighting, we need `\(P[C = 0 \vert A, L] > 0\)` in all strata of `\(A\)` and `\(L\)`. - We do not need `\(P[C = 1 \vert A, L] > 0\)`. - We also need the counterfactual outcome `\(Y(a, C = 0)\)` to be well-defined. + If `\(C\)` is loss to follow-up, it makes sense to suppose that all patients were followed. - Suppose that `\(C\)` is censoring due to death resulting from causes other than `\(A\)`. 
+ HR argue that it doesn't make sense to propose an intervention that eliminates all other causes of death. --- # IP Weighting Example 2 - In the previous example, we also could have stratified on `\(L\)`. - This doesn't always work. <center>
</center> - In this case, stratifying by `\(L\)` opens the non-causal path `\(A \rightarrow L \leftarrow U \rightarrow Y\)`, because `\(L\)` is a collider on that path. - But weighting by `\(1/P[C = 0 \vert A, L]\)` works. --- # Sources of Selection Bias - Differential loss to follow-up: Participants may drop out of the study for reasons related to the treatment or outcome. - Non-response: Social stigmas may make people more likely to omit some kinds of information than others. - Self-selection/volunteer bias: Some individuals may be more likely to volunteer for a study than others. <!-- + For example, healthy people with a family history of cancer may be more likely to participate in a cancer study. --> <!-- + If the study is advertised in particular places (e.g. on public transport), some people will be more likely to know about the study than others. --> - Healthy worker bias: Participants for a study of an occupational exposure on an outcome are recruited from among those who are at work on the day the exposure is measured. + People may be more likely to miss work for reasons directly related to the outcome or for reasons that are associated with both outcome and exposure (e.g. SES). --- # Case-Control Studies - The graph from our first example could have described a case-control study. <center>
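</center> - Individuals are selected into the study based on their value of `\(Y\)`. - In this case, we are no longer able to estimate the average counterfactuals or the causal risk ratio. - However, in this DAG, we can estimate the causal odds ratio due to cancellation. --- # Case-Control Studies - Without censoring, `\(Y(a)\)` and `\(A\)` are exchangeable so `\(E[Y(a)]= E[Y \vert A = a]\)`. - We only get to observe `\(E[Y \vert A, C = 0]\)`. - By Bayes' rule, and because `\(C\)` depends on `\(A\)` only through `\(Y\)`, the odds of `\(Y\)` given `\(A = a\)` are $$ \frac{P[Y = 1 \vert A = a]}{P[Y = 0 \vert A = a]} = \frac{P[Y = 1\vert A = a, C = 0]\,P[C = 0 \vert A = a]/P[C = 0 \vert Y = 1]}{P[Y = 0\vert A = a, C = 0]\,P[C = 0 \vert A = a]/P[C = 0 \vert Y = 0]} \\\ = \frac{P[Y = 1\vert A = a, C = 0]}{P[Y = 0\vert A = a, C = 0]} \cdot \frac{P[C = 0 \vert Y = 0]}{P[C = 0 \vert Y = 1]} $$ - The last factor does not depend on `\(a\)`, so it cancels when we take the ratio of the odds for `\(A = 1\)` and `\(A = 0\)`. - So the association odds ratio is equal to the causal odds ratio. <center>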
</center> --- # Case-Control Studies - If `\(Y\)` is the only cause of selection, we can recover `\(E[Y(a)]\)` by using outside information. - Because selection depends only on `\(Y\)`, `\(P[A = a \vert Y]\)` in the study sample equals `\(P[A = a \vert Y]\)` in the target population. - If we know `\(P[Y = 1] = \alpha\)` in the target population, we can compute the value of `\(P[Y = 1 \vert A = a]\)` that we would have observed in the full population. $$ P[Y(a)=1] = P[Y =1 \vert A = a] = \frac{P[A = a \vert Y = 1]P[Y = 1]}{P[A = a]} \\\ = \frac{\alpha P[A = a \vert Y = 1]}{\alpha P[A = a \vert Y = 1] + (1-\alpha)P[A = a \vert Y = 0]} $$ <center>
</center> --- # Selection Bias and Hazard Ratios - Suppose we have a single treatment `\(A\)` and then individuals are followed over time. - We are interested in estimating the counterfactual risk of death under treatment `\(a\)` (or the RR comparing treatment `\(a\)` and `\(a^\prime\)` ). - For simplicity, assume we have two discrete time points and know + `\(Y_1\)`: death by time point 1 + `\(Y_2\)`: death by time point 2 <center>
</center> --- # Hazard Ratios - In this DAG, we can estimate the total causal effect of `\(A\)` on both `\(Y_1\)` and `\(Y_2\)` since we have exchangeability for both. - In both cases, the causal risk ratio is equal to the association risk ratio. - The *hazard* at time 2 is the probability of dying by time 2 conditional on being alive at time 1 (for discrete time). - Based on our DAG, conditional on `\(Y_1\)`, there is no effect of `\(A\)` on `\(Y_2\)`. - However, conditioning on `\(Y_1\)` induces a non-causal association between `\(Y_2\)` and `\(A\)` through `\(U\)`. <center>
</center> --- # Hazard Ratios Example - Suppose that `\(U\)` is an indicator for being high-risk or low-risk. - Suppose that the treatment kills all high-risk individuals by time 1 and has no effect for low-risk individuals. - At time 2, the treatment group contains only low-risk individuals. The control group contains a mix of low and high-risk individuals. - So the hazard ratio at time 2 will be less than 1 even though the treatment is not beneficial for any patients. <center>
</center> --- # Non-Compliance - We perform a randomized trial of smoking cessation. - A population of current smokers with no immediate plans to quit is recruited. - Half the participants are assigned to quit smoking for six weeks, the other half are assigned to continue smoking as usual ( `\(Z\)` ). - We measure cardiovascular endurance at the beginning and end of the study. - Our outcome, `\(Y\)`, represents the change in endurance over 6 weeks. --- # Non-Compliance - Suppose that both treatment groups have some rate of non-adherence to the treatment plan. + There are some people who are assigned to quit and don't. + Some people assigned to continue smoking are inspired by their study participation and decide to quit anyway. - Let `\(A\)` represent the actual treatment each person receives (quitting or not). - Let `\(U\)` be a confounder that affects both adherence and change in endurance. - Draw a DAG of this scenario. --- # Non-Compliance <center>
</center> - The blue arrow may exist if knowledge of the treatment assignment alters participants' behavior. + People who are assigned to quit and don't may exercise more to "make up" for not quitting. - The blue arrow might be eliminated if it is possible to conceal the treatment from participants (*blinding*). --- # Non-Compliance - We would like to estimate `\(E[Y(a)]\)` and + `\(E[Y(A=1)] - E[Y(A = 0)]\)`, the *per-protocol (PP) effect*. + In this graph, the presence of `\(U\)` means that `\(E[Y(a)]\)` is not identifiable. - We can identify `\(E[Y(z)]\)`. + `\(E[Y(z =1)] - E[Y(z = 0)]\)` is the *intention-to-treat (ITT) effect*. <center>
</center> --- # Pros of the ITT - The ITT can be measured from the data without confounding and is therefore often preferred. - If we further assume that the blue arrow does not exist, then the following arguments are in favor of the ITT. - The ITT preserves the null: If there is no effect of `\(A\)` on `\(Y\)` then there is no effect of `\(Z\)` on `\(Y\)`. - If we further assume *monotonicity* ( `\(Y_i(1) \geq Y_i(0)\)` for all individuals `\(i\)` ), then the ITT effect is closer to zero than the PP effect, making the estimate conservative. <center>
</center> --- # Cons of the ITT - Conservativeness is not always good. + For example, if we are looking for adverse effects of a medication, a conservative estimate is dangerous. - If monotonicity does not hold, the ITT may be anti-conservative: + Suppose individuals who benefit from the treatment are more likely to comply than individuals who would be harmed by it. - In some cases, assuming the blue arrow is not present is unreasonable. + If the blue arrow is present the ITT may differ from the PP in any direction. --- # "As-Treated" Analysis to Estimate the PP Effect - If we can measure confounding factors between `\(A\)` and `\(Y\)`, we can estimate the PP effect using IP weighting or standardization. - In this case we are treating our trial data like observational data. - This is the "as-treated" analysis. <center>
</center> --- # "Per-Protocol" Analysis to Estimate the PP Effect - Another commonly used alternative is to exclude all non-compliers from the analysis. - This approach introduces selection bias unless the confounders `\(U\)` are measured. - So either way, we need to measure `\(U\)`. - Later we will see an alternative method, instrumental variable analysis, which requires some additional assumptions. <center>
</center> --- # Measurement Error - The non-compliance problem is similar to a measurement error problem. + `\(Z\)` is like a mis-measured version of `\(A\)`. - More generally, measurements of `\(A\)`, `\(Y\)`, or other variables may be inaccurate. - We won't cover methods for accounting for measurement error, but it is important to be aware of. <center>
</center> --- # Sources and Types of Measurement Error - Measurement error is *independent* if `\(U_A\)` is independent of `\(U_Y\)`. - Measurement error might be non-independent if `\(A\)` and `\(Y\)` are collected in the same survey. - Measurement error is *non-differential* if `\(U_A\)` and `\(U_Y\)` are independent of `\(A\)` and `\(Y\)`. - Differential error could occur in self-reported variables if participants tend to mis-report high or low values of `\(A\)` or `\(Y\)`. - `\(Y\)` might affect `\(U_A\)` if `\(A\)` is measured after some effect of `\(Y\)` has already occurred, creating the appearance of reverse causation. <center>
</center>
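
To illustrate why independence of the errors matters, here is a small simulation sketch in Python. Everything in it is an illustrative assumption rather than part of the lecture: `\(A\)` and `\(Y\)` are drawn as independent standard normal variables (so there is no true association), the measured values are `\(A^* = A + U_A\)` and `\(Y^* = Y + U_Y\)`, and the errors are non-differential in both settings. The only difference is whether `\(U_A\)` and `\(U_Y\)` are drawn separately or shared, as might happen when both variables come from the same survey.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def measured_correlation(shared_error, n=20000, seed=0):
    """Correlation between mismeasured A* and Y* when A and Y are independent.

    shared_error=False: U_A and U_Y are drawn independently.
    shared_error=True:  U_A and U_Y are the same draw (non-independent error).
    In both cases the errors are independent of A and Y (non-differential).
    """
    rng = random.Random(seed)
    a_star, y_star = [], []
    for _ in range(n):
        a = rng.gauss(0, 1)    # true exposure
        y = rng.gauss(0, 1)    # true outcome, independent of a
        u_a = rng.gauss(0, 1)  # error in measuring A
        u_y = u_a if shared_error else rng.gauss(0, 1)
        a_star.append(a + u_a)
        y_star.append(y + u_y)
    return pearson(a_star, y_star)


if __name__ == "__main__":
    print("independent errors:", measured_correlation(shared_error=False))
    print("shared errors:     ", measured_correlation(shared_error=True))
```

Under these assumptions the correlation between the measured variables is close to zero with independent errors and clearly positive with shared errors, even though `\(A\)` and `\(Y\)` are unrelated.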