Mendelian Randomization

Mendelian randomization (MR) is an increasingly popular technique for inferring causal effects between traits that cannot be (or have not been) studied using randomized trials. MR is a variation of instrumental variable analysis in which genetic variants are used as instruments. MR is incredibly appealing because it promises information about causal effects using only summary genetic association data which is now publicly available for a cornucopia of traits. Unfortunately, the assumptions that MR makes about the variants used are quite strict, difficult to verify, and often violated. We developed a Bayesian approach to MR that is robust to a wider range of violations that existing methods. In particular, most existing methods assum that no variants acto on both traits through a common mechanism (we refer to this as correlate pleiotropy). Our proposal, CAUSE, is able to account for a proportion of correlated pleiotropic variants which helps it avoid false positives.

traditional mr assumptions

You can read a much longer introduction on the project website: as well as looking at some results for real traits and walking through a software tutorial.

R-package

paper

biorxiv version

Assocations with spatially structured traits

Sequencing based assays used to measure epigenomic features tend to produce very dense spatially structured measurements like these bisulfite sequencing data from ENCODE:

bisulfite sequencing data in a small region of chromosome 22

We call data generated by these assays “genomic phenotypes”. Often we are interested in identifying regions of the genome where a genomic phenotype is associated with an experimental condition or an organismal level trait (like height). I’ve worked on two different approaches to this problem.

In both of these projects, our motivations come from genomic data but there are lots of other spatially structured data types, like time-series or neuroimaging data, that these approaches might be applied to.

Flexible Robust Excursion Test (FRET)

FRET is a method for testing associations with genomic phenotypes that can both adaptively learn the boundaries of associated regions and control the false discovery rate. FRET is based on the idea of an excursion test in which a test statistics are computed at many closely spaced positions and then smoothed. Associated regions are then identified as regions in which the smoothed test statistic excedes some threshold. This project is in progress!

fret cartoon

R-package

Joint Adaptive Differential Estimation (JADE)

JADE is a graphical/descriptive tool for identifying associations between one spatially structured trait and one categorical trait. We use penalized likelihood to estimate smooth profiles borrowing both across positions and catgories.

jade cartoon

Paper

pdf/supplement

R-package

Rank Conditional Coverage

Investigations using “big data” (or even moderate data) often involve computing a large number of estimates for similar parameters. For example, in GWAS, we compute the marginal association of each variant with the trait of interest. The next step is usually to select the parameters with the most significant/largest estimates and try to say something about them. Adjusting significance estimates for multiple comparisons is now common practice. However, it is less common to adjust point estimates and confidence intervals. The problem with unadjusted estimates is that the act of selection introduces bias (known as the winner’s curse). Parameters with the most significant estimates are more likely to be over-estimates of the true value than under estimates. A particular challenge with adjusting confidence intervals is that it is tricky to define a desirable coverage criterion in a selection setting. We introduced the concept of “rank conditional coverage” (RCC) and a bootstrapping based procedure for constructing intervals that control it. The guarantee provided by controlling the RCC at level α is that the top ranked (or rank $j$) parameter will be contained in its confidence interval 1-α% of the time. We argue that this is an appealing criterion when parameter selection is based on rank. Looking at this criterion reveals a disturbing property of several other confidence intervals, including the unadjusted marginal estimate — the top ranked parameter is almost never contained in its corresponding interval!

rank conditional coverage

Paper

pdf

R-package