Simulate multivariate GWAS Data with Specified Direct Effects
Source: R/sim_mv_determined.R
Usage
sim_mv_determined(
N,
direct_SNP_effects_joint,
geno_scale,
pheno_sd,
G = 0,
est_s = FALSE,
R_obs = NULL,
R_E = NULL,
R_LD = NULL,
af = NULL
)
Arguments
- N
GWAS sample size. N can be a scalar, vector, or matrix. If N is a scalar, all GWAS have the same sample size and there is no overlap between studies. If N is a vector, each element of N specifies the sample size of the corresponding GWAS and there is no overlap between studies. If N is a matrix, N_ii specifies the sample size of study i and N_ij specifies the number of samples present in both study i and study j. The elements of N must be positive but non-integer values will not generate an error.
- direct_SNP_effects_joint
Matrix of direct variant effects. Should be variants by traits.
- geno_scale
Genotype scale of provided effects. Either "allele" or "sd".
- pheno_sd
Phenotype standard deviation. Either a scalar or a vector with one element per trait.
- G
Matrix of direct effects between traits. Rows correspond to the 'from' trait and columns correspond to the 'to' trait, so G[1,2] is the direct effect of trait 1 on trait 2. G should have 0 on the diagonal. Be sure that G is on the same scale as the effect sizes.
- est_s
If TRUE, return estimates of se(`beta_hat`). If FALSE, the exact standard error of `beta_hat` is returned. Defaults to FALSE.
- R_obs
Total observational correlation between traits. R_obs won't impact summary statistics unless there is sample overlap. See Details for default behavior.
- R_E
Total correlation of the environmental components only. R_E and R_obs are alternative methods of specifying trait correlation. Use only one of these two options. R_E may be phased out in the future.
- R_LD
Optional list of LD blocks. R_LD should have class list. Each element of R_LD can be either a) a matrix, b) a sparse matrix (class dsCMatrix), or c) an eigen decomposition (class eigen). All elements should be correlation matrices, meaning that they have 1 on the diagonal and are positive definite. See Details and vignettes.
- af
Optional vector of allele frequencies. If R_LD is not supplied, af can be a scalar, vector or function. If af is a function it should take a single argument (n) and return a vector of n allele frequencies (See Examples). If R_LD is supplied, af must be a vector with length equal to the size of the supplied LD pattern (See Examples).
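The input shapes described above can be sketched in base R. This is only an illustration of the documented argument formats: the helper name `my_af` and the uniform range it draws from are hypothetical choices, not package defaults.

```r
# Sample size matrix for two GWAS with overlapping samples:
# diagonal entries are per-study sample sizes, off-diagonal entries
# are the number of samples shared by the two studies.
N <- matrix(c(40000, 10000,
              10000, 20000), nrow = 2, byrow = TRUE)

# Trait-to-trait direct effects: G[1, 2] is the effect of trait 1 on trait 2,
# and the diagonal stays at 0.
G <- matrix(0, nrow = 2, ncol = 2)
G[1, 2] <- 0.5

# Hypothetical allele frequency generator: takes a single argument n and
# returns a vector of n allele frequencies (the range is an arbitrary choice).
my_af <- function(n) runif(n, min = 0.05, max = 0.95)
af_draws <- my_af(10)
```

A function such as `my_af` is only an option when R_LD is not supplied; with R_LD, af must be a vector matching the size of the LD pattern.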
Value
A sim_mv object. See ?sim_mv for details.
Details
A wrapper for sim_mv. See ?sim_mv and the "Providing an Exact Set of Direct Effects" section of the Effect Size vignette.
Examples
G <- matrix(c(0, 0.5, 0, 0), nrow = 2, byrow = TRUE)
my_effects <- matrix(0, nrow = 10, ncol = 2)
my_effects[c(1, 5), 1] <- c(-0.008, 0.01)
my_effects[c(3, 6, 9), 2] <- c(-0.02, 0.06, 0.009)
my_effects
#>         [,1]   [,2]
#>  [1,] -0.008  0.000
#>  [2,]  0.000  0.000
#>  [3,]  0.000 -0.020
#>  [4,]  0.000  0.000
#>  [5,]  0.010  0.000
#>  [6,]  0.000  0.060
#>  [7,]  0.000  0.000
#>  [8,]  0.000  0.000
#>  [9,]  0.000  0.009
#> [10,]  0.000  0.000
# for fun, let's include some sample overlap
N <- matrix(c(40000, 10000, 10000, 20000), nrow = 2)
sim_dat <- sim_mv_determined(N = N,
                             direct_SNP_effects_joint = my_effects,
                             geno_scale = "sd",
                             pheno_sd = 1,
                             G = G,
                             est_s = TRUE)
#> SNP effects provided for 10 SNPs and 2 traits.
sim_dat$direct_SNP_effects_joint
#>         [,1]   [,2]
#>  [1,] -0.008  0.000
#>  [2,]  0.000  0.000
#>  [3,]  0.000 -0.020
#>  [4,]  0.000  0.000
#>  [5,]  0.010  0.000
#>  [6,]  0.000  0.060
#>  [7,]  0.000  0.000
#>  [8,]  0.000  0.000
#>  [9,]  0.000  0.009
#> [10,]  0.000  0.000
sim_dat$beta_joint
#>         [,1]   [,2]
#>  [1,] -0.008 -0.004
#>  [2,]  0.000  0.000
#>  [3,]  0.000 -0.020
#>  [4,]  0.000  0.000
#>  [5,]  0.010  0.005
#>  [6,]  0.000  0.060
#>  [7,]  0.000  0.000
#>  [8,]  0.000  0.000
#>  [9,]  0.000  0.009
#> [10,]  0.000  0.000
sim_dat$Sigma_G
#>          [,1]     [,2]
#> [1,] 0.000164 0.000082
#> [2,] 0.000082 0.004122
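The printed values of beta_joint and Sigma_G can be reproduced by hand. The base R sketch below assumes that total effects are the direct effects propagated through G, and that on the "sd" genotype scale with independent variants the genetic variance-covariance is the cross-product of the total effects. Both relations are inferred from the output above, not taken from the package internals; the names `beta_joint_check` and `Sigma_G_check` are hypothetical.

```r
# Same inputs as the example above.
G <- matrix(c(0, 0.5, 0, 0), nrow = 2, byrow = TRUE)
my_effects <- matrix(0, nrow = 10, ncol = 2)
my_effects[c(1, 5), 1] <- c(-0.008, 0.01)
my_effects[c(3, 6, 9), 2] <- c(-0.02, 0.06, 0.009)

# Assumed relation: total effects = direct effects %*% (I - G)^{-1}.
# Variants acting on trait 1 pick up an effect on trait 2 scaled by G[1, 2].
beta_joint_check <- my_effects %*% solve(diag(2) - G)

# Assumed relation for standardized, independent variants:
# genetic variance-covariance = t(beta) %*% beta.
Sigma_G_check <- crossprod(beta_joint_check)

round(beta_joint_check[c(1, 5), ], 3)
round(Sigma_G_check, 6)
```

Under these assumptions, variants 1 and 5 get trait 2 effects of -0.004 and 0.005, which is half of their trait 1 effects because G[1,2] is 0.5.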