Simulate summary statistics from a specified factor structure
Usage
sim_lf(
F_mat,
N,
J,
h2_trait,
omega,
h2_factor,
pi_L,
pi_theta,
est_s = FALSE,
R_E = NULL,
R_obs = NULL,
R_LD = NULL,
af = NULL,
sporadic_pleiotropy = TRUE,
h2_exact = FALSE,
pi_exact = FALSE,
snp_effect_function_L = "normal",
snp_effect_function_theta = "normal",
snp_info = NULL
)
Arguments
- F_mat
factor matrix M by K (M = number of traits, K = number of factors)
- N
GWAS sample size. N can be a scalar, vector, or matrix. If N is a scalar, all GWAS have the same sample size and there is no overlap between studies. If N is a vector, each element of N specifies the sample size of the corresponding GWAS and there is no overlap between studies. If N is a matrix, N_ii specifies the sample size of study i and N_ij specifies the number of samples present in both study i and study j. The elements of N must be positive but non-integer values will not generate an error.
- J
Total number of SNPs to generate
- h2_trait
Heritability of each trait. Length M vector.
- omega
Proportion of trait heritability mediated by factors. Length M vector.
- h2_factor
Heritability of each factor. Length K vector.
- pi_L
Proportion of non-zero elements in L_k. Length K factor
- pi_theta
Proportion of non-zero elements in theta. Scalar or length M vector.
- est_s
If TRUE, return estimates of se(beta_hat).
- R_E
Correlation between environmental trait components not mediated by factors. M by M pd matrix.
- R_obs
Observational correlation between traits. M by M pd matrix. At most one of R_E and R_obs can be specified.
- R_LD
List of LD blocks. R_LD should have class
list
. Each element of R_LD can be either a) a matrix, b) a sparse matrix (classdsCMatrix
) or c) an eigen decomposition (classeigen
). All elements should be correlation matrices, meaning that they have 1 on the diagonal and are positive definite.- af
Optional vector of allele frequencies. If R_LD is not supplied, af can be a scalar, vector or function. If af is a function it should take a single argument (n) and return a vector of n allele frequencies (See Examples). If R_LD is supplied, af must be a vector with length equal to the size of the supplied LD pattern (See Examples).
- sporadic_pleiotropy
Allow sporadic pleiotropy between traits. Defaults to TRUE.
- h2_exact
If TRUE, the heritability of each trait will be exactly h2.
- pi_exact
If TRUE, the number of direct effect SNPs for each trait will be exactly equal to round(pi*J).
- snp_effect_function_L, snp_effect_function_theta
Optional function to generate variant effects in
L
ortheta
.snp_effect_function_L/theta
can be a single function or list of functions of length equal to the number of factors/number of traits.- snp_info
Optional
data.frame
of variant information to be passed to variant effect functions. IfR_LD
is specified,snp_info
should have number of rows equal to the size of the supplied LD pattern. Otherwisesnp_info
should haveJ
rows.
Details
This function will generate GWAS summary statistics for M traits with K common factors.
The matrix F_mat provides the effects of each factor on each trait, F_mat[i,j]
gives the effect of factor j on trait i. The rows of F_mat
will be scaled in order
to provide desired proportion of hertiability of each trait explained by factors but the relative
size and sign of elements within rows will be retained.
A random factor matrix can be generated using generate_random_F
(see Examples).
It is possible to supply a non-feasible set of parameters. Usually this occurs if the heritability of the factors is low but the heritability of the traits is high leading to a contradiction. The function will return an error if this happens.
Trait covariance: Each trait is composed of four independent components, the genetic component mediated by factors, the environmental component mediated by factors, the genetic component not mediated by factors, and the environmental component not mediated by factors. Therefore, the total trait covariance can be decomposed into the sum of four corresponding covariance matrices.
Cov(T) = Sigma_FG + Sigma_FE + Sigma_GDir + Sigma_EDir
We assume that all cross-trait genetic sharing is explained by the factors so that Sigma_GDir is diagonal.
Each factor is a sum of a genetic component and an environmental components and factors are independent
(both genetic and environmental components) are independent across factors. This means that
Sigma_FG = F S_FG F^T and Sigma_FE = F S_FE F^T where S_FG and S_FE are diagonal matrices. The parameter R_E specifies
the correlation of the residual environmental component (i.e. R_E = cov2cor(Sigma_EDir).
Alternatively, if R_obs
is specified, Sigma_EDir will be chosen to give the desired observational correlation.
In the returned object, Sigma_G
is equal to the sum of the two genetic covariance components and Sigma_E
is equal to the sum of the two environmental components.
R
gives the overall trait correlation matrix multiplied by the overlap proportion matrix,
which is equal to the correlation in the error terms of beta_hat
(See Examples).
Examples
myF <- generate_random_F(K = 3, M = 10, nz_factor = c(2, 3, 2),
omega = rep(0.8, 10),
h2_trait = rep(0.6, 10), pad = TRUE)
dat <- sim_lf(myF, N = 10000, J = 20000, h2_trait = rep(0.6, 10),
omega = rep(0.8, 10), pi_L = 0.1, pi_theta = 0.1)
#> SNP effects provided for 20000 SNPs and 10 traits.
myF <- diag(2)
N <- matrix(c(10000, 8000, 8000, 10000), nrow = 2)
R_E <- matrix(c(1, 0.6, 0.6, 1), nrow = 2)
dat <- sim_lf(F_mat = myF, N = N, J = 20000, h2_trait = rep(0.6, 2), omega = rep(1, 2), h2_factor = rep(1, 2),
pi_L = 0.1, pi_theta = 0.1, R_E = R_E)
#> SNP effects provided for 20000 SNPs and 2 traits.
dat$R
#> [,1] [,2]
#> [1,] 1.0000000 0.1846886
#> [2,] 0.1846886 1.0000000
cor(dat$beta_hat[,1]-dat$beta_joint[,1], dat$beta_hat[,2]-dat$beta_joint[,2])
#> [1] 0.1809276