Sample individual level data with joint effects matching a sim_mv object

Usage

resample_inddata(
  N,
  dat = NULL,
  genos = NULL,
  J = NULL,
  R_LD = NULL,
  af = NULL,
  sim_func = gen_genos_mvn,
  new_env_var = NULL,
  new_h2 = NULL,
  new_R_E = NULL,
  new_R_obs = NULL,
  calc_sumstats = FALSE
)

Arguments

N

Sample size: a scalar, a vector, or a data frame in the special sample size format. See Details.

dat

An object of class sim_mv (produced by sim_mv). If `dat` is omitted, the function will generate a matrix of genotypes only. If `dat` is provided, phenotypes for the traits in `dat` will also be included.

genos

Optional matrix of pre-generated genotypes. If genos is supplied, resample_inddata will only generate phenotypes.

J

Optional number of variants. J is only required if dat is missing.

R_LD

LD pattern (optional). See ?sim_mv for more details.

af

Allele frequencies. af is required unless genos is supplied.

new_env_var

Optional. The environmental variance in the new population. If missing, the function will assume the environmental variance is the same as in the old population.

new_h2

Optional. The heritability in the new population. Provide at most one of new_env_var and new_h2.

new_R_E

Optional. The environmental correlation in the new population. If missing, the function will assume the environmental correlation is the same as in the original data.

new_R_obs

Optional. The overall trait correlation in the new population. Specify at most one of new_R_E and new_R_obs. If both are missing, the function will assume the environmental correlation is the same as in the original data.

calc_sumstats

If TRUE, associations between genotypes and phenotypes will be calculated and returned.
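
The "at most one" rules for new_env_var / new_h2 and for new_R_E / new_R_obs can be read off the usual additive variance decomposition. The block below is a sketch under that assumption (genetic and environmental components independent), not a description of the package internals. The symbols V_G, V_E, \Sigma_G, \Sigma_E and \Sigma_{obs} are introduced here for illustration and do not appear in the function's interface.

```latex
% Assumed: additive model with independent genetic and environmental parts.
% V_G, V_E: genetic and environmental variance of one trait.
% \Sigma_G, \Sigma_E, \Sigma_{obs}: genetic, environmental, and overall
% trait covariance matrices.
\begin{align}
h^2 &= \frac{V_G}{V_G + V_E}, \\
V_E &= V_G \, \frac{1 - h^2}{h^2}, \\
\Sigma_{\mathrm{obs}} &= \Sigma_G + \Sigma_E .
\end{align}
```

Once the genetic component is fixed by the effects and the LD and allele frequencies of the new population, choosing the heritability determines the environmental variance and vice versa. In the same way, choosing the environmental correlation determines the overall trait correlation, so supplying both members of either pair would over-specify the model.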

Details

This function can be used to generate individual level genotype and phenotype data. It can be used in three modes:

To generate genotype data only: No sim_mv object needs to be included. Supply only N as a single integer for the number of individuals, J for the number of variants, af, and R_LD if desired. The remaining parameters are not relevant when there is no phenotype, and supplying them results in an error. The returned object will include an N x J matrix of genotypes and a vector of allele frequencies.

To generate both genotype and phenotype data: Supply dat (a sim_mv object) and leave genos missing. N and af are required and all other options are optional.

To generate phenotype data only: Supply dat (a sim_mv object) and provide a matrix of genotypes to the genos argument. The number of rows in genos must equal the total number of individuals implied by N. For example, if there are two traits with 10 samples each and no overlap, genos should have 20 rows. The R_LD and af arguments should contain the population LD and allele frequencies used to produce the genotypes. These are used to compute the genetic variance-covariance matrix. N and af are required and all other options are optional.
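
The row-count requirement for genos can be checked before calling resample_inddata. The following is a minimal base R sketch of the arithmetic in the two-trait example above, assuming the per-trait samples do not overlap. check_genos_rows is a hypothetical helper written for this illustration and is not part of the package. The vector N is simply the sketch's own input, and the sketch does not cover the sample size data frame format.

```r
# Hypothetical helper (not part of the package): check that a genotype
# matrix has one row per individual when per-trait samples do not overlap.
check_genos_rows <- function(genos, N) {
  # With no sample overlap, the total number of individuals is the sum
  # of the per-trait sample sizes.
  n_total <- sum(N)
  if (nrow(genos) != n_total) {
    stop("genos has ", nrow(genos), " rows but N implies ",
         n_total, " individuals")
  }
  invisible(n_total)
}

# Two traits with 10 samples each and no overlap: genos needs 20 rows.
N <- c(10, 10)
genos <- matrix(0L, nrow = 20, ncol = 5)  # placeholder genotypes, 5 variants
n_total <- check_genos_rows(genos, N)
```

If the samples do overlap, the total number of individuals is smaller than sum(N), so this check would need to be adapted to the overlap structure.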

Examples

# Use resample_inddata to generate genotypes only
simple_ld <- matrix(0.5, nrow = 5, ncol = 5)
diag(simple_ld) <- 1
genos_only <- resample_inddata(N = 8,
                               J = 20,
                               R_LD = list(simple_ld),
                               af = rep(0.3, 5))
#> Loading required package: hapsim
#> Loading required package: MASS
#> Generating genotype matrix only.
# generate genotypes and phenotypes
dat <- sim_mv(N = 0,
              G = 1,
              J = 20,
              pi = 0.5,
              h2 = 0.05,
              R_LD = list(simple_ld),
              af = rep(0.3, 5))
#> SNP effects provided for 20 SNPs and 1 traits.
genos_and_phenos <- resample_inddata(dat = dat,
                                      N = 8,
                                      R_LD = list(simple_ld),
                                      af = rep(0.3, 5))
#> Generating both genotypes and phenotypes.
#> SNP effects provided for 20 SNPs and 1 traits.
#> Genetic variance in the new population differs from the genetic variance in the old population.
#> I will assume that the environmental variance is the same in the old and new population.
#> I will assume that environmental correlation is the same in the old and new population. Note that this could result in different overall trait correlations.
# generate phenos only
phenos_only <- resample_inddata(dat = dat,
                                genos = genos_only$X,
                                N = 8,
                                R_LD = list(simple_ld),
                                af = rep(0.3, 5))
#> Generating phenotypes only.
#> SNP effects provided for 20 SNPs and 1 traits.
#> Genetic variance in the new population differs from the genetic variance in the old population.
#> I will assume that the environmental variance is the same in the old and new population.
#> I will assume that environmental correlation is the same in the old and new population. Note that this could result in different overall trait correlations.