In this tutorial, we will use SeqSIMLA to simulate families with disease.
-
We use the Asian (ASN) 500 kb region on chromosome 1 as the reference population data (download), supplied with the two options below.
-popfile ASN_500k.bed.gz
-recfile ASN_500k.rec
-
Our example pedigree file (download) contains 20 families with 1,380 individuals in total.
We randomly chose two individuals from each pedigree in that file and listed them in a proband file (download).
-famfile SAP.txt
-proband probands.txt
If you don't have a pedigree file, you can use the "default 3-generation families" option to generate fixed 3-generation pedigrees.
See "Output Options: -fam number" in the User Manual.
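The proband selection described above (two random individuals per pedigree) can be sketched with a small shell function. This is only an illustration under stated assumptions: it assumes the pedigree file is whitespace-delimited with the family ID in column 1 and the individual ID in column 2, and it writes one "family ID, individual ID" pair per line. The function name pick_probands is hypothetical, and the output layout is an assumption, so check the User Manual for the format that -proband actually expects.

```shell
#!/usr/bin/env bash
# pick_probands PEDFILE K SEED
# Prints K randomly chosen individuals per family (reservoir sampling).
# ASSUMPTION: column 1 = family ID, column 2 = individual ID.
# ASSUMPTION: output layout "familyID individualID" is illustrative only.
pick_probands() {
  awk -v k="$2" -v seed="$3" '
    BEGIN { srand(seed) }
    /^[[:space:]]*$/ { next }            # skip blank lines
    {
      fam = $1
      if (!(fam in n)) order[++nf] = fam # remember first-seen family order
      n[fam]++
      if (n[fam] <= k) {
        pick[fam, n[fam]] = $2           # fill the reservoir first
      } else {
        j = int(rand() * n[fam]) + 1     # uniform in 1..n[fam]
        if (j <= k) pick[fam, j] = $2    # replace with probability k/n[fam]
      }
    }
    END {
      for (f = 1; f <= nf; f++) {
        fam = order[f]
        m = (n[fam] < k) ? n[fam] : k    # small families yield fewer picks
        for (i = 1; i <= m; i++) print fam, pick[fam, i]
      }
    }
  ' "$1"
}

# Example (not executed here):
#   pick_probands data/SAP.txt 2 12345 > data/probands.txt
```

Fixing the seed makes the selection reproducible, which helps when the same probands need to be reused across simulation runs.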
-
We generate one replicate of simulated data.
-batch 1
-
The four options below are required for SeqSIMLA to generate disease status under the prevalence model.
The --mode-prev option tells SeqSIMLA to use the prevalence model.
The -prev 0.05 option sets the disease prevalence in the general population to 5%.
We select sites 1, 200, and 3000 as the disease sites and assume an odds ratio of 1.2 for each of the three sites.
--mode-prev
-prev 0.05
-site 1,200,3000
-or 1.2
To simulate pairwise interactions for all possible pairs of disease loci, each with an odds ratio of 1.2, add the following option:
-i 1.2
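To make "all possible pairs" concrete: with the three disease sites chosen in this tutorial there are three unordered pairs. The short sketch below enumerates them. It only illustrates the combinatorics; the variable names and the "a-b" pair labels are illustrative and are not SeqSIMLA input or output.

```shell
#!/usr/bin/env bash
# Enumerate every unordered pair of disease sites from a -site style list.
sites="1,200,3000"

# Split the comma-separated list into an array (IFS applies to read only).
IFS=',' read -r -a s <<< "$sites"

pairs=()
for ((a = 0; a < ${#s[@]}; a++)); do
  for ((b = a + 1; b < ${#s[@]}; b++)); do
    pairs+=("${s[a]}-${s[b]}")
  done
done

printf '%s\n' "${pairs[@]}"
echo "number of pairs: ${#pairs[@]}"
```

In general, n disease sites give n(n-1)/2 pairs, so the number of interaction terms grows quickly as more sites are added.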
Collect all the files into one folder (named data in the commands below) and place that folder in the directory from which you run SeqSIMLA.
To simulate without interaction effects, run the following command:
./SeqSIMLA -popfile data/ASN_500k.bed.gz -recfile data/ASN_500k.rec -famfile data/SAP.txt -proband data/probands.txt -folder test1 -header test -batch 1 -site 1,200,3000 --mode-prev -prev 0.05 -or 1.2
To include pairwise interaction effects between the disease SNPs, run the following command:
./SeqSIMLA -popfile data/ASN_500k.bed.gz -recfile data/ASN_500k.rec -famfile data/SAP.txt -proband data/probands.txt -folder test1 -header test -batch 1 -site 1,200,3000 --mode-prev -prev 0.05 -or 1.2 -i 1.2
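Because the two commands differ only in the trailing -i option, it can be convenient to assemble them from variables. The following is a minimal sketch that uses only the options shown in this tutorial. The function and variable names are illustrative, and the script prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: assemble the tutorial's SeqSIMLA command line from variables.
# Only options that appear in this tutorial are used.

data_dir="data"
sites="1,200,3000"
prevalence="0.05"
odds_ratio="1.2"

# build_cmd [INTERACTION_OR]
# Prints the command; pass an interaction odds ratio to append -i.
build_cmd() {
  local interaction_or="${1:-}"
  local cmd=(
    ./SeqSIMLA
    -popfile "$data_dir/ASN_500k.bed.gz"
    -recfile "$data_dir/ASN_500k.rec"
    -famfile "$data_dir/SAP.txt"
    -proband "$data_dir/probands.txt"
    -folder test1
    -header test
    -batch 1
    -site "$sites"
    --mode-prev
    -prev "$prevalence"
    -or "$odds_ratio"
  )
  if [ -n "$interaction_or" ]; then
    cmd+=(-i "$interaction_or")
  fi
  printf '%s\n' "${cmd[*]}"
}

build_cmd        # main effects only
build_cmd 1.2    # main effects plus pairwise interactions
```

To execute the command rather than print it, replace the final printf line in build_cmd with "${cmd[@]}".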
With our example files, this simulation takes about 150 seconds.
Note: If you would rather not write the command yourself, our website provides a user interface that generates the command for you.