Why do we have eQTLs?
May 7, 2018

GWAS Benchmarking Project

Genome-wide association studies have successfully found many links between genetic variation and phenotypes. However, over the past decade, the statistical and software tools used for doing these studies have substantially changed and improved.

Major improvements have taken GWAS from simply testing for significant differences in the frequency of a genetic variant between cases and controls, to models that include covariates, to principal components that account for population structure, and finally to linear mixed models.
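As a rough illustration of the simplest regression-style test in that progression, the sketch below regresses a quantitative phenotype on the allele count (0, 1 or 2) of a single SNP. Everything in it is an assumption made for illustration: the function name `single_snp_test`, the sample size, the allele frequency, the effect size of 0.3, and the use of a normal approximation for the p-value. It includes no covariates, principal components or random effects.

```python
import math
import numpy as np

def single_snp_test(genotypes, phenotype):
    """Regress a phenotype on one SNP's allele counts; return (beta, p_value)."""
    x = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n = x.size
    xc = x - x.mean()          # centring absorbs the intercept
    yc = y - y.mean()
    sxx = np.sum(xc ** 2)
    beta = np.sum(xc * yc) / sxx
    resid = yc - beta * xc
    sigma2 = np.sum(resid ** 2) / (n - 2)
    se = math.sqrt(sigma2 / sxx)
    z = beta / se
    # Two-sided p-value from a normal approximation to the t statistic,
    # which is reasonable only for large n.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, p_value

rng = np.random.default_rng(0)
n = 2000                                   # hypothetical sample size
causal_snp = rng.binomial(2, 0.3, size=n)  # SNP that affects the phenotype
null_snp = rng.binomial(2, 0.3, size=n)    # independent SNP with no effect
phenotype = 0.3 * causal_snp + rng.normal(size=n)

beta_causal, p_causal = single_snp_test(causal_snp, phenotype)
beta_null, p_null = single_snp_test(null_snp, phenotype)
print(f"causal SNP: beta={beta_causal:.3f}, p={p_causal:.2e}")
print(f"null SNP:   beta={beta_null:.3f}, p={p_null:.2e}")
```

A full GWAS repeats this test for every variant, which is why the multiple-testing statistics discussed next matter.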

With so many methods available, a formal benchmarking study is needed. This study will use simulated data in which the underlying causal genetics are known. It will run each of the commonly used GWAS methods on the simulated data and compare true discovery rates, false discovery rates, family-wise error rates and other commonly used statistics across the methods.
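Because the causal variants are known in simulated data, these statistics can be computed directly by comparing the variants a method calls significant with the true causal set. The sketch below is one possible way to do that; the function names, the dictionary keys and the example SNP indices are all hypothetical. It estimates the false discovery rate as the mean false discovery proportion across replicates, and the family-wise error rate as the fraction of replicates with at least one false positive.

```python
def discovery_metrics(called, causal, n_tested):
    """Compare called SNPs with the known causal SNPs for one replicate."""
    called, causal = set(called), set(causal)
    true_pos = len(called & causal)
    false_pos = len(called - causal)
    n_null = n_tested - len(causal)   # SNPs with no true effect
    return {
        "true_positive_rate": true_pos / len(causal) if causal else 0.0,
        "false_positive_rate": false_pos / n_null if n_null else 0.0,
        "false_discovery_proportion": false_pos / len(called) if called else 0.0,
        "any_false_positive": false_pos > 0,
    }

def summarise(replicates):
    """Average per-replicate metrics over many simulated replicates."""
    k = len(replicates)
    return {
        "tpr": sum(r["true_positive_rate"] for r in replicates) / k,
        "fpr": sum(r["false_positive_rate"] for r in replicates) / k,
        "fdr": sum(r["false_discovery_proportion"] for r in replicates) / k,
        "fwer": sum(r["any_false_positive"] for r in replicates) / k,
    }

# Two hypothetical replicates: 100 SNPs tested, SNPs 1-4 truly causal.
rep1 = discovery_metrics(called={1, 2, 7}, causal={1, 2, 3, 4}, n_tested=100)
rep2 = discovery_metrics(called={1, 3}, causal={1, 2, 3, 4}, n_tested=100)
summary = summarise([rep1, rep2])
print(summary)
```

In practice a called SNP is often counted as a true positive if it is in linkage disequilibrium with a causal variant rather than being the causal variant itself; this sketch uses exact matches only.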


Background reading:

  1. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531285/
  2. https://www.cell.com/ajhg/pdf/S0002-9297(17)30240-9.pdf
  3. http://rspb.royalsocietypublishing.org/content/royprsb/282/1821/20151684.full.pdf
  4. Simulate genetic and phenotype data under an assumed model and test each software tool (http://cnsgenomics.com/software/gcta/GCTA_UserManual_v1.24.pdf)


  1. Understand and be able to describe the basic linear model that relates genetic variation, commonly single-nucleotide polymorphisms (SNPs), to phenotypes
  2. Describe the basic principles of a GWAS
  3. Describe common confounders in GWAS studies and how they are corrected for
  4. Explain why linear mixed model approaches outperform standard GWAS approaches
  5. Develop a script to simulate a GWAS study using real genotype data (from 1000 Genomes) and the GCTA software package for a specific effect size and heritability assuming a linear additive model and a known number of causal genetic variants
  6. Run the simulation script many times for each parameter setting and plot the resulting rates of interest: family-wise error rates, false discovery rates, true positive rates, false positive rates.
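The phenotype simulation in steps 5 and 6 can be sketched in a few lines. The example below is not the GCTA procedure and does not use 1000 Genomes data: as a stand-in it draws random, unlinked genotypes, picks a known number of causal variants, and scales the residual noise so that the genetic component explains a target share of the phenotypic variance under a linear additive model. The function names, sample sizes, allele-frequency range and the choice of normally distributed effect sizes are all assumptions made for illustration.

```python
import numpy as np

def simulate_genotypes(n_individuals, n_snps, rng):
    """Stand-in for real genotype data: independent SNPs coded 0/1/2."""
    maf = rng.uniform(0.05, 0.5, size=n_snps)
    return rng.binomial(2, maf, size=(n_individuals, n_snps))

def simulate_phenotype(genotypes, n_causal, heritability, rng):
    """Additive phenotype with a chosen number of causal SNPs and target h2."""
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    n, m = genotypes.shape
    causal = rng.choice(m, size=n_causal, replace=False)
    effects = rng.normal(size=n_causal)          # assumed effect-size model
    g = genotypes[:, causal].astype(float)
    g = (g - g.mean(axis=0)) / g.std(axis=0)     # standardise each causal SNP
    genetic = g @ effects
    # Choose the noise variance so that var(genetic) / var(phenotype) ~ h2.
    noise_sd = np.sqrt(genetic.var() * (1.0 / heritability - 1.0))
    phenotype = genetic + rng.normal(scale=noise_sd, size=n)
    return phenotype, causal, genetic

rng = np.random.default_rng(1)
genotypes = simulate_genotypes(5000, 200, rng)
phenotype, causal_idx, genetic = simulate_phenotype(
    genotypes, n_causal=10, heritability=0.5, rng=rng
)
realised_h2 = genetic.var() / phenotype.var()
print(f"causal SNPs: {sorted(causal_idx.tolist())}")
print(f"realised heritability: {realised_h2:.3f}")
```

Calling `simulate_phenotype` repeatedly with the same genotypes and different parameter settings gives the replicates needed for step 6. Real genotypes add linkage disequilibrium and population structure, which random genotypes like these do not capture.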


Image Credit: Ikram MK et al. (2010) Four Novel Loci (19q13, 6q24, 12q24, and 5q14) Influence the Microcirculation In Vivo. PLoS Genet. 2010 Oct 28;6(10):e1001184. doi:10.1371/journal.pgen.1001184.g001