GWAS data analysis strategy or pipeline
5.4 years ago
Shicheng Guo ★ 9.4k

Suppose I receive data from a GWAS with 5,000 cases and 5,000 controls (say it was genotyped on an exome array). What kinds of analysis can I conduct to make full use of the genetic data? As far as I know, I need to proceed roughly as follows, and I would welcome suggestions to improve the plan:

  1. Convert the exome-array data from PLINK format to VCF format.

  2. Flip all probes to the forward strand.

  3. Run PCA to remove population outliers.

  4. Submit the data to the Michigan Imputation Server for phasing and imputation.

  5. Run the statistical analysis: allele-based and genotype-based tests under different models (dominant, recessive, and so on).

  6. Scan for compound heterozygotes, and test for epistasis and other interactions.

  7. Run gene-based and pathway-based analyses.

  8. Run a genetic risk score association analysis.

  9. Do biological validation.
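
As a rough illustration of step 5, the sketch below recodes a minor-allele count under the additive, dominant, and recessive models and runs a 1-df allelic chi-square test on genotype counts. The function names and the `(n_AA, n_Aa, n_aa)` count layout are assumptions made for this example, not part of PLINK or any other tool; in practice a dedicated GWAS package would do this with covariates and proper QC.

```python
from math import erfc, sqrt


def encode(genotype: int, model: str) -> int:
    """Recode a minor-allele count (0, 1 or 2) under a genetic model."""
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 copies of the minor allele")
    if model == "additive":
        return genotype
    if model == "dominant":
        return int(genotype >= 1)   # one copy is enough
    if model == "recessive":
        return int(genotype == 2)   # both copies are needed
    raise ValueError(f"unknown model: {model}")


def allelic_chi2(case_counts, control_counts):
    """Allelic chi-square test (1 df) from genotype counts.

    Each argument is (n_AA, n_Aa, n_aa), where 'a' is the minor allele.
    Returns (chi2, p_value).
    """
    def allele_counts(counts):
        n_hom_major, n_het, n_hom_minor = counts
        minor = n_het + 2 * n_hom_minor
        major = 2 * n_hom_major + n_het
        return minor, major

    a, b = allele_counts(case_counts)      # cases: minor, major
    c, d = allele_counts(control_counts)   # controls: minor, major
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the 2x2 allele table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 df equals a two-sided normal tail.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value
```

For example, `allelic_chi2((3200, 1600, 200), (3400, 1450, 150))` compares allele counts between 5,000 cases and 5,000 controls at one marker; the counts here are made up for illustration.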

Any more suggestions?

  1. weighted burden tests
GWAS SNP-array Exom-array • 2.3k views

Most of it depends on the aim of your project. What are you trying to achieve with your GWAS? Does the project have a specific aim?


No specific aim, just data mining: we want to get whatever we can from these data and make full use of them.

5.4 years ago
Vivek ★ 2.7k

The phase-and-impute strategy works if you have a genome-wide array of markers. With only an exome array, there won't be enough genome-wide SNP coverage to impute accurately. The other option is weighted burden tests. Whether you find anything with them depends on how much power you have from 5,000 cases and 5,000 controls (or rather, from whatever is left after sample QC for admixture, kinship, etc.) and on whether your phenotype is binary or quantitative.

I'd suggest starting with a power analysis before spending time on crafting an analysis plan.
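
A back-of-the-envelope version of such a power analysis might look like the sketch below. It uses a normal approximation to a two-sided allelic test for a single variant. The function name, the default significance threshold of 5e-8, and the use of the control allele frequency as a stand-in for the population frequency are all assumptions for this example; a burden test or an array-specific threshold would need a different calculation.

```python
from math import sqrt
from statistics import NormalDist


def allelic_power(n_cases, n_controls, control_maf, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic test for one variant.

    Assumes independent alleles (two per person) and an allelic odds
    ratio; this is a normal approximation, not an exact calculation.
    """
    if not 0 < control_maf < 1:
        raise ValueError("control_maf must be strictly between 0 and 1")
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")

    # Allele frequency in cases implied by the allelic odds ratio.
    case_odds = odds_ratio * control_maf / (1 - control_maf)
    case_maf = case_odds / (1 + case_odds)

    # Standard error of the frequency difference, counting alleles.
    alleles_cases, alleles_controls = 2 * n_cases, 2 * n_controls
    se = sqrt(case_maf * (1 - case_maf) / alleles_cases
              + control_maf * (1 - control_maf) / alleles_controls)

    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(case_maf - control_maf) / se
    # Probability that the test statistic lands in either rejection tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)
```

Calling it over a grid, for example `allelic_power(5000, 5000, control_maf=maf, odds_ratio=1.2)` for several values of `maf`, gives a quick sense of which allele frequencies and effect sizes the sample can detect after QC has reduced the sample size.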

  1. What is the purpose of checking admixture and kinship? To remove those samples, or something else?
  2. The phenotypes will include both binary and quantitative traits.
  3. Power analysis is a very good suggestion!
  4. You are right, exome arrays do not impute well. How high does the imputation R2 need to be? R2 > 0.9?

When you use a linear model to test for association (Y = XB + E), the core assumption is that the elements of Y (strictly, the error terms E) are independent. That is why you check for kinship and admixture and remove any samples that violate that assumption.
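
Written out, the assumption looks like the first line below. The second line is a standard way of describing what relatedness does to the model; it is not something stated in this thread, and the symbols K (a genetic relationship matrix), \sigma_g^2 and \sigma_e^2 are introduced here only for illustration.

```latex
% Independence assumption of the ordinary linear model
Y = XB + E, \qquad \operatorname{Cov}(E) = \sigma^2 I

% With related samples the errors are correlated through shared genetics
\operatorname{Cov}(E) = \sigma_g^2 K + \sigma_e^2 I
```

When the off-diagonal entries of K are not close to zero, the first form no longer holds, which is the reason for removing related samples before fitting the simple model.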


Interesting. So what is the best P_hat threshold to apply when removing samples?


I'm not sure what you mean by p_hat. The admixture QC is done with PCA against a reference population such as 1000 Genomes. There must be a tutorial on Biostars if you search for it. For kinship, run KING and remove any samples related more closely than third degree.
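
As a sketch of the kinship step, the code below bins a pairwise kinship coefficient using the inference ranges documented for KING (above 0.354 for duplicates or MZ twins, then 0.177, 0.0884 and 0.0442 as the lower bounds for first, second and third degree) and greedily picks samples to drop. The function names, the `(id1, id2, kinship)` input layout, and the default cutoff of 0.0884 (which drops pairs closer than third degree) are assumptions for this example; reading the pairs out of KING's output files is left out.

```python
def king_degree(kinship: float) -> str:
    """Bin a KING kinship coefficient into a relationship degree."""
    if kinship > 0.354:
        return "duplicate/MZ twin"
    if kinship > 0.177:
        return "1st-degree"
    if kinship > 0.0884:
        return "2nd-degree"
    if kinship > 0.0442:
        return "3rd-degree"
    return "unrelated or more distant"


def samples_to_drop(pairs, cutoff=0.0884):
    """Greedily choose samples to remove so no related pair remains.

    pairs: iterable of (id1, id2, kinship) tuples with string IDs.
    cutoff: pairs with kinship above this value count as related
            (an assumed default; adjust to the study's QC policy).
    """
    related = {}
    for id1, id2, kinship in pairs:
        if kinship > cutoff:
            related.setdefault(id1, set()).add(id2)
            related.setdefault(id2, set()).add(id1)

    dropped = set()
    while any(related.values()):
        # Drop the sample with the most remaining relatives;
        # ties are broken by ID so the result is deterministic.
        worst = max(related, key=lambda s: (len(related[s]), s))
        for other in related.pop(worst):
            related[other].discard(worst)
        dropped.add(worst)
    return dropped
```

Dropping the most-connected sample first tends to keep more samples than removing one member of each pair at random, though it is a heuristic and not guaranteed to be optimal.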
