GWAS data from an Illumina OmniExpress array and an Illumina 660W-Quad array
8.2 years ago
Sheila ▴ 420

Is it possible to run a GWAS analysis where half of the subjects have genotype data from an Illumina OmniExpress array and the other half have genotype data from an Illumina 660W-Quad array?

What steps are needed to combine these two groups into a single, complete analysis?

Thanks!

illumina data gwas • 3.5k views
8.2 years ago
LauferVA 4.2k

This is a very large question with no simple answer.

Here is what you should do:

  1. Google "GWAS quality control"
  2. Start reading papers like this one from Stephen Turner: "Quality Control Procedures for GWAS" http://www.ncbi.nlm.nih.gov/pubmed/21234875
  3. As you read these papers (there are a couple dozen that will help you) start to take notes on what kinds of things they recommend. For instance, you will want to do QC by variant, by sample (individual person), by batch or plate, and by chip. Take notes on each of those.

Once you have a command of the literature, construct something like this:

I. Initial processing of new data

  1. Genotype Calling (Illuminus)
  2. X and Y probe intensity, Structural Variation (Illumina Bead Studio)
  3. Conversion to bed bim fam (Custom, PLINK)

II. Sample QC

  1. Sex Check (PLINK)
  2. Missingness Outliers (PLINK)
  3. Heterozygosity Rate Outliers (PLINK)
    1. Calculate observed heterozygosity per individual
    2. Plot Missingness on X axis, Heterozygosity on Y. Decide reasonable thresholds for exclusion
  4. Relatedness Checks
    1. Prune out high LD regions (e.g., HLA)
    2. Prune down to 50,000 high quality, LD-independent SNPs
    3. Check for IBD > 0.185, visualize (PLINK, R (turner))
    4. Mark or exclude
  5. Ancestry Checks (PLINK, smartPCA, R scripts)
    1. Restrict to SNPs featured in the HapMap 3 Rel. 2 data for the four ancestral populations
    2. Merge with hapmap data, flipping hapmap strand
    3. PCA on merged file
    4. Plot PC loadings
    5. Determine all PCs having significant correlation to ancestry (R)
    6. Exclude ancestry outliers (R)
  6. Per Chip comparisons on a.-d. (Custom)
  7. Exclude or mark all sample outliers
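
To make the missingness and heterozygosity steps concrete, here is a rough sketch in plain Python. It assumes genotypes are already coded as minor-allele counts (0, 1, 2, or `None` for a missing call); the function names, the 3% missingness cutoff, and the 3 SD heterozygosity cutoff are illustrative choices, not values from any particular paper. In practice PLINK computes these per-sample statistics for you, and you would pick thresholds from the plot.

```python
from statistics import mean, stdev

def sample_stats(genotypes):
    """Return (missing_fraction, het_rate) for one sample.

    genotypes: list of minor-allele counts (0, 1, 2), None = missing call.
    het_rate is computed among called genotypes only (None if nothing called).
    """
    called = [g for g in genotypes if g is not None]
    missing = 1 - len(called) / len(genotypes)
    het = sum(1 for g in called if g == 1) / len(called) if called else None
    return missing, het

def flag_sample_outliers(samples, max_missing=0.03, n_sd=3.0):
    """samples: dict of sample_id -> genotype list. Returns a set of flagged IDs.

    Thresholds are hypothetical defaults; choose real ones after plotting
    missingness (x axis) against heterozygosity (y axis).
    """
    stats = {sid: sample_stats(g) for sid, g in samples.items()}
    hets = [h for _, h in stats.values() if h is not None]
    mu, sd = mean(hets), stdev(hets)
    flagged = set()
    for sid, (miss, het) in stats.items():
        too_missing = miss > max_missing
        odd_het = het is None or abs(het - mu) > n_sd * sd
        if too_missing or odd_het:
            flagged.add(sid)
    return flagged
```

Run it once per chip and once on the combined set; a sample that is only an outlier relative to the other chip is a hint of a chip effect rather than a bad sample.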

III. Marker QC

  1. Excessive Missingness (PLINK)
    1. Select threshold based on visual inspection of histogram
  2. HWE (PLINK)
    1. If a higher threshold is chosen, manually inspect cluster plot
  3. Differential Missingness Check (PLINK)
    1. Informative Missingness - CNV
    2. Consecutive Missingness in a stretch
  4. Low MAF (PLINK)
  5. Internal Sample Reproducibility (Between Chips) (PLINK)
  6. External Sample Reproducibility (HapMap Concordance) (PLINK)
  7. Per Chip Call Rate, AF, GF, comparisons on a.-d. (Custom)
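
For the marker-level checks, the quantities involved are simple enough to sketch. The snippet below computes call rate, MAF, and a Hardy-Weinberg p-value from genotype counts for one SNP. Note the assumption: it uses the 1-df chi-square goodness-of-fit test as a stand-in, whereas PLINK reports an exact test, so p-values will differ somewhat for rare alleles. The function name and argument layout are made up for illustration.

```python
import math

def marker_stats(n_aa, n_ab, n_bb, n_missing=0):
    """Return (call_rate, maf, hwe_p) for one SNP from genotype counts.

    hwe_p comes from a chi-square goodness-of-fit test with 1 degree of
    freedom (an approximation; not the exact test PLINK uses).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes for this marker")
    call_rate = n / (n + n_missing)
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    maf = min(p, q)
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    hwe_p = math.erfc(math.sqrt(chi2 / 2))
    return call_rate, maf, hwe_p
```

Computing these separately for the OmniExpress and 660W samples, then comparing per SNP, is one simple way to do the per-chip comparison in the last item.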

IV. Batch Effects

  1. Average MAF (PLINK, Custom)
  2. Average call rates (PLINK, Custom)
  3. Association Testing by plate (remove MAF <5%) (Custom, PLINK)
  4. Correction via population stratification techniques if necessary
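
One way to read "association testing by plate" is to treat membership of one plate as the phenotype and test each SNP's allele counts on that plate against all other plates. Here is a minimal sketch of that idea for a single SNP, assuming you already have allele counts per plate; the plate names and function names are hypothetical, and the same idea works with chip in place of plate.

```python
import math

def allelic_chi2(a1, b1, a2, b2):
    """2x2 chi-square on allele counts: group 1 (a1, b1) vs group 2 (a2, b2).

    Returns (chi2, p) with 1 degree of freedom, no continuity correction.
    """
    row1, row2 = a1 + b1, a2 + b2
    col_a, col_b = a1 + a2, b1 + b2
    if 0 in (row1, row2, col_a, col_b):
        return 0.0, 1.0  # monomorphic or empty group: nothing to test
    n = row1 + row2
    chi2 = n * (a1 * b2 - a2 * b1) ** 2 / (row1 * row2 * col_a * col_b)
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def plate_vs_rest(allele_counts):
    """allele_counts: dict of plate -> (count_A, count_B) for one SNP.

    Returns dict of plate -> p-value for that plate against all others.
    """
    tot_a = sum(a for a, _ in allele_counts.values())
    tot_b = sum(b for _, b in allele_counts.values())
    return {
        plate: allelic_chi2(a, b, tot_a - a, tot_b - b)[1]
        for plate, (a, b) in allele_counts.items()
    }
```

SNPs that come up strongly associated with a plate or chip are candidates for removal, since real biology should not track plate membership.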

V. Dataset Merging and Harmonization

  1. Sample Checks
    1. Must perform same checks as before on merged set.
    2. Results should confirm previous relationships, find new related pairs.
  2. HWE - after merging, high number of SNPs out of HWE due to differences in ancestry.
    1. Need to stratify by ethnicity, then look for HWE outliers p < 0.0001.
  3. Population Stratification
    1. Use AIMs from Dumitrescu 2010
  4. Marker Checks
    1. After removing markers below a 95% call rate within each single study, run a second check at a 99% call rate on the merged set.
  5. Batch Effects
    1. Test independence of AF with plate membership, and compare the distribution of chi-square statistics to the null distribution.
  6. Merging

VI. Integrated imputation, phasing, and strand flipping

  1. Genotype Harmonizer
    1. Across Study-Side Hapmap sample Concordance (GH)
    2. Inspect original source file designation (GH)
    3. MAF comparisons (GH)
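
Strand issues are probably the most common way a two-chip merge goes wrong, so here is a small sketch of the logic. It classifies one SNP by comparing the study allele pair with a reference allele pair. This is only the allele-letter logic, under the assumption that alleles are single bases; A/T and C/G SNPs cannot be resolved this way and are simply set aside here. The function and label names are made up for illustration and are not Genotype Harmonizer's interface.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_strand_ambiguous(a1, a2):
    """A/T and C/G SNPs read the same on both strands."""
    return COMPLEMENT[a1] == a2

def classify_snp(study, ref):
    """Compare two (allele1, allele2) pairs for the same SNP.

    Returns one of: 'ambiguous', 'match', 'swap', 'flip', 'flip_swap',
    'mismatch'. 'swap' means allele order differs; 'flip' means the study
    alleles are reported on the opposite strand.
    """
    study, ref = tuple(study), tuple(ref)
    if is_strand_ambiguous(*study):
        return "ambiguous"
    flipped = tuple(COMPLEMENT[a] for a in study)
    if study == ref:
        return "match"
    if study == ref[::-1]:
        return "swap"
    if flipped == ref:
        return "flip"
    if flipped == ref[::-1]:
        return "flip_swap"
    return "mismatch"
```

For example, `classify_snp(("A", "G"), ("T", "C"))` gives `'flip'`, so that SNP needs its strand flipped in one data set before merging, while anything labelled `'mismatch'` should be dropped or investigated.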

VII. Association Testing

  1. Post QC PCA
  2. Decide between Logistic Regression and Mixed Modelling
    1. Degree of Relatedness

VIII. Evaluation of QC Quality after Association Analysis

  1. Calculation of Lambda
  2. Examination of Intensity Plots
  3. Replicate SNPs of interest on a DIFFERENT Technology
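
For the lambda calculation, here is a minimal sketch assuming you have a list of association p-values from 1-df tests. Lambda is the median observed chi-square statistic divided by the median of the 1-df chi-square distribution (about 0.455). The function name is illustrative, and it needs Python 3.8+ for `statistics.NormalDist`.

```python
from statistics import NormalDist, median

def genomic_lambda(p_values):
    """Genomic inflation factor from 1-df association p-values.

    Each p-value is converted back to a chi-square statistic via the
    normal quantile: chi2 = z**2 with z = Phi^-1(p / 2).
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values if 0 < p <= 1]
    expected_median = nd.inv_cdf(0.75) ** 2  # median of chi-square, 1 df
    return median(chi2) / expected_median
```

A value close to 1 suggests the test statistics are well behaved; a clearly larger value after merging two chips points to residual stratification or a chip or batch effect that the QC did not remove.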