Problems merging NGS data from different sequencing platforms (Illumina HiSeq 2000 and HiSeq X Ten)
7.0 years ago
Jay Chou ▴ 10

Hello there, I have about 500 samples from whole-genome sequencing on two platforms: Illumina HiSeq 2000 (240 individuals) and Illumina HiSeq X Ten (260 individuals).

After mapping to the reference genome and obtaining the BAM files, I applied two methods (a genotype likelihood approach and a single-read sampling approach) for PCA, but the results were confusing: the samples clustered by platform (the HiSeq 2000 samples together, and likewise the HiSeq X Ten samples) rather than by population.

I'm not sure why the PCA results differ so much between platforms. Does anyone with experience handling such data have suggestions for the analysis? And how can this bias be corrected?

next-gen sequencing genome SNP • 1.9k views
How reliable are your variant calls? What coverage do you have?

If your calls are not very reliable (maybe because the coverage is low), then the biases of the different platforms would play a stronger role. Try to get as clean a set of variants as possible. Which method did you use to call variants?

Hello, many thanks for your response! The sequencing coverage on both platforms is 5X, and SNPs were called with the GATK Best Practices pipeline and the ANGSD genotype likelihood approach. I have one more question: if the sequencing coverage on both platforms were increased to 30X, would that correct the platform bias?

I would expect that increasing the coverage (and the quality of the calls) would reduce platform biases. With 5x you should be able to test this by analysing only those calls with high quality in both platforms. There should be enough good calls. You may want to restrict your analysis to homozygotes, which are easier to call. For example, get all sites covered by at least 8 reads in which all reads support the alternative allele. You can also look at calls found in only one platform and analyse their signatures (G>T, etc.). Maybe this gives some hint.
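
The depth filter and the signature check described above could be sketched roughly like this in Python. This is only an illustration under assumptions: the record layout `(chrom, pos, ref, alt, ref_reads, alt_reads)` is hypothetical (for instance, something you might extract from the allele-depth field of a VCF), the function names are made up for this sketch, the toy records are invented, and "calls found in only one platform" is taken to mean a simple set difference between the two platforms' confident sites.

```python
from collections import Counter

MIN_DEPTH = 8  # read-depth threshold taken from the suggestion above


def is_confident_hom_alt(ref_reads, alt_reads, min_depth=MIN_DEPTH):
    """True when the site has >= min_depth reads and none supports the reference."""
    return ref_reads == 0 and alt_reads >= min_depth


def confident_sites(records, min_depth=MIN_DEPTH):
    """Return the (chrom, pos, ref, alt) keys of records that pass the filter."""
    return {
        (chrom, pos, ref, alt)
        for chrom, pos, ref, alt, ref_reads, alt_reads in records
        if is_confident_hom_alt(ref_reads, alt_reads, min_depth)
    }


def signature_counts(sites):
    """Count single-base substitution types such as 'G>T'."""
    return Counter(
        f"{ref}>{alt}"
        for _chrom, _pos, ref, alt in sites
        if len(ref) == 1 and len(alt) == 1
    )


# Toy records, invented for illustration only.
hiseq2000 = [
    ("chr1", 100, "G", "T", 0, 9),
    ("chr1", 200, "A", "G", 0, 12),
    ("chr1", 300, "C", "T", 2, 10),  # has reference reads: dropped
]
x_ten = [
    ("chr1", 200, "A", "G", 0, 8),
    ("chr1", 400, "G", "T", 0, 5),   # too shallow: dropped
]

sites_2000 = confident_sites(hiseq2000)
sites_x = confident_sites(x_ten)

# Confident calls seen on HiSeq 2000 but not on X Ten (assumed reading of
# "calls found in only one platform").
only_2000 = sites_2000 - sites_x
print(signature_counts(only_2000))
```

If one substitution type is strongly over-represented among the platform-private calls compared with the calls shared by both platforms, that may point to a platform-specific error pattern rather than real variation.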

Thanks a lot! I will try your suggestions.
