Deleted: No GC-rich reads after PCR amplification, resulting in gaps in alignment?
2.9 years ago
mikegrootde ▴ 20

Hi all,

I am interested in short CG-trinucleotide tandem repeats (i.e. CGG, CGC, GGC etc.) with a copy number of 2-30 (so a full repeat has a maximum length of around 100 bp).
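To make the definition above concrete, here is a minimal Python sketch of how such repeats could be located in a reference sequence. The function name `find_cg_repeats`, the regular expression, and the decision to exclude CCC/GGG homopolymers are assumptions for illustration only. It matches perfect repeats, so impure repeats with substitutions would need a more tolerant approach.

```python
import re

# Assumption: a "CG trinucleotide" is any 3-mer made only of C and G,
# excluding the homopolymers CCC and GGG. Copy number 2-30 as stated above:
# one motif followed by 1 to 29 further exact copies.
CG_REPEAT = re.compile(r"(?!CCC|GGG)([CG]{3})\1{1,29}")


def find_cg_repeats(seq):
    """Return (start, motif, copy_number) for each perfect CG-trinucleotide repeat.

    `start` is a 0-based offset into `seq`. Matches are non-overlapping and
    found left to right, so a repeat is reported in the first frame that matches.
    """
    hits = []
    for match in CG_REPEAT.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // 3
        hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, not real reference data.
    example = "ATATCGGCGGCGGCGGTTTA"
    for start, motif, copies in find_cg_repeats(example):
        print(start, motif, copies)
```

The resulting coordinates could then be compared against read depth at the same positions to see which repeats are actually left uncovered.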

I downloaded some short-read WGS data from primates and want to use it to determine CGG repeat lengths. The data is 25x coverage, 100 bp paired-end reads. Libraries were prepared with a PCR-based protocol. I mapped the reads to Clint_PTRv2/panTro6 with bwa mem.

When I look at the mapping, I notice that there are no reads aligned near the CGG triplet repeat regions.

I also looked at a sample from the Human Genome Diversity Project that was already aligned to hg38, also with bwa mem. That sample was sequenced from a PCR-based library at 31x coverage with short paired-end reads. In this sample, however, I do not observe gaps in the alignment at CGG triplet repeat regions.

Is this due to a difference in PCR protocol? A difference in coverage? Different alignment settings?

I've read about PCR bias, but it seems very unlikely to me that not a single GC-rich template would be amplified even a little. I am only interested in short repeats of up to 100 bp, and most repeats are quite impure, i.e. they contain many non-C/G substitutions. Since the GC content is not 100%, I expected at least some GC-rich repeats to survive PCR amplification and end up as reads. Is it normal for a PCR protocol to leave these gaps in GC-rich regions? Or is it my alignment?
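As a rough way to quantify how GC-rich these regions really are, the following Python sketch computes the GC fraction of a sequence and of sliding windows along it. The helper names `gc_fraction` and `windowed_gc`, the 100 bp window (chosen to match the read length mentioned above), and the 50 bp step are assumptions for illustration, not part of any specific tool.

```python
def gc_fraction(seq):
    """Fraction of bases in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq, window=100, step=50):
    """GC fraction in sliding windows, returned as (start, fraction) pairs.

    Assumption: window=100 mirrors the 100 bp reads; step=50 is arbitrary.
    If `seq` is shorter than `window`, a single window covering it is returned.
    """
    last_start = max(len(seq) - window, 0)
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, last_start + 1, step)
    ]


if __name__ == "__main__":
    # Hypothetical impure repeat with a few non-C/G substitutions.
    impure_repeat = "CGGCGGCAGCGGCTGCGG"
    print(round(gc_fraction(impure_repeat), 2))
```

Running `windowed_gc` over the reference around each repeat would show whether the uncovered stretches coincide with windows of very high GC content.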

I am quite new to bioinformatics, so I'd very much like to be sure of this and learn from it.

Is this what I should have expected, or is there still something that can be done?

PCR GC WGS • 306 views