Abnormal GC content in Whole Exome Sequencing Sample after alignment
3.0 years ago
Leafou • 0

I have 87 whole-exome sequenced samples (Agilent SureSelect Exome v7 library, NovaSeq sequencer). Illumina adapters and short reads (<30 bp) were removed with Cutadapt.

In the FASTQ QC, my only problem is with GC content. Here is the MultiQC result after FastQC:

[Figure: MultiQC plot of per-sequence GC content from FastQC]

Even though 8 samples are flagged as bad (in red), most samples are approximately centered around 50% GC. (I assume both bumps are due to errors during library preparation or sequencing?)

However, my main concern is the GC content after alignment with BWA. I obtained this figure:

[Figure: GC content distribution after alignment]

There is one peak at 70% GC and another around 90% GC, which is really problematic.

The HsMetrics output showed that approximately 85% of bases aligned on baits (so about 15% of bases are off-bait).

When I try to locate these GC-rich reads, they usually fall in intronic or intergenic regions. However, they sometimes fall at the ends of exons, as in this example:

[Figure: example of GC-rich reads at the end of an exon]
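
To make the "locate these GC-rich reads" step concrete, here is a minimal Python sketch of one way it could be done. It is not the method used for this post: the names `gc_fraction`, `find_gc_rich_reads` and `GcRichRead` are hypothetical, the 0.7 cutoff is an assumption chosen to match the 70% peak, and the input is assumed to be plain-text SAM records (for example the output of `samtools view -h`).

```python
from typing import Iterable, Iterator, NamedTuple


class GcRichRead(NamedTuple):
    name: str    # read name (SAM QNAME)
    chrom: str   # reference name (SAM RNAME)
    pos: int     # 1-based leftmost mapping position (SAM POS)
    gc: float    # GC fraction of the read sequence, 0.0 to 1.0


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among the A/C/G/T bases of seq; N and other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def find_gc_rich_reads(sam_lines: Iterable[str], min_gc: float = 0.7) -> Iterator[GcRichRead]:
    """Yield mapped reads whose GC fraction is at least min_gc (assumed cutoff)."""
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        seq = fields[9]
        if flag & 0x4 or seq == "*":      # unmapped read, or sequence not stored
            continue
        gc = gc_fraction(seq)
        if gc >= min_gc:
            yield GcRichRead(fields[0], fields[2], int(fields[3]), gc)


if __name__ == "__main__":
    # Tiny made-up records, for illustration only.
    demo = [
        "@SQ\tSN:chr1\tLN:1000",
        "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tGGCCGGCCGC\tIIIIIIIIII",
        "r2\t0\tchr1\t200\t60\t10M\t*\t0\t0\tATATATGCAT\tIIIIIIIIII",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGGGGGGG\tIIIIIIIIII",
    ]
    for read in find_gc_rich_reads(demo):
        print(read.name, read.chrom, read.pos, f"{read.gc:.2f}")
```

The reported positions could then be intersected with the bait or exon intervals to see how many GC-rich reads are off-target. Whether such reads should be removed at all is a separate question that this sketch does not answer.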

Do you have any idea how to remove these reads?

Thank you for your help.

library WES GC Sequencing FASTQC • 1.5k views

my main concern is after the alignment with BWA

Are you aligning to the entire genome?


Yes, the alignment was against the hg19 genome version.


Did you find any solution to this?
