Abnormal GC content in Whole Exome Sequencing Sample after alignment


3.0 years ago

Leafou • 0

I have 87 whole-exome-sequenced samples (Agilent SureSelect Exome v7 library, NovaSeq sequencer). The Illumina adapters and short reads (<30 bp) were removed with Cutadapt.

In the FASTQ QC, my only problem is with the %GC. Here is the MultiQC result after running FastQC:

[Image: MultiQC plot of per-sequence GC content from FastQC]

Even though 8 samples are flagged as bad (in red), most samples are centered at approximately 50% GC. (I assume the two bumps are due to errors during library preparation or sequencing?)

However, my main concern is the result after alignment with BWA. I obtained this figure for the GC content:

[Image: GC content plot after alignment]

I have one peak at 70% and another one around 90%, which is really problematic.
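To pin down which reads make up those peaks, one option is to compute the GC percentage per read and bin it. The following is only a minimal sketch under assumptions: the function names `gc_percent` and `gc_histogram` and the toy `reads` list are hypothetical, and in practice the sequences would be read from the FASTQ or BAM files rather than hard-coded.

```python
from collections import Counter


def gc_percent(seq: str) -> float:
    """GC percentage among unambiguous bases (A, C, G, T); 0.0 if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def gc_histogram(seqs, bin_width=10):
    """Count reads per GC bin; keys are the lower bound of each bin in percent."""
    hist = Counter()
    for seq in seqs:
        pct = gc_percent(seq)
        # Clamp 100% into the top bin so the bins cover [0, 100].
        lower = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        hist[lower] += 1
    return dict(sorted(hist.items()))


# Hypothetical toy reads, for illustration only.
reads = ["ACGTACGTAC", "GGGCCCGGGC", "GCGCGCGCAT", "ATATATATAT"]
print(gc_histogram(reads))
```

Reads landing in the 70 to 100% bins could then be written out separately and inspected, for example by checking where they map.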

The HsMetrics output showed that approximately 85% of bases are aligned on baits (so 15% of bases are off-bait).
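A possible way to check whether the GC-rich peaks come mainly from that off-bait fraction is to split the alignments by bait overlap and compare the mean GC of each group. This is a sketch under assumptions, not a tested pipeline: the tuples below are hypothetical stand-ins for records that would normally come from the BAM file and the kit's bait BED file, coordinates are assumed to be 0-based and half-open, and baits are assumed not to overlap each other.

```python
from bisect import bisect_right


def gc_percent(seq):
    """GC percentage among unambiguous bases (A, C, G, T); 0.0 if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / total if total else 0.0


def build_index(baits):
    """Group (chrom, start, end) baits by chromosome, sorted by start."""
    index = {}
    for chrom, start, end in baits:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def overlaps_bait(index, chrom, start, end):
    """True if the half-open interval [start, end) overlaps any bait on chrom."""
    intervals = index.get(chrom, [])
    # Last bait whose start lies before the read end; with sorted,
    # non-overlapping baits it is the only candidate that can overlap.
    i = bisect_right(intervals, (end - 1, float("inf"))) - 1
    return i >= 0 and intervals[i][1] > start


def mean_gc_by_target(index, alignments):
    """Mean GC% of on-bait and off-bait reads; None for an empty group."""
    sums = {"on_bait": [0.0, 0], "off_bait": [0.0, 0]}
    for chrom, start, end, seq in alignments:
        key = "on_bait" if overlaps_bait(index, chrom, start, end) else "off_bait"
        sums[key][0] += gc_percent(seq)
        sums[key][1] += 1
    return {k: (s / n if n else None) for k, (s, n) in sums.items()}


# Hypothetical toy data, for illustration only.
baits = [("chr1", 100, 200), ("chr1", 500, 600)]
alignments = [
    ("chr1", 150, 160, "ACGTACGTAC"),
    ("chr1", 300, 310, "GGGCCCGGGC"),
    ("chr2", 100, 110, "GCGCGCGCAT"),
    ("chr1", 595, 605, "ATATATATGC"),
]
index = build_index(baits)
result = mean_gc_by_target(index, alignments)
print(result)
```

If the off-bait group turned out to have a much higher mean GC than the on-bait group, that would suggest the peaks come largely from off-target reads.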

When I tried to locate these GC-rich reads, they usually fell in intronic or intergenic regions. However, they sometimes fell at the ends of exons, as in this example:

[Image: example of GC-rich reads at the end of an exon]

Do you have any idea how to remove these reads?

Thank you for your help.

library WES GC Sequencing FASTQC • 1.5k views

Updated 2.5 years ago by sadikshaadhikari1 • written 3.0 years ago by Leafou


my main concern is after the alignment with BWA

Are you aligning to the entire genome?

3.0 years ago by GenoMax


Yes, the alignment was against the hg19 genome version.

3.0 years ago by Leafou


Did you find any solution to this?

2.5 years ago by sadikshaadhikari1
