kmer content changes after trimming and removing adapters from reads
5.8 years ago
serpalma.v ▴ 80

Dear community

I have a large set of FASTQ files from genomic DNA. I ran them through FastQC and found that the "overrepresented sequences" and "Kmer content" modules failed. The other modules passed, apart from a warning in "Per tile sequence quality". This pattern was present in almost all of the FASTQ files (>1000 files).

The "overrepresented sequences" module pointed out the presence of TruSeq adapters and Illumina PCR Primer 1.

I ran them through Trimmomatic to remove adapters. The module "overrepresented sequences" was fixed, but "Kmer content" failed again, only this time the pattern was different. Moreover, I get a new warning for the "Per sequence GC content" module (please see linked figure).

I have read that this pattern in "Kmer content" before trimming (kmers found at the beginning of the reads) could be due to fragmentation bias.

I worked with the adapter file provided by Trimmomatic (TruSeq3-PE-2.fa).

These are the flags I used for Trimmomatic:

java -jar trimmomatic-0.38.jar PE -phred33 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

I have two questions:

  • Are the "kmer content" and "Per sequence GC content" profiles after trimming something to worry about?

  • What could be a possible reason for the change in "kmer content" after trimming?

Here you can find the FastQC reports before and after running Trimmomatic:

https://drive.google.com/open?id=1vLY0FsXxnzJYT7d4X1TWZy96cSXu3XGs

https://drive.google.com/open?id=1Tk0GCy_SEz8ZrP2Y_3f_XYs1cnN11ScU

And here is a comparison of "kmer content" and "Per sequence GC content" before and after trimming:

https://drive.google.com/open?id=1YT6zbmKU_3DYlrTX_BLkMBOpGnmqg1Z7

Thank you very much in advance

DNAseq trimmomatic fastqc • 2.6k views

A failure or warning in the "Kmer content" or "Per sequence GC content" modules in FastQC generally has no immediate adverse effect on your analysis. You should proceed with the downstream analysis and see what you get. In recent FastQC versions the k-mer module is turned off by default, since it causes more worry than it is worth.
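
If you want to look at the positional k-mer signal yourself rather than rely on the FastQC plot, a rough sketch along these lines may help. It is not how FastQC computes its module; it only tallies which k-mers start at each read position. The function names, the choice of k=7, and the file name in the usage note are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import islice
import gzip


def read_sequences(path):
    """Yield the sequence line of each FASTQ record (assumes 4 lines per record)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Line index 1, 5, 9, ... holds the bases of each record.
        for line in islice(handle, 1, None, 4):
            yield line.strip()


def positional_kmer_counts(sequences, k=7):
    """Count how often each k-mer starts at each read position.

    k=7 is an illustrative default, not a value taken from this thread.
    Reads shorter than k contribute nothing.
    """
    counts = defaultdict(Counter)  # position -> Counter of k-mers
    for seq in sequences:
        for pos in range(len(seq) - k + 1):
            counts[pos][seq[pos:pos + k]] += 1
    return counts


def top_kmers(counts, n=3):
    """Return the n most common k-mers for each position, in position order."""
    return {pos: counter.most_common(n) for pos, counter in sorted(counts.items())}
```

For example, `top_kmers(positional_kmer_counts(read_sequences("sample_R1.fastq.gz")))` (hypothetical file name) run on the same file before and after trimming would let you compare which k-mers dominate at the read starts and ends. Keep in mind that after trimming the reads have different lengths, so later positions are covered by fewer reads.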
