What is the optimal read length filter for Kallisto post adapter removal?
5.8 years ago
bipin ▴ 30

I am using 151 bp paired-end RNA-seq reads to study differentially expressed genes between two conditions. The reads are pseudoaligned to a reference transcriptome with kallisto (index built with the default k-mer size of 31).

However, ~16% of the reads have adapter contamination, and in some cases the adapter sequence starts in the middle of the read. The FastQC adapter content plot looks like this:

I am using Trim Galore to remove the adapter contamination. However, I am unsure what minimum length cutoff to apply after adapter removal to balance preventing multimapping against losing reads.
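To make the comparison concrete, here is a minimal sketch of how I could generate one Trim Galore run per candidate cutoff. The file names and the output directory prefix are hypothetical placeholders. `--paired`, `--length` and `--output_dir` are the Trim Galore options I am relying on. As far as I understand, in paired mode a pair is dropped when either mate falls below `--length`.

```python
def trim_galore_commands(read1, read2, cutoffs, out_prefix="trimmed_len"):
    """Build one Trim Galore command (as an argument list) per length cutoff.

    read1/read2 and out_prefix are hypothetical names used for illustration.
    """
    commands = []
    for cutoff in cutoffs:
        commands.append([
            "trim_galore",
            "--paired",                               # keep mates in sync
            "--length", str(cutoff),                  # minimum length after trimming
            "--output_dir", f"{out_prefix}{cutoff}",  # one directory per cutoff
            read1,
            read2,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    for cmd in trim_galore_commands("sample_R1.fastq.gz", "sample_R2.fastq.gz", [20, 31, 50]):
        print(" ".join(cmd))
```

Each argument list could be passed to `subprocess.run(cmd, check=True)` once the commands look right.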

I tested a cutoff of 50 bp, which results in the loss of ~100,000 read pairs (0.5%) and adds/removes ~30 genes from the DESeq2 significant list (~2,400 significant genes in total).
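To quantify how much the significant list changes between two runs, a small sketch like the following could be used. It assumes the significant gene IDs from each DESeq2 run have already been exported as plain lists; the function name and the returned keys are my own, not part of DESeq2.

```python
def compare_gene_lists(baseline, trimmed):
    """Summarise the overlap between two lists of significant gene IDs."""
    baseline, trimmed = set(baseline), set(trimmed)
    shared = baseline & trimmed
    union = baseline | trimmed
    return {
        "shared": len(shared),
        "only_baseline": len(baseline - trimmed),  # lost after trimming
        "only_trimmed": len(trimmed - baseline),   # gained after trimming
        # Jaccard index: 1.0 means identical lists (also used for two empty lists).
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Toy gene IDs, for illustration only.
    print(compare_gene_lists(["geneA", "geneB", "geneC"], ["geneB", "geneC", "geneD"]))
```

Running this for each candidate cutoff against the untrimmed result would show whether the list stabilises at some cutoff.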

kallisto also runs fine without adapter removal, but I suspect this might result in spurious multimapping for reads whose non-adapter portion is very short (between 31 and 50 bp).
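My reasoning for the concern: a read of length L contains L - k + 1 k-mers, so with k = 31 a 31 bp read contributes a single k-mer while a 50 bp read contributes 20. The sketch below works through that arithmetic and also estimates the fraction of reads a given cutoff would discard. The list of trimmed read lengths is a made-up example, and the helper names are hypothetical.

```python
K = 31  # default kallisto k-mer size, as used for the index


def kmers_in_read(read_length, k=K):
    """Number of k-mers in a read; 0 if the read is shorter than k."""
    return max(0, read_length - k + 1)


def fraction_below(lengths, cutoff):
    """Fraction of reads whose trimmed length is below the cutoff."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n < cutoff) / len(lengths)


if __name__ == "__main__":
    for length in (31, 40, 50, 151):
        print(length, "bp ->", kmers_in_read(length), "k-mers")
    # Made-up trimmed lengths, for illustration only.
    example_lengths = [151, 151, 120, 48, 35, 151, 90, 151]
    for cutoff in (31, 50):
        print("cutoff", cutoff, "discards", fraction_below(example_lengths, cutoff))
```

Applied to the real post-trimming length distribution, this would show how many reads fall in the 31 to 50 bp range that I am worried about.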

What would be an optimal read length cutoff in this scenario, or how can I determine one?

RNA-Seq kallisto • 2.1k views