low mapping rate, finding possible source of contamination
6.6 years ago
vaslanzadeh ▴ 20

Hello, I have received small RNA-seq data from a pathogenic bacterium for bioinformatic analysis. After trimming the adapters with trim_galore, I mapped the FASTQ file (almost 1 million reads) to the reference genome using bowtie and got an overall mapping rate of 26% (unique + multi-mapped). The mapping rate does not change much even if I allow two mismatches. As a negative control, I also mapped the reads to the mouse genome, which again gives a 25% mapping rate, similar to what I get when I align to the bacterial genome. Most of the mapped reads map to rRNAs and tRNAs, which is why mapping to the bacterial genome and to the mouse genome gives similar results. Now I do not know what the remaining ~75% of unmapped reads are. It is possible that there was contamination during library preparation or at some other step. How can I find the source of the contamination? Is there a way to BLAST the unmapped reads to find out which genome/strain they probably come from?

Thanks

RNA-Seq genome alignment blast • 1.9k views

There are a lot of possibilities, and contamination is just one of them; incomplete adapter trimming is another. Sometimes FastQC is helpful in this kind of situation (bowtie cannot map low-quality reads, for example), sometimes using a different aligner helps, and sometimes BLASTing for contaminant organisms is useful. But a statement like "After trimming the adaptors with trim_galore" is not informative on its own: you need to describe the command used, the results, and perhaps the read length distribution afterward.
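
As a rough illustration of two of the checks mentioned above (the post-trimming length distribution and BLASTing a subset of unmapped reads), here is a minimal Python sketch. It assumes the unmapped reads are already in a plain four-lines-per-record FASTQ file (bowtie can write unaligned reads to a file with its `--un` option). The function names are made up for this example, and it is not meant as a definitive pipeline.

```python
import gzip
import random
from collections import Counter


def read_fastq(path):
    """Yield (name, sequence) for each record of a 4-line-per-record FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break
            seq = handle.readline().strip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line
            # Drop the leading '@' and anything after the first whitespace.
            yield header.strip()[1:].split()[0], seq


def length_distribution(records):
    """Count how many reads have each length."""
    return Counter(len(seq) for _, seq in records)


def subsample_to_fasta(records, n, seed=0):
    """Reservoir-sample up to n reads and return them as FASTA text."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < n:
            sample.append(rec)
        else:
            # Keep each later read with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                sample[j] = rec
    return "".join(f">{name}\n{seq}\n" for name, seq in sample)
```

For example, `length_distribution(read_fastq("unmapped.fastq"))` (a hypothetical file name) shows whether the reads pile up at very short lengths or at the full read length, which would hint at over-trimming or at untrimmed adapters. The text returned by `subsample_to_fasta(read_fastq("unmapped.fastq"), 1000)` can be saved to a file and searched with BLAST. For reads this short, the `blastn-short` task of `blastn` is probably the more appropriate setting.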
