Dealing with sequence over-representation in microRNA RNA-seq data
11.2 years ago
Sudeep ★ 1.7k

Hi all,

I am working on a set of microRNA RNA-seq data. While checking the data quality with FastQC, we noticed a strange problem: a large portion of the reads in all our samples (roughly 40% to 60%, which comes to around 2-4 million reads) are duplicates of just one read. FastQC tags this sequence as a possible PCR primer. We tried to BLAST this sequence against miRBase (after removing the adapter), but couldn't find a matching microRNA. My colleagues suggest that this could be biological, but I am not convinced. So my questions are: assuming that FastQC's tagging of this read as a PCR primer is a false positive, is it possible that one microRNA is dominant in all the sequenced samples? And how can we confirm whether it is biological or a problem during sequencing?

Thank you


UPDATE:

We contacted the folks who sequenced our samples (done externally) about the problem I mentioned. After some checking (I don't know the details yet), they informed us that it was an error in the library preparation/sequencing step, and agreed to re-sequence our samples. So, thank you all for taking an interest.

rna-seq fastqc • 3.4k views

Also, my own 2 cents: a life scientist is usually like Fox Mulder from the X-Files, whose motto was "I want to believe". As a bioinformatician, I feel I am Dana Scully, who is always skeptical.


I just have to see this read.


Right here: make a new question, put your read there, and here is a title for it: "All my data looks alike. Help me decide: is it a new insight or just a bad run?"

11.2 years ago

As a first step, I would suggest also performing a BLAST search against the NCBI nucleotide database in order to identify any other potential source of the sequence. I think there are several possible sources of "contamination" during the preparation of a small RNA-Seq library (I have personally seen fragments of rRNA that were amplified during the first PCR amplification of small RNAs which had been size-fractionated prior to amplification). The fact that this one sequence is so highly abundant indicates that it is a PCR artifact.
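To pull out the exact sequence to paste into BLAST, and to double-check the fraction FastQC reports, the reads can be tallied directly. The following is only a minimal sketch: it assumes a plain, uncompressed 4-line-per-record FASTQ, and the function name `top_sequences` and the toy reads are made up for illustration.

```python
from collections import Counter
import io


def top_sequences(fastq_handle, n=5):
    """Return (total_reads, [(sequence, count, fraction), ...]) for the n most common reads.

    Assumes strict 4-line FASTQ records (no wrapped sequence lines).
    """
    counts = Counter()
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # second line of each record holds the sequence
            counts[line.strip()] += 1
    total = sum(counts.values())
    top = [(seq, c, c / total) for seq, c in counts.most_common(n)]
    return total, top


# Toy input (hypothetical reads, not real data): three identical reads and one different one.
demo = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGTACGT\n+\nIIIIIIII\n"
    "@r3\nTTTTGGGG\n+\nIIIIIIII\n"
    "@r4\nACGTACGT\n+\nIIIIIIII\n"
)
total, top = top_sequences(demo, n=2)
for seq, count, frac in top:
    print(f"{seq}\t{count}\t{frac:.1%}")
```

For a real file, pass an open file handle instead of the `io.StringIO` object (for gzipped files, `gzip.open(path, "rt")` from the standard library should work the same way).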

A further question is: does FastQC identify the primer sequence? It should do so, as it uses a list of oligos as a reference to, e.g., name the different Illumina adaptors and primers.
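The same kind of check can be done by hand against your own list of adapter and primer sequences. The sketch below is not FastQC's matching algorithm; it is a simple exact shared-substring test, and the function name `match_known_oligos`, the `min_overlap` parameter, and all sequences are hypothetical placeholders to be replaced with the oligos from your library prep kit.

```python
def match_known_oligos(read, oligos, min_overlap=12):
    """Return the names of oligos that share an exact substring of at least
    min_overlap bases with the read (a crude stand-in for a contaminant check)."""
    hits = []
    for name, oligo in oligos.items():
        k = min(min_overlap, len(oligo))
        # All k-mers of the oligo, for fast membership tests.
        kmers = {oligo[i:i + k] for i in range(len(oligo) - k + 1)}
        if any(read[i:i + k] in kmers for i in range(len(read) - k + 1)):
            hits.append(name)
    return hits


# Hypothetical sequences for illustration only; substitute your kit's real oligos.
known = {
    "hypothetical_adapter": "AAACCCGGGTTTAAACCC",
    "hypothetical_primer": "GATTACAGATTACAGATT",
}
overrepresented_read = "TTGACCCGGGTTTAAACGA"
print(match_known_oligos(overrepresented_read, known))
```

Running this on the read both before and after adapter trimming may help show whether the over-represented sequence is mostly adapter or primer.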

HTH


Actually, we did BLAST against the NCBI nucleotide database, but the results were pretty inconclusive: a lot of hits with very high E-values. And yes, without the adapters trimmed, FastQC identified the primer sequence, but when the adapters were trimmed it did not.

11.2 years ago

Don't forget to Google this sequence. I had an experience where MegaBLAST failed to identify a tRNA sequence with a post-processed end whereas Google found it mentioned in a paper.


Never thought of that. Thank you.
