Interpreting de novo contig output
7.7 years ago
skbrimer ▴ 740

Afternoon everyone,

I am doing some viral RNA-seq work, and when I assembled with SPAdes I got back bacterial rRNA contigs (E. coli and Salmonella). How do I interpret this? I expected avian rRNA, since that is the host, but the bacterial contigs are unexpected. Does this mean I have two different bacterial contaminants, or that I have two smaller contigs of avian rRNA that happen to be very similar to these bacteria?

To build the target list I used the NCBI command-line tools and requested only the top match. When I found these two sequences I BLASTed them individually, thinking they would match many different rRNAs, but only E. coli and Salmonella were returned.
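Since requesting only the top match hides how close the runner-up hits are, one option is to keep several hits per contig and compare them. Below is a minimal Python sketch, assuming the BLAST results were saved in the default tabular layout (`-outfmt 6`); the function name `top_hits`, the contig and subject IDs, and all numbers in the example are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(handle, n=5):
    """Return {query: [hit dicts]}, up to n hits per query, best bitscore first."""
    by_query = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["pident"] = float(hit["pident"])
        hit["bitscore"] = float(hit["bitscore"])
        by_query[hit["qseqid"]].append(hit)
    return {
        query: sorted(hits, key=lambda h: h["bitscore"], reverse=True)[:n]
        for query, hits in by_query.items()
    }


# Made-up rows standing in for a real results file.
EXAMPLE = (
    "contig_1\tsubj_B\t97.0\t1700\t51\t0\t1\t1700\t5\t1704\t0.0\t2900\n"
    "contig_1\tsubj_A\t99.5\t1735\t8\t0\t1\t1735\t10\t1744\t0.0\t3150\n"
    "contig_1\tsubj_C\t90.0\t1500\t150\t0\t1\t1500\t3\t1502\t0.0\t2000\n"
    "contig_2\tsubj_D\t98.0\t289\t6\t0\t1\t289\t40\t328\t1e-140\t500\n"
)

hits = top_hits(io.StringIO(EXAMPLE), n=2)
for query, ranked in hits.items():
    for hit in ranked:
        print(query, hit["sseqid"], hit["pident"], hit["bitscore"])
```

If the second and third hits for a contig come from unrelated organisms with nearly the same score, the top hit alone says little about where the contig came from.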

Any advice would be appreciated.

Thanks, Sean

de novo RNA-Seq • 1.5k views

Assuming these are contaminants, how about removing the reads for those two (BBSplit) and then re-running SPAdes on the rest? It seems unlikely that avian rRNA would be similar enough to bacterial rRNA to explain these hits.


I also get back human and mouse rRNA hits, but when I BLAST those, avian is in the mix too, which is why I was asking. Are eukaryotes and prokaryotes just that different, whereas you can get close BLAST matches from mammal to mammal?


You are doing viral RNAseq from an avian (origin?) sample? You used SPAdes to assemble the reads. All contigs generated from SPAdes were then blasted to get the top hit. There was no ribo-depletion/poly-A selection done for these samples? Is that right?

What is the question you are trying to answer?

Pro- and eukaryotic rRNAs are different.


Sorry, I ramble.

Yes, I am doing viral RNA-seq on a sample of avian origin. It is not ribo-depleted (still working on baits; no commercial kits work with avian) and not poly-A depleted because the virus is polyadenylated. SPAdes-generated contigs were BLASTed to get the top hit. That is all correct.

My question was, do I really have bacterial contamination in my sample or not?

I only have two contigs that contain bacterial sequence: one 1735 nt long with 29x coverage matching E. coli 23S rRNA (seems like a good contig), and one 289 nt long with ~3x coverage matching Salmonella 23S rRNA.
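Length and coverage like these can be pulled straight from the contig names, since SPAdes writes headers of the form `NODE_<n>_length_<len>_cov_<cov>`. Here is a small Python sketch for flagging short or thinly covered contigs; the node numbers in the example and the `min_length` / `min_cov` cutoffs are arbitrary assumptions, not recommended values. Note that the `cov` value in a SPAdes header is k-mer coverage, which is not the same thing as read depth.

```python
import re

# Matches headers such as "NODE_12_length_1735_cov_29.0", with or without ">".
# Anything after the coverage value (e.g. an rnaSPAdes gene/isoform suffix) is ignored.
HEADER_RE = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_spades_header(header):
    """Extract node id, length and k-mer coverage from a SPAdes contig name."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"not a SPAdes-style header: {header!r}")
    return {
        "node": int(match.group(1)),
        "length": int(match.group(2)),
        "cov": float(match.group(3)),
    }


def is_weak(contig, min_length=500, min_cov=5.0):
    """Flag contigs that are short or thinly covered (cutoffs are illustrative)."""
    return contig["length"] < min_length or contig["cov"] < min_cov


# Hypothetical headers built from the two contigs described above.
headers = ["NODE_12_length_1735_cov_29.0", "NODE_87_length_289_cov_3.0"]
for header in headers:
    contig = parse_spades_header(header)
    label = "weak" if is_weak(contig) else "ok"
    print(contig["node"], contig["length"], contig["cov"], label)
```

A contig flagged as weak is not necessarily wrong, but a 289 nt contig at roughly 3x is much easier to explain by a handful of stray reads than the longer, better-covered one.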

The reason I ask is that, biologically, finding these two bacteria in the sample doesn't make sense. If the samples were contaminated, the bacteria would dominate them. These samples are allantoic fluids from chicken eggs, and bacteria grow like crazy in that medium.

My rambling, poorly worded follow-up question was whether eukaryotic rRNA is similar enough to prokaryotic rRNA to cause a false match.

The reason I was curious is that the BLAST results in my dataset also include hits from sea snail, Atlantic cod, cow, mouse, and human. I imagine that if I BLAST those contigs individually, I will find avian in the results as well, just not as the highest hit.

Sorry for the confusion


What % of the hits (if you checked all contigs) are to rRNA of some kind?
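A rough way to get that percentage is to keyword-match the top-hit description for each contig. This is only a sketch: the dictionary of titles below is invented, and in practice the titles would come from whatever column of your BLAST output holds the subject description. Keyword matching will miss oddly annotated entries, so treat the number as an estimate.

```python
import re

# Matches "rRNA" or "ribosomal RNA" anywhere in a hit description.
RRNA_RE = re.compile(r"\b(rRNA|ribosomal RNA)\b", re.IGNORECASE)


def rrna_percent(top_hit_titles):
    """Percentage of contigs whose top-hit title mentions rRNA."""
    if not top_hit_titles:
        return 0.0
    n_rrna = sum(1 for title in top_hit_titles.values() if RRNA_RE.search(title))
    return 100.0 * n_rrna / len(top_hit_titles)


# Invented contig -> top-hit description mapping, for illustration only.
titles = {
    "contig_1": "Escherichia coli 23S ribosomal RNA gene",
    "contig_2": "Gallus gallus 28S rRNA",
    "contig_3": "hypothetical viral polymerase gene",
}

print(f"{rrna_percent(titles):.1f}% of contigs have an rRNA top hit")
```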

So the sequence of interest is only the virus and nothing else? If yes, how about binning all avian sequence away with BBSplit (assuming an avian reference is available), so that hopefully only the sequence of interest goes into the assembly? I am not sure SPAdes is the best program for this assembly (if you don't expect splicing then perhaps it is), but Trinity may also be worth looking into. If a related viral genome is available, then aligning to it (or even binning reads against it along with the avian genome) may be worth trying.
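The binning step could be scripted roughly as follows. This Python sketch only builds the `bbsplit.sh` argument list; the file names are hypothetical, and the `in=`, `ref=`, `basename=` and `outu=` parameters are written as I understand them from the BBTools documentation, so check them against the usage text of your installed version before running anything.

```python
import shlex


def bbsplit_command(reads, references, unmatched_out, basename="binned_%.fq"):
    """Build a bbsplit.sh argument list in BBTools key=value style.

    Reads matching a reference go to one file per reference (the % in
    basename is replaced by the reference name); everything else goes to
    unmatched_out, which is what would be passed on to the assembler.
    """
    if not references:
        raise ValueError("at least one reference FASTA is required")
    return [
        "bbsplit.sh",
        f"in={reads}",
        "ref=" + ",".join(references),
        f"basename={basename}",
        f"outu={unmatched_out}",
    ]


# Hypothetical file names: host plus the two suspected contaminants.
cmd = bbsplit_command(
    "reads.fq",
    ["chicken.fa", "ecoli.fa", "salmonella.fa"],
    "unbinned.fq",
)
print(shlex.join(cmd))
# To actually run it (requires BBTools on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

The reads left in the unmatched file are then the input for a fresh assembly with SPAdes or Trinity.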

As for why those two bacterial contigs are present, your guess is as good as mine. They may represent minor contamination somewhere during prep?


Those are some good points. I will give them a try. Thanks!
