Bacteria identification using 16S rRNA sequencing data
8.2 years ago
fxiao1 • 0

I would like to identify as many bacteria as possible from 16S rRNA sequencing data. I found that more than 60% of the reads align to multiple bacterial species, and I don't think I should ignore them. I am trying to assign these reads to specific species according to the count distribution of the other 40% of the reads. Does this make sense? Is there a standard protocol to follow in this field? Thanks.

RNA-Seq alignment • 3.5k views

NB. Tag should be amplicon-seq or 16S, not RNA seq.

8.2 years ago
jb ▴ 40

What you don't describe is which region of the 16S gene you amplified. Your ability to discriminate between representative sequences at a given taxonomic level depends on the region sequenced. You should use highly curated databases to determine what you have, such as those distributed with mothur and/or QIIME (software designed specifically for analyzing 16S sequences). Misidentification can arise from errors or missing data in your sequences, as well as from errors or missing data in the database you are using. Reads that hit multiple species most likely cannot be resolved at the species level, only at a higher level such as family.

8.2 years ago
Daniel ★ 4.0k

Check out the QIIME tutorial; it's pretty thorough and should allow you to do everything you need. Notably, you typically don't view the data at the species level, as this is very variable, but at the genus or family level, which is built into this kind of analysis.

http://qiime.org/tutorials/tutorial.html

8.2 years ago
dago ★ 2.8k

I think that for a 16S study there is no better option than the resources offered by SILVA. They rely on a manually curated database that has been used extensively and is considered a gold standard for bacterial phylogeny.

Hope it helps

8.2 years ago

Here is how the MEGAN tool does it:

http://ab.inf.uni-tuebingen.de/data/software/megan5/download/manual.pdf

The main problem addressed by MEGAN is to compute a “species profile” by assigning the reads from a metagenomics sequencing experiment to appropriate taxa in the NCBI taxonomy. At present, this program implements the following naive approach to this problem:

  1. Compare a given set of DNA reads to a database of known sequences, such as NCBI-NR or NCBI-NT [3], using a sequence comparison tool such as BLAST [1].
  2. Process this data to determine all hits of taxa by reads.
  3. For each read r, let H be the set of all taxa that r hits.
  4. Find the lowest node v in the NCBI taxonomy that encompasses the set of hit taxa H and assign the read r to the taxon represented by v. We call this the naive LCA-assignment algorithm (LCA = “lowest common ancestor”). In this approach, every read is assigned to some taxon.
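
To make step 4 concrete, here is a minimal Python sketch of the naive LCA idea. It is not MEGAN's actual code: the function names are my own, and the `LINEAGES` table is a hand-written, simplified stand-in for the NCBI taxonomy (a few ranks only, for illustration). It assumes each taxon's lineage is listed from root to leaf.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages (each listed root -> leaf)."""
    if not lineages:
        return None  # the read had no hits, so there is nothing to assign
    lca = None
    # Walk the lineages rank by rank and stop at the first disagreement.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca = ranks[0]
    return lca


# Simplified, hand-written lineages for illustration only (assumption:
# a real run would look these up in the NCBI taxonomy instead).
LINEAGES = {
    "Escherichia coli": [
        "Bacteria", "Gammaproteobacteria", "Enterobacteriaceae",
        "Escherichia", "Escherichia coli",
    ],
    "Escherichia fergusonii": [
        "Bacteria", "Gammaproteobacteria", "Enterobacteriaceae",
        "Escherichia", "Escherichia fergusonii",
    ],
    "Salmonella enterica": [
        "Bacteria", "Gammaproteobacteria", "Enterobacteriaceae",
        "Salmonella", "Salmonella enterica",
    ],
}


def assign_read(hit_taxa, lineages=LINEAGES):
    """Assign one read to the LCA of the set H of taxa it hits."""
    return lowest_common_ancestor([lineages[taxon] for taxon in hit_taxa])


if __name__ == "__main__":
    print(assign_read(["Escherichia coli"]))
    print(assign_read(["Escherichia coli", "Escherichia fergusonii"]))
    print(assign_read(["Escherichia coli", "Salmonella enterica"]))
```

With these toy lineages, a read hitting only one species stays at that species, a read hitting two Escherichia species is placed at the genus, and a read hitting both Escherichia and Salmonella is placed at the family. This is the same behaviour described in the other answers: multi-hit reads are not discarded, they are just reported at a higher rank.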