Software to group full length 16S rRNA sequences based on identity threshold
12 months ago
fhsantanna • 440
Brazil

I have a set of full-length 16S rRNA sequences (~1500 nt) and I want to sort them into groups based on an identity threshold (98.5%, the species circumscription threshold). What software do you recommend for this purpose?

PS1: I tried CD-HIT, but it produced some awkward groupings (i.e., sequences from a single species were split across different groups).

PS2: I have computed an identity matrix based on the alignment of 16S rRNA sequences. I realized that I could perform a hierarchical clustering and detect "islands" of values above 98.5% along the diagonal. Basically, the number of squares along the diagonal would represent the number of species present in my dataset. However, I am not sure if this is an appropriate approach.
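
The "islands along the diagonal" idea in PS2 is roughly what complete-linkage clustering cut at 98.5% gives: two groups are merged only while every cross pair of sequences stays at or above the threshold. Below is a minimal pure-Python sketch of that, not a recommendation of a specific tool. The function name `complete_linkage_groups` and the 5x5 `identity` matrix are made up for illustration; a real matrix would come from your own alignment.

```python
def complete_linkage_groups(identity, threshold=98.5):
    """Group sequence indices so that every pair inside a group has
    pairwise identity >= threshold (complete-linkage criterion)."""
    clusters = [[i] for i in range(len(identity))]
    while True:
        best = None  # (worst cross-pair identity, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: judge a merge by its least similar pair.
                worst = min(identity[i][j]
                            for i in clusters[a] for j in clusters[b])
                if worst >= threshold and (best is None or worst > best[0]):
                    best = (worst, a, b)
        if best is None:
            # No remaining merge keeps all pairs above the threshold.
            return sorted(sorted(c) for c in clusters)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid


# Hypothetical percent-identity matrix for five sequences (illustration only).
identity = [
    [100.0, 99.2, 98.9, 95.0, 94.8],
    [99.2, 100.0, 98.6, 95.1, 94.9],
    [98.9, 98.6, 100.0, 95.3, 95.0],
    [95.0, 95.1, 95.3, 100.0, 99.0],
    [94.8, 94.9, 95.0, 99.0, 100.0],
]

groups = complete_linkage_groups(identity)
print(groups)       # each inner list is one candidate species-level group
print(len(groups))  # number of "squares" along the diagonal
```

One caveat with any fixed cutoff: if A and B are 99% identical and B and C are 99% identical, but A and C are only 98%, the three cannot all share one group under this criterion, so the result depends on merge order. That kind of chain may be part of why threshold-based tools return groupings that look awkward.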


I think you can try QIIME's pick_otus.py script, where you can change the sequence similarity threshold from 0.97 (the default) to 0.985.

12 months ago
Carambakaracho ♦ 1.2k
Switzerland/Basel

Robert Edgar wrote UPARSE for OTU clustering. It is not exactly open source, but it is free as long as the 32-bit version is enough for you. To be fair, I have never used it myself, but colleagues have applied it successfully to this kind of question.

EDIT: I knew I had read about this for QIIME 2 as well: "Clustering sequences into OTUs using q2-vsearch"

EDIT2: Nguyen et al. on the limitations of OTU clustering
