Software to group full-length 16S rRNA sequences based on an identity threshold
5.8 years ago
fhsantanna ▴ 610

I have a set of full-length 16S rRNA sequences (~1500 nt) and I want to classify them into groups based on an identity threshold (98.5%, the species circumscription threshold). What software do you recommend for this purpose?

PS1: I tried CD-HIT, but it produced some awkward groupings (i.e., sequences from a single species were split across different groups).

PS2: I have computed an identity matrix based on the alignment of 16S rRNA sequences. I realized that I could perform a hierarchical clustering and detect "islands" of values above 98.5% along the diagonal. Basically, the number of squares along the diagonal would represent the number of species present in my dataset. However, I am not sure if this is an appropriate approach.
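
To make the "islands along the diagonal" idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the function name `complete_linkage_groups` and the toy `identity` matrix are made up for this example, and the choice of complete linkage (a group is only merged if every pair in it meets the threshold) is my assumption about what a "square" of values above 98.5% means, not something established in this thread.

```python
def complete_linkage_groups(identity, threshold=0.985):
    """Group sequences so that every pair inside a group meets the threshold.

    identity: square, symmetric list of lists with pairwise identities in [0, 1].
    Returns a sorted list of groups, each a sorted list of sequence indices.
    """
    clusters = [[i] for i in range(len(identity))]

    def linkage(a, b):
        # Complete linkage: the lowest identity between any two members.
        return min(identity[i][j] for i in a for j in b)

    while True:
        best = None
        # Find the most similar pair of clusters that may still be merged.
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if score >= threshold and (best is None or score > best[0]):
                    best = (score, x, y)
        if best is None:
            break  # no remaining pair satisfies the threshold
        _, x, y = best
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return sorted(clusters)


# Hypothetical identity matrix for five sequences (toy values, not real data).
identity = [
    [1.000, 0.995, 0.990, 0.960, 0.955],
    [0.995, 1.000, 0.988, 0.962, 0.958],
    [0.990, 0.988, 1.000, 0.970, 0.965],
    [0.960, 0.962, 0.970, 1.000, 0.992],
    [0.955, 0.958, 0.965, 0.992, 1.000],
]

groups = complete_linkage_groups(identity)
print(groups)
```

With these toy values the first three sequences form one group and the last two another. This pure-Python loop is slow for large datasets; for a real matrix, `scipy.cluster.hierarchy.linkage` with `method="complete"` followed by `fcluster` on the distance matrix (1 - identity) cut at 0.015 would likely be the more practical route. Note that with single linkage instead, chains of sequences could be merged even when the two ends fall below 98.5%.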

rRNA 16S identity matrix species • 1.4k views

I think you can try QIIME's pick_otus.py script, where you can change the sequence similarity threshold from 0.97 (the default) to 0.985.

5.8 years ago
Carambakaracho ★ 3.2k

Robert Edgar wrote UPARSE for OTU clustering. It's not exactly open source, but it is free, provided you can get along with the 32-bit version. To be fair, I have never used it myself, but colleagues have applied it successfully to your kind of question.

EDIT: I knew I read about this in qiime2 as well: Clustering sequences into OTUs using q2-vsearch

EDIT2: Nguyen et al. on the limitations of OTU clustering
