Cluster RNA sequences from fasta alignment by identity threshold
6.0 years ago

Hello everyone,

I have large alignments of RNA sequences (16-200 thousand sequences) and I need to cluster them by an identity threshold. Basically, what I want to do is:

- compute the distribution of pairwise sequence identity for each alignment;
- once I know this distribution, cluster the sequences by an identity threshold, for example by creating files that contain the sequences from my current alignment that are at least 50% identical, at least 60% identical, and so on.

Just to clarify, I define the identity of two sequences as the fraction of aligned positions at which their nucleotides are identical, for example:

seq1 ATA
seq2 GTG

seq1 and seq2 are 33.3% identical.
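The identity definition above, and the first step of the question (the identity distribution), could be sketched roughly as follows. This is only an illustration, not an existing tool: the function names `identity` and `identity_histogram`, the `-` gap character, the choice to skip columns that are gaps in both sequences, and the random sampling of pairs (all pairs are not feasible for 200 thousand sequences) are assumptions.

```python
import random
from itertools import combinations


def identity(a, b, gap="-"):
    """Percent of aligned columns in which a and b carry the same nucleotide."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    matches = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue  # assumption: columns that are gaps in both are ignored
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_histogram(seqs, n_pairs=10000, bin_width=10, seed=0):
    """Count pairwise identities per bin; keys are the lower edges of the bins."""
    rng = random.Random(seed)
    n = len(seqs)
    if n * (n - 1) // 2 <= n_pairs:
        pairs = combinations(range(n), 2)  # small input: use every pair
    else:
        # large input: sample random pairs instead of enumerating all of them
        pairs = (tuple(rng.sample(range(n), 2)) for _ in range(n_pairs))
    counts = {}
    for i, j in pairs:
        pct = identity(seqs[i], seqs[j])
        # 100% identity is folded into the top bin
        low = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))


print(round(identity("ATA", "GTG"), 1))
print(identity_histogram(["ATA", "GTG", "ATA"]))
```

For the example pair, `identity("ATA", "GTG")` gives one match in three columns. The histogram keys are bin lower edges in percent, so a key of 30 collects pairs with 30% to just under 40% identity.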

My question is: do you know of any software or method that would help me solve this problem?

Thank you all in advance for reading this and for any answers.

sequences identity RNA clustering RNA alignment • 1.4k views

clumpify.sh from the BBMap suite will let you clump duplicate sequences together (see "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files. And remove duplicates."). I don't recall any software that would let you do what you are asking for in an iterative way.

BTW: What are you trying to achieve? Perhaps there is a different way to do it.

The reason for doing this is described in this SA question: http://seqanswers.com/forums/showthread.php?t=82133


Hello filip7grudzien!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=82132

This is typically not recommended as it runs the risk of annoying people in both communities.


Thank you. I added a link to this post on SEQanswers so as not to duplicate content. I just wanted to increase the chance that somebody could help me. Sorry for the inconvenience.

5.8 years ago
Joe 21k

CD-HIT is designed for exactly this.
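To make the idea concrete, here is a minimal sketch of greedy incremental clustering, the general strategy CD-HIT is built on, applied to already aligned sequences with the column-wise identity from the question. It is not CD-HIT itself and does not reproduce its word filtering or its own alignment step; the names `identity` and `greedy_cluster`, the `-` gap character, and the handling of gap columns are assumptions made for illustration.

```python
def identity(a, b, gap="-"):
    """Percent of aligned columns in which a and b carry the same nucleotide."""
    matches = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue  # assumption: columns that are gaps in both are ignored
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def greedy_cluster(seqs, threshold):
    """Cluster aligned sequences (dict: name -> sequence) at `threshold` percent.

    Sequences are visited from longest to shortest (gaps not counted). Each one
    joins the first cluster whose representative it matches at or above the
    threshold; otherwise it becomes the representative of a new cluster.
    """
    order = sorted(seqs, key=lambda name: len(seqs[name].replace("-", "")),
                   reverse=True)
    clusters = []  # list of (representative name, [member names])
    for name in order:
        for rep, members in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((name, [name]))
    return clusters


example = {"s1": "AUGGCUAA", "s2": "AUGGCUAU", "s3": "CCCCGGGG"}
for rep, members in greedy_cluster(example, threshold=80):
    print(rep, members)
```

Running the function once per threshold (50, 60, and so on) and writing each cluster's members to a file would give the per-threshold files described in the question. Note that every sequence is compared only against cluster representatives, so the result depends on the visiting order.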
