Assembling long similar contigs
dcdanko, 7.4 years ago

I have a program that outputs ~50 assembled transcripts, each about 10k base pairs long.

My program already filters out exact duplicate sequences, but many of the assembled transcripts are very similar to one another.

Is there an existing assembly program that can merge sequences that are identical over 90% of their length?

Tags: Assembly, RNA-Seq

Have you tried CD-HIT? It can be used for clustering and comparing protein or nucleotide sequences.
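CD-HIT (for nucleotides, the cd-hit-est program, with -c as the identity threshold) clusters greedily: sequences are sorted longest first, and each one either joins the first existing representative it matches at or above the threshold or becomes a new representative. The Python below is a minimal sketch of that greedy idea only, not CD-HIT's implementation. It assumes a simplified identity of (L - edit distance) / L, with L the longer sequence length, and the names edit_distance, identity and greedy_cluster are hypothetical. Its plain O(n*m) alignment is fine for toy data but far too slow for fifty 10 kb transcripts; real tools use short-word filtering to avoid most comparisons.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            )
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Simplified (hypothetical) identity: share of the longer sequence left unedited."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return (longest - edit_distance(a, b)) / longest


def greedy_cluster(seqs: List[str], threshold: float = 0.9) -> Dict[str, List[str]]:
    """Greedy incremental clustering: longer sequences become representatives first."""
    clusters: Dict[str, List[str]] = {}
    for seq in sorted(seqs, key=len, reverse=True):
        for rep in clusters:
            if identity(rep, seq) >= threshold:
                clusters[rep].append(seq)  # absorbed into an existing cluster
                break
        else:
            clusters[seq] = [seq]          # no match: seq starts a new cluster
    return clusters


if __name__ == "__main__":
    toy = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]
    for rep, members in greedy_cluster(toy, threshold=0.9).items():
        print(rep, members)
```

Keeping only the dictionary keys gives one representative per cluster, which is the kind of output the question is after.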

7.4 years ago

Dedupe from the BBMap package can remove similar sequences to leave only a single copy.

dedupe.sh in=transcripts.fa out=deduped.fa minidentity=0.9 maxedits=20
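Dedupe's maxedits and minidentity settings appear to govern when a contained sequence (one lying within a longer one) is absorbed; check the dedupe.sh usage text to confirm how they interact. Note the scale: for a 10,000 bp transcript, 90% identity tolerates up to 1,000 differences, while maxedits=20 tolerates only 20 (0.2%). If both limits must be met, the edit cap is the one that decides. The Python below sketches that kind of check under stated assumptions: both limits apply together, and identity is measured over the shorter sequence. It uses semi-global alignment, where skipping bases at the ends of the longer sequence is free. contained_edits and is_absorbed are hypothetical names, not BBTools code.

```python
def contained_edits(short: str, long: str) -> int:
    """Fewest edits needed to align all of `short` to some substring of `long`.

    Semi-global alignment: skipping bases at either end of `long` is free,
    while every base of `short` must be aligned.
    """
    prev = [0] * (len(long) + 1)  # empty prefix of `short` may start anywhere
    for i, cs in enumerate(short, start=1):
        curr = [i] + [0] * len(long)  # short[:i] against nothing: i deletions
        for j, cl in enumerate(long, start=1):
            curr[j] = min(
                prev[j] + 1,               # base of `short` left unaligned
                curr[j - 1] + 1,           # extra base in `long` inside the match
                prev[j - 1] + (cs != cl),  # match (0) or substitution (1)
            )
        prev = curr
    return min(prev)  # the match may end anywhere in `long`


def is_absorbed(short: str, long: str, max_edits: int, min_identity: float) -> bool:
    """Hypothetical rule: absorb `short` only if BOTH limits are satisfied."""
    if not short:
        return True
    edits = contained_edits(short, long)
    ident = (len(short) - edits) / len(short)  # identity over the shorter sequence
    return edits <= max_edits and ident >= min_identity


if __name__ == "__main__":
    print(is_absorbed("ACCT", "TTACGTTT", max_edits=1, min_identity=0.75))  # True
    print(is_absorbed("ACCT", "TTACGTTT", max_edits=0, min_identity=0.75))  # False
```

With ~10 kb transcripts that differ by a few percent, a rule like this rejects nearly every pair at maxedits=20, so the edit limit probably needs to be raised to match the intended 90% identity.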

7.4 years ago
arnstrm

Another alternative: TACO.
