CD-HIT cannot completely remove redundant transcripts
6.6 years ago
xioli2013 ▴ 10

I tried to use CD-HIT-EST to remove redundancy from Trinity de novo transcripts, but I can still see redundant annotations:

cd-hit-est -i trinity.fasta -o clstr_out -c 0.9 -n 9

For example:

TRINITY_DN1855_c5_g1, TRINITY_DN1855_c1_g1

Both are annotated as dnaK. The two sequences align at 92% identity, but CD-HIT does not cluster them together:

>TRINITY_DN1855_c5_g1 
CGCCAAGAAGACCGAGATCTACAGCACCGCCGAAAACAACCAGCCCGGTGTGGAAATCAACGTGCTGCAAGGCAAGCGCC
CCATGGCCGCCGACAACAGGTCCCTGGGCCGCTTCAAGCTCGAGGGCATTCCCCCCATGCCCGCAGGCTGCGCCCAGATC
GAAGTGACCTTCGGTATCGACGCCAACGGCATTCTGCATGTCACCGCCAAGGAAAAGACCAGCAGCAAGGAAAGCAGCAT
CCGCATCGGGAACACCACCACCCTCGACAAGAGTGACGTGGAGCGCATGGTGCAGGAAACCGAGCAGAACGCCGCCGCCG
ACAGGGCCCGCAAGGAGAAGGTCGAGAAACGCAACAACCTCGACTCGCTGCGC
>TRINITY_DN1855_c1_g1
AGGGCGGCATGATTGCCCCGATGGTTACCCGCAACACCACCGTGCCCGTCAAGAAGACCGAGATCTACACCACTGCCGAAAA
CAACCAGCCCGGCGTGAAAATCAACGTGCTGCAAGGCGAGCACCCCATGGCCGCCGACAACAAGTCTCTGGGCCGCTTCAAGCTCGAAGGCGTTCCCCCCATGCCCGCAGGCCGCGTCCAGATCGAAGTGACCTTCGATAT

Trying other thresholds, such as -c 0.89 and -c 0.88, did not reduce the redundancy; the number of output transcripts actually increased.

I would appreciate your comments on what the problem is and how to address it.

Thanks,

Xp

Trinity CD-HIT-EST • 2.5k views
6.6 years ago

There are inherent limitations to the CD-HIT algorithm that we should be aware of. Please see this link.
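To make the kind of limitation concrete: CD-HIT clusters greedily. Sequences are sorted by length, the longest becomes the first representative, and each remaining sequence is compared only against the existing representatives, never against other cluster members. The sketch below is a toy illustration of that loop, not CD-HIT's implementation. The function names, the toy sequences, and the position-by-position similarity (used here in place of a real alignment) are all assumptions made for the example.

```python
def toy_identity(a, b):
    """Toy stand-in for an alignment (assumption, not CD-HIT's method):
    compare position by position and divide by the shorter length."""
    shorter = min(len(a), len(b))
    matches = sum(x == y for x, y in zip(a, b))
    return matches / shorter


def greedy_cluster(seqs, threshold):
    """Greedy incremental clustering: longest sequence first, each sequence
    joins the first representative it matches, otherwise it starts a new cluster."""
    clusters = {}  # representative name -> list of member names
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if toy_identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative was similar enough: this sequence becomes one.
            clusters[name] = [name]
    return clusters


# Hypothetical toy sequences, not taken from the question.
seqs = {
    "long": "AAAAAAAAAACCCCCCCCCC",
    "mid": "AAAAAAAAAA",
    "other": "GGGGGGGGGGGG",
}
clusters = greedy_cluster(seqs, threshold=0.9)
print(clusters)
```

Because a sequence is only ever tested against representatives, two transcripts that resemble each other can still end up in separate clusters if neither passes the threshold against the representative it is compared with.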

Also, see the CD-HIT-2D comparison algorithm on the same page.
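One limitation that seems to fit the example in the question: as far as I understand the CD-HIT user guide, the -c threshold by default refers to global identity (-G 1), meaning the number of identical bases in the alignment divided by the full length of the shorter sequence. A 92% identity reported by a local aligner only covers the aligned region, so two transcripts that overlap partially can fall well below -c 0.9. The sketch below shows the arithmetic; the three numbers are hypothetical and were not measured from the posted sequences.

```python
def local_identity(identical, aligned_columns):
    """Identity over the aligned region only (what a local aligner reports)."""
    return identical / aligned_columns


def global_identity(identical, shorter_len):
    """Identity as CD-HIT defines it by default (-G 1):
    identical bases divided by the full length of the shorter sequence."""
    return identical / shorter_len


# Hypothetical numbers for illustration only.
identical = 161        # identical bases inside the alignment
aligned_columns = 175  # length of the aligned (overlapping) region
shorter_len = 220      # full length of the shorter transcript
threshold = 0.9        # the -c value used in the question

loc = local_identity(identical, aligned_columns)
glob = global_identity(identical, shorter_len)

print(f"local identity:  {loc:.2f}")
print(f"global identity: {glob:.2f}")
print("passes -c 0.9 under global identity:", glob >= threshold)
```

If local identity is what you want, the guide documents -G 0 together with alignment coverage controls such as -aS or -aL; it advises against using -G 0 without them.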
