Trinity transcriptome quality assessment
16 months ago

I have 8 replicates of Illumina paired-end reads from a fungal RNA-seq experiment, which I assembled de novo with Trinity. The Trinity documentation suggests the --jaccard_clip option if you expect high gene density, which may be the case for a small fungal genome.

I assembled the transcriptome twice, once with --jaccard_clip and once without, and performed a couple of the recommended quality assessment steps on each.

Read representation is good in both transcriptomes: 99.12% of reads map to the assembly made with --jaccard_clip, and 98.97% to the assembly made without it.

Below, stats for each transcriptome generated using

With --jaccard_clip

Counts of transcripts, etc.
Total trinity 'genes': 18674
Total trinity transcripts: 30205
Percent GC: 62.50

Stats based on ALL transcript contigs:
Contig N50: 2481
Median contig length: 674
Average contig: 1296.03
Total assembled bases: 39146732

Stats based on ONLY LONGEST ISOFORM per 'GENE':
Contig N50: 2336
Median contig length: 363
Average contig: 1034.59
Total assembled bases: 19319873

Without --jaccard_clip

Counts of transcripts, etc.
Total trinity 'genes': 6106
Total trinity transcripts: 18773
Percent GC: 62.38

Stats based on ALL transcript contigs:
Contig N50: 4196
Median contig length: 2074
Average contig: 2720.12
Total assembled bases: 51064844

Stats based on ONLY LONGEST ISOFORM per 'GENE':
Contig N50: 3986
Median contig length: 1973.5
Average contig: 2514.36
Total assembled bases: 15352683
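For anyone comparing the two blocks above, here is a minimal sketch of how a contig N50 is usually defined, which may help when reading the numbers. The contig lengths in the example are made up for illustration and do not come from either assembly, and the function name is hypothetical; this is not Trinity's own code.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths for illustration only (not from either assembly).
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # 400
```

Because N50 is weighted by length, a small number of very long contigs can raise it considerably, so a higher N50 does not by itself indicate a more accurate assembly.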

Trinity also recommends counting full-length transcripts by BLAST against SwissProt. Below, transcripts were aligned to their best protein hit. The table shows the number of transcripts at various percent coverages. For more info on this chart, see

With --jaccard_clip

hit_pct_cov_bin  count_in_bin  >bin_below
100              1046          1046
90               571           1617
80               404           2021
70               331           2352
60               327           2679
50               330           3009
40               305           3314
30               231           3545
20               281           3826
10               120           3946

Without --jaccard_clip

hit_pct_cov_bin  count_in_bin  >bin_below
100              2313          2313
90               764           3077
80               552           3629
70               440           4069
60               381           4450
50               333           4783
40               314           5097
30               269           5366
20               220           5586
10               102           5688
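To put the two coverage tables on the same footing, here is a rough sketch that recomputes the cumulative column from the per-bin counts posted above and expresses the top bins as a share of all hits. The counts are copied from the tables; the function and key names are just illustrative.

```python
from itertools import accumulate

# Bins and per-bin counts copied from the two tables above.
bins = [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
with_clip = [1046, 571, 404, 331, 327, 330, 305, 231, 281, 120]
without_clip = [2313, 764, 552, 440, 381, 333, 314, 269, 220, 102]


def summarize(counts):
    """Rebuild the cumulative column and report top-bin shares."""
    cumulative = list(accumulate(counts))
    total = cumulative[-1]
    idx_80 = bins.index(80)
    return {
        "cumulative": cumulative,
        "total_hits": total,
        "pct_in_bin_100": round(100 * counts[0] / total, 1),
        "pct_in_bins_80_and_up": round(100 * cumulative[idx_80] / total, 1),
    }


for label, counts in [("with", with_clip), ("without", without_clip)]:
    s = summarize(counts)
    print(label, s["total_hits"], s["pct_in_bin_100"], s["pct_in_bins_80_and_up"])
# with 3946 26.5 51.2
# without 5688 40.7 63.8
```

On these numbers, the assembly without --jaccard_clip has both more hits overall and a larger share of them in the high-coverage bins.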

I would like to choose the better of these transcriptomes for my analysis, but I'm still not sure which is more representative. Does anyone have advice on how to make the final selection?

Assembly RNA-Seq • 836 views
15 months ago
Chris Fields ♦ 2.1k
University of Illinois Urbana-Champaign

cwbenson1993, have you tried any of the other assessment recommendations from the Trinity docs?

I recommend running BUSCO for both, but the others are well worth checking into (TransRate as well).


Hey Chris, thanks!

I tried BUSCO on the longest isoform per gene for both assemblies but didn't get great numbers for either...

With --jaccard_clip


Without --jaccard_clip


It's a transcriptome, so you may not get the complete set of BUSCOs for your taxonomic group (this only represents what is expressed, unlike a genome).

The key is to use this to compare assembly versions (or assemblies from different tools). The two are fairly comparable, but the --jaccard_clip assembly scores slightly higher. It might be better to run BUSCO on all the transcripts (not just the longest isoform per gene) using the 'transcriptome' mode, if you aren't already doing that; the longest representative sequence may not always be the best one.
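As a rough sketch of that suggestion, the snippet below only builds and prints a BUSCO command line for each full Trinity FASTA; it does not run anything. The FASTA file names, output names, lineage dataset (fungi_odb10) and CPU count are assumptions for illustration, and the flags follow the BUSCO v4/v5 command line (`busco -i ... -o ... -l ... -m transcriptome`), so check them against your installed version.

```python
import shlex


def build_busco_command(fasta, out_name, lineage="fungi_odb10", cpus=4):
    """Return a BUSCO argument list for transcriptome mode.

    The lineage and cpus defaults are assumptions; pick the lineage
    dataset closest to your organism.
    """
    return [
        "busco",
        "-i", fasta,            # all Trinity transcripts, not only the longest isoforms
        "-o", out_name,         # name of the BUSCO output directory
        "-l", lineage,          # lineage dataset to score against
        "-m", "transcriptome",  # assess transcripts rather than a genome
        "-c", str(cpus),
    ]


# Hypothetical file names for the two assemblies.
assemblies = {
    "busco_with_clip": "with_jaccard_clip.Trinity.fasta",
    "busco_without_clip": "without_jaccard_clip.Trinity.fasta",
}

for out_name, fasta in assemblies.items():
    print(shlex.join(build_busco_command(fasta, out_name)))
```

Running both assemblies against the same lineage dataset keeps the completeness and duplication figures directly comparable.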

