I'm new to bioinformatics, so I'm really confused about this bit. I did a de novo assembly using Trinity, and now I'm trying to count the transcripts that appear to be full-length or nearly full-length (as recommended by the Trinity pipeline). After running blastx, I got my BLAST output in tabular format (outfmt 6) and ran the Trinity script analyze_blastPlus_topHit_coverage.pl. However, this script gives me only the single best-matching Trinity transcript for each top-matching database entry.
What I want to do is calculate the number of Trinity transcripts (rather than matching database entries) at a given coverage. For example, out of 300,000 contigs, 20,000 have 100% coverage, about 60,000 have coverage between 90% and 100%, and so on.
Would I just bin these according to the percentage identities? If not, how would I proceed to get my desired outcome?
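To make the question concrete, here is a rough Python sketch of the kind of per-transcript calculation I have in mind. It is only an illustration under several assumptions: the BLAST file uses the default 12-column outfmt 6 layout, subject lengths come from a separate lookup (`subject_lengths`, a hypothetical dict I would build from the database FASTA, since default outfmt 6 has no length column), and only the single best-scoring HSP per transcript is counted. The function names are my own and are not part of Trinity.

```python
import csv
from collections import Counter

def best_hit_coverage(blast_lines, subject_lengths):
    """Map each transcript to the % of the subject covered by its best HSP.

    Assumes default outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    subject_lengths is a hypothetical {sseqid: length in residues} lookup.
    """
    best = {}  # qseqid -> (bitscore, coverage)
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid = row[0], row[1]
        sstart, send = int(row[8]), int(row[9])
        bitscore = float(row[11])
        aligned = abs(send - sstart) + 1
        cov = min(100.0, 100.0 * aligned / subject_lengths[sseqid])
        # keep only the highest-scoring hit for each transcript
        if qseqid not in best or bitscore > best[qseqid][0]:
            best[qseqid] = (bitscore, cov)
    return {q: cov for q, (_, cov) in best.items()}

def bin_coverage(coverages, width=10):
    """Count transcripts per bin, keyed by the bin's upper bound (e.g. 100 = >90 to 100)."""
    counts = Counter()
    for cov in coverages.values():
        upper = -(-cov // width) * width  # round up to the next bin edge
        upper = min(100, max(width, upper))
        counts[int(upper)] += 1
    return dict(counts)
```

With this, every transcript that has a hit is counted once, so the totals are per Trinity transcript rather than per database entry. Note that the bins here are based on coverage of the subject length, not on percent identity.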
Any help would be great :)