Selecting primary transcript for each locus from a GFF file
8.6 years ago
arnstrm ★ 1.8k

Hello,

I am planning to use Maker-predicted genes to identify orthologs among closely related species. However, Maker has predicted multiple transcripts for each locus (because multiple gene predictors were used within Maker, and because the genes have multiple isoforms). Although I am keeping only predictions with AED scores < 1.0, I still have many models for each locus. My question is: what is the best way to choose a transcript for a region? Should I select the longest coding sequence for that region? Is there any program that can perform this step?

Thanks for any help!

orthologs annotations gff predictions • 3.7k views
8.6 years ago
h.mon 35k

You could use EvidenceModeler to get a consensus prediction.

Another approach would be to cluster orthologs using all predicted genes, then prune the clusters using some criterion (the longest transcript is not necessarily the best). The Agalma pipeline uses this latter approach, though the paper does not detail how this is performed (and Agalma is designed primarily for RNA-seq data sets).
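If you do want the simple longest-CDS baseline from the question as a starting point, here is a minimal sketch in Python. It assumes a GFF3 file in which each mRNA row has a `Parent` pointing at its gene and each CDS row has a `Parent` pointing at its mRNA; the function names are made up for illustration and are not part of Maker or any other tool. Models from different predictors that were not merged under one gene ID will not be grouped by this, and ties are broken arbitrarily by transcript ID.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def longest_cds_per_gene(gff_lines):
    """Return {gene_id: transcript_id}, keeping the transcript with the most CDS bases."""
    gene_of = {}                # transcript ID -> parent gene ID
    cds_len = defaultdict(int)  # transcript ID -> summed CDS length
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature row (e.g. an embedded FASTA section)
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if ftype == "mRNA" and "ID" in attrs and "Parent" in attrs:
            gene_of[attrs["ID"]] = attrs["Parent"]
        elif ftype == "CDS" and "Parent" in attrs:
            # A CDS row may list several parent transcripts, comma-separated.
            for parent in attrs["Parent"].split(","):
                cds_len[parent] += end - start + 1  # GFF3 coordinates are inclusive
    best = {}
    # Sorted iteration makes ties resolve to the alphabetically first transcript ID.
    for tx in sorted(gene_of):
        gene = gene_of[tx]
        if gene not in best or cds_len[tx] > cds_len[best[gene]]:
            best[gene] = tx
    return best


example = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t1\t1000\t.\t+\t.\tID=g1",
    "chr1\tmaker\tmRNA\t1\t1000\t.\t+\t.\tID=g1-RA;Parent=g1",
    "chr1\tmaker\tCDS\t1\t300\t.\t+\t0\tID=g1-RA:cds;Parent=g1-RA",
    "chr1\tmaker\tmRNA\t1\t1000\t.\t+\t.\tID=g1-RB;Parent=g1",
    "chr1\tmaker\tCDS\t1\t300\t.\t+\t0\tID=g1-RB:cds;Parent=g1-RB",
    "chr1\tmaker\tCDS\t500\t800\t.\t+\t0\tID=g1-RB:cds;Parent=g1-RB",
]
print(longest_cds_per_gene(example))
```

For a real file you could pass an open file handle instead of the example list. Treat the result as a first pass rather than a definitive choice, since the longest model is not necessarily the best one.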

8.6 years ago

Many genes will have multiple splice variants with identical CDSs, so selecting the longest CDS might not be sufficient. You could use the most strongly expressed RNA (sort by gene, then by expression, and filter for the best using awk or Excel), but that will frequently differ between tissues. In summary, there's a good reason why multiple transcripts exist; there is not one "best" transcript. That being said, most transcript variants will be substantially similar, so if you arbitrarily choose among mRNAs with similar evidence (functional genomics/transcriptomic data), you will be able to identify orthologs from the common regions of each transcript.
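The sort-and-filter step can also be sketched in Python rather than awk or Excel. This is only an illustration: it assumes you already have one expression value per transcript (for example TPM from a single tissue), and the function name and the tuple layout are hypothetical.

```python
def top_expressed_per_gene(rows):
    """rows: iterable of (gene_id, transcript_id, expression) tuples.

    Returns {gene_id: transcript_id} for the most strongly expressed
    transcript of each gene.
    """
    # Sort by gene, then by descending expression; the transcript ID is a
    # final key so that ties are resolved the same way on every run.
    ordered = sorted(rows, key=lambda r: (r[0], -r[2], r[1]))
    best = {}
    for gene, transcript, _expression in ordered:
        # The first row seen for a gene is its top-expressed transcript.
        best.setdefault(gene, transcript)
    return best


rows = [
    ("g1", "g1-RA", 5.0),
    ("g1", "g1-RB", 12.5),
    ("g2", "g2-RA", 0.0),
]
print(top_expressed_per_gene(rows))
```

Because expression differs between tissues, running this on a different sample may pick a different transcript for the same gene.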
