Are exons within a .gtf file experimentally verified or computationally predicted?
9.4 years ago
NHEJ ▴ 360

I am curious how a large data file such as a Gene Transfer Format (GTF) file is actually made. For example, are the exons in a .gtf file experimentally verified, bioinformatically predicted, or some mix of the two?

exon gtf • 3.1k views
9.4 years ago

It depends on the organism. Some heavily studied genomes, like human, have many experimentally verified RNAs and proteins. Many, perhaps most, organisms are studied on much smaller budgets and have only computationally predicted annotation; certainly, if there is only money for a single DNA-sequencing run and nothing else, everything will be predicted.

That said, computation and prediction play major roles even for genes that are "experimentally verified". Assembling RNA-seq data, or working out where a protein fragment of a given mass-spec-determined mass originated, is still to some extent a computational guess, as is the genome itself. The more information you have, the more accurate the guess. So genes that you can find proteins for, and RNAs for, and also predict from the genome alone are great. But suppose a gene sits in a long open reading frame with strong statistical indications that it is real, and is possibly homologous to a gene verified in related species, yet you can't find a protein or RNA for it: maybe it just isn't expressed in the conditions or tissue types you studied. In other words, if you don't use genes predicted purely from DNA, you are definitely going to miss some.

I think that for well-studied organisms the gene annotations are typically a mix of entries with various degrees of validation. Even for human, though, there are multiple different gff/gtf files you can download, and they contain contradictory information: each is curated in a different way, with criteria decided ad hoc by different people. There are no guarantees for an arbitrary gtf file.
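
One partial way to see where the records in a particular file came from is the second tab-separated column of a GTF line, the "source" field, which names the program or database that produced the feature. Below is a minimal sketch, assuming a standard nine-column, tab-separated GTF; the function name `count_exons_by_source` and the source labels `manual_curation` and `ab_initio_predictor` are made up for illustration, and what a real label implies about experimental support depends entirely on whoever produced the file.

```python
from collections import Counter


def count_exons_by_source(gtf_lines):
    """Tally exon records by the GTF 'source' field (column 2)."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A well-formed GTF record has nine tab-separated columns.
        if len(fields) < 9:
            continue
        source, feature = fields[1], fields[2]
        if feature == "exon":
            counts[source] += 1
    return counts


# Hypothetical records; the source labels are invented for this example.
example_lines = [
    "# made-up header comment",
    'chr1\tmanual_curation\tgene\t100\t900\t.\t+\t.\tgene_id "G1";',
    'chr1\tmanual_curation\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tmanual_curation\texon\t400\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr2\tab_initio_predictor\texon\t50\t300\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
]

exon_counts = count_exons_by_source(example_lines)
for source, n in sorted(exon_counts.items()):
    print(source, n)
```

For a real file you could pass an open file handle instead of the list. The tally only tells you which pipeline emitted each exon, not whether anyone confirmed it at the bench.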


Thanks for the very interesting reply. Do you by chance know of any published sources highlighting this information that you can point me to?


Sorry, no; this is just my personal observation.


Just to clarify, when you say annotated, do you mean experimentally confirmed, or can computationally predicted things also qualify as annotated?


"Annotated" just means "having descriptions". So a fasta file alone is not annotated. If there is an accompanying gff/gtf file, or some other indication of which genomic regions have what meanings, then it is annotated. The annotation can be simple and include only gene boundaries, or it can also include information about predicted or known gene function, alternate isoforms, and so forth. In general, anything beyond the fasta file that describes the meaning of the nucleotide sequences is annotation, so as long as something like that is available, the genome is annotated.
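
To make "having descriptions" concrete, here is a rough sketch of reading one GTF record into a dictionary, assuming the usual nine tab-separated columns with `key "value";` pairs in the last one. The function name `parse_gtf_record` and the example record are invented for illustration. The simple split on `;` would break on values that themselves contain semicolons, and a key that appears more than once keeps only its last value.

```python
def parse_gtf_record(line):
    """Parse a single GTF line into a dict of its nine columns."""
    fields = line.rstrip("\n").split("\t")
    keys = ["seqname", "source", "feature", "start", "end",
            "score", "strand", "frame"]
    record = dict(zip(keys, fields[:8]))
    # Coordinates are 1-based and inclusive in GTF.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])

    # Column 9 holds free-form descriptions as: key "value"; key "value";
    attributes = {}
    for item in fields[8].split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    record["attributes"] = attributes
    return record


# Hypothetical record, made up for this example.
example_line = (
    'chr1\tsome_pipeline\texon\t100\t200\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; exon_number "1";'
)
record = parse_gtf_record(example_line)
print(record["feature"], record["start"], record["end"], record["attributes"])
```

Everything in that dictionary is annotation in the sense above: it says that a stretch of sequence is an exon of some transcript, regardless of whether the claim came from an experiment or a predictor.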
