Are all GFF files provided by NCBI clearly annotated with gene and protein sequence start and end information?
5.9 years ago
Yingzi Zhang ▴ 90

Hi there. I am a novice and need to do comparative genomics. I searched for and downloaded several genomes from different lineages within one species. Because NCBI chooses only one of them to serve as the RefSeq genome, I can only get a proper annotation file for that one. For the others, I can only find the assembled genomes with very poor quality annotation files (they contain only "genbank region" features, with no "exon" or "CDS" features, for example). However, the papers these assemblies come from describe and discuss the annotation results. Am I searching for the annotation files in the wrong way, or do I have to ask the authors for them? I have already emailed the authors. If there is no answer, does that mean I have to repeat the annotation work or ortholog analysis that the authors have already done? Thank you very much for any answer or suggestions.
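A quick way to see what a downloaded GFF file actually contains is to tally the feature types in its third column. The following is only a minimal sketch, assuming a tab-delimited GFF3 file with nine columns; the function names `count_feature_types` and `has_gene_models` are made up for illustration, and the set of "required" feature types is an assumption, not an NCBI rule.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Tally the feature type (column 3) over the lines of a GFF3 file."""
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        # An embedded FASTA section, if present, ends the feature table.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, directives and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GFF3 feature line has nine tab-separated columns.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


def has_gene_models(counts, required=("gene", "CDS", "exon")):
    """Return True if every feature type in `required` occurs at least once.

    The default `required` tuple is an assumption about what a usable
    gene annotation should contain.
    """
    return all(counts[feature] > 0 for feature in required)
```

Passing an open file handle to `count_feature_types` and printing the result shows at a glance whether a file holds only `region` entries or real gene models.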

Assembly genome sequence gene • 1.5k views
5.9 years ago
Carambakaracho ★ 3.2k

Welcome to the world of NCBI genomes. There is no definite quality standard as far as I know; however, there is the "RefSeq category" (see the NCBI Assembly help page, about halfway through). You will probably have the most luck with reference or representative genomes, but there's no guarantee on the annotation of representative genomes, as they're most often chosen computationally. There are at least three ways you can get the representative information.
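One possible route, offered as an assumption rather than necessarily one of the ways meant above, is the tab-delimited assembly summary file that NCBI publishes on its genomes FTP site. The sketch below assumes that file's layout: comment lines starting with `#`, one of which carries the column names, and columns named `assembly_accession`, `refseq_category` and `organism_name`, with category values such as "reference genome" and "representative genome". The function names are hypothetical, and the column names and values should be checked against the README that accompanies the file.

```python
def read_assembly_summary(lines):
    """Yield one dict per assembly from the lines of an assembly summary file."""
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # The column names sit on a comment line; other comment
            # lines (such as a pointer to the README) are ignored.
            candidate = line.lstrip("#").strip().split("\t")
            if "assembly_accession" in candidate:
                header = candidate
            continue
        if header is None:
            raise ValueError("no header line found before the first record")
        yield dict(zip(header, line.split("\t")))


def select_by_category(records, organism,
                       categories=("reference genome", "representative genome")):
    """Keep records for `organism` whose refseq_category is in `categories`.

    The default category strings are an assumption about the values
    used in the file.
    """
    return [
        record for record in records
        if record.get("organism_name") == organism
        and record.get("refseq_category") in categories
    ]
```

For example, `select_by_category(read_assembly_summary(handle), "Sus scrofa")` would list the pig assemblies flagged as reference or representative, if the organism name is spelled that way in the file.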


Thank you for telling me about the "RefSeq category" and the ways to get that information! I need to look into pigs. There is a RefSeq genome, which I have confirmed. However, as you say, the annotation of the other pig genomes seems to have been generated computationally (after I downloaded the GFF files from GenBank, the contents looked like nonsense, with very poor and rough information). Some papers say that they finished annotating the genomes and discuss the details at length, but the annotation files do not seem to be accessible. So I need to re-annotate the genomes, right? I am not sure whether all annotation files are required to be uploaded. :)


For anything but bacteria I recommend looking at Ensembl genomes as well, for example the pig genome there (Sus scrofa). When you know there are groups with better annotation, Ensembl is more likely to integrate such annotation for many genomes, with the exception of JGI genomes in my experience. It's always worth double-checking both resources.


I got it. Thank you!
