Biostar Beta. Not for public use.
[NCBI Entrez] Retrieving Complete Genome Information From NCBI Genome
6.1 years ago

Hello All,

I'm trying to build the correct Entrez query to get information on complete eukaryotic genomes from the NCBI Genome database. The genome browser (http://www.ncbi.nlm.nih.gov/genome/browse/) displays 185 entries when searching for complete eukaryotic genomes.

I've tried these:

  • eukaryota[organism] AND complete[status]; count = 319
  • eukaryota[organism] AND complete[status] AND "genome sequencing"[Project Type]; count = 300
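For anyone wanting to reproduce these counts outside the web interface, here is a minimal Python sketch that builds E-utilities ESearch URLs for the two queries and parses the hit count from a reply. The helper names `esearch_count_url` and `parse_count` are made up for illustration, and the sketch only builds URLs rather than contacting NCBI; the counts you get today may differ from the ones quoted above.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Public E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_count_url(term, db="genome"):
    """Build an ESearch URL that asks only for the number of hits."""
    params = {"db": db, "term": term, "rettype": "count"}
    return ESEARCH + "?" + urlencode(params)


def parse_count(xml_text):
    """Extract the integer held in <Count> from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


# The two queries from the post above.
queries = [
    'eukaryota[organism] AND complete[status]',
    'eukaryota[organism] AND complete[status] AND "genome sequencing"[Project Type]',
]
urls = [esearch_count_url(q) for q in queries]

for url in urls:
    print(url)
```

Fetching each URL (for example with `urllib.request.urlopen(url).read()`) and passing the body to `parse_count` should give the count for that query.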

Any ideas on the best query for what I want, or on which query corresponds to what the browser displays?

Thanks a lot!


Hello!

What kind of information do you want exactly?

Just the number of complete genomes?


No, I was trying to reproduce the genome browser output for complete eukaryotic genomes using Entrez. That's why I started comparing the numbers of complete genomes, to see whether my queries were correct. What I actually want is information such as assembly ID, taxon ID, number of loci, % GC, etc. for all complete eukaryotic genomes, using BioPerl and Entrez. The problem is: if what I get through Entrez queries differs from the genome browser's information, which one do I choose? And is there a query that would give the same output?

22 months ago
Neilfws 48k
Sydney, Australia

I'm not convinced that the data on that page can be retrieved via Entrez.

If you follow the link to the FTP site and download the file _eukaryotes.txt_, you'll see a field named _Status_. This is where the value of 185 comes from. I opened this file in R:

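# Read the tab-separated file; comment.char = "" keeps the header line that starts with "#"
euk <- read.table("eukaryotes.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE, comment.char = "", quote = "")
# Tabulate the Status column
table(euk$Status)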

#         Chromosomes              No data Scaffolds or contigs 
#                 185                 1609                  722 
#       SRA or Traces 
#                 455
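For readers not using R, the same tabulation can be sketched in Python. This is only an illustration: it assumes the file is tab-separated with a header row containing a `Status` column (as the R call above implies), the function names are made up, and the three-column sample rows are invented stand-ins rather than real entries from _eukaryotes.txt_.

```python
import csv
import io
from collections import Counter


def read_rows(handle):
    """Read a tab-separated table with a header row into a list of dicts."""
    # QUOTE_NONE mirrors quote = "" in the R call: quote characters are plain text.
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def status_counts(rows):
    """Count rows per value of the Status column (like table(euk$Status))."""
    return Counter(row["Status"] for row in rows)


def with_status(rows, status="Chromosomes"):
    """Keep only the rows whose Status equals the given value."""
    return [row for row in rows if row["Status"] == status]


# Invented sample standing in for the downloaded file (assumed column names).
sample = io.StringIO(
    "#Organism/Name\tTaxID\tStatus\n"
    "Organism A\t1\tChromosomes\n"
    "Organism B\t2\tNo data\n"
    "Organism C\t3\tChromosomes\n"
    "Organism D\t4\tScaffolds or contigs\n"
)

rows = read_rows(sample)
counts = status_counts(rows)
complete = with_status(rows)

print(counts)
print([row["#Organism/Name"] for row in complete])
```

With the real file, replace `sample` with `open("eukaryotes.txt", newline="")`; the rows returned by `with_status` then carry whatever other columns the file provides.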

However, if you experiment with the Advanced query builder at the NCBI website, you'll find that:

  • database Genome has field Status, but "chromosomes" is not a valid value
  • databases Bioproject and Assembly do not have field Status
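The same check can be done without the web form: the E-utilities EInfo service returns the list of search fields for a database as XML. Below is a rough Python sketch that builds the EInfo URL and looks for a field by its full name. The helper names are hypothetical, and the sample XML is a hand-written stand-in that follows the EInfo layout (`Field` elements with `Name` and `FullName` children); its field abbreviations are illustrative, not copied from a real reply.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Public E-utilities EInfo endpoint.
EINFO = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"


def einfo_url(db):
    """Build the EInfo URL that describes one Entrez database."""
    return EINFO + "?" + urlencode({"db": db})


def field_names(xml_text):
    """Map each field's full name to its short name in an EInfo reply."""
    root = ET.fromstring(xml_text)
    return {f.findtext("FullName"): f.findtext("Name") for f in root.iter("Field")}


def has_field(xml_text, full_name):
    """Case-insensitive test for a field with the given full name."""
    return full_name.lower() in {name.lower() for name in field_names(xml_text)}


# Hand-written stand-in for an EInfo reply (illustrative values only).
sample_reply = """
<eInfoResult><DbInfo><DbName>genome</DbName><FieldList>
  <Field><Name>ORGN</Name><FullName>Organism</FullName></Field>
  <Field><Name>STAT</Name><FullName>Status</FullName></Field>
</FieldList></DbInfo></eInfoResult>
"""

print(einfo_url("genome"))
print(has_field(sample_reply, "Status"))
print(has_field(sample_reply, "Properties"))
```

Note that EInfo lists the fields only; to see which values a field accepts you still need the index list in the Advanced query builder.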

So it may be that there is no direct relation to the Entrez databases. Or I may be wrong and it's just very difficult to formulate the query :)


That's right, but it feels weird that NCBI doesn't use the content of its databases to generate this file... I started using that file, since it already contains most of the information I need. I'm just not very comfortable working with it without knowing how it's generated and whether it corresponds to the content of the NCBI databases.

Edit: An interesting fact is that "eukaryota"[organism] gives me about 2100 lines, whereas the eukaryotes section in the genome browser has about 2900.

6.1 years ago

Try searching from the http://www.ncbi.nlm.nih.gov/genome/ page with the following query:

"has chromosome"[properties] AND "map"[project type]

This query specifies values for the fields 'Properties' and 'Project type', as in the Advanced query builder at the NCBI website. If you wish to try different options, go to http://www.ncbi.nlm.nih.gov/genome/advanced, select a field type and use the "Show index list" link to get a list of possible field values.


But that gives 32 results; we're looking for 185. And you did not specify eukaryota.


Then try eukaryota[organism] AND complete[status] AND "has chromosome"[properties]?


That gives 128. Is no one trying these before posting? :)
