Retrieve title name by accession: R vs Python
5.6 years ago
Medhat 9.7k

I was working with R, trying to get the organism name (description):

> handel <- entrez_fetch(db = "nucleotide", id = "NC_001479.1", rettype = "xml")
# I also tried entrez_search

With entrez_fetch I get an empty list; with entrez_search I get an ID.

When I use Python:

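from Bio import Entrez, SeqIO

Entrez.email = "you@example.com"  # placeholder address; NCBI asks for a contact email
handle = Entrez.efetch(db="nucleotide", id="NC_001479.1", rettype="gb", retmode="text")
x = SeqIO.read(handle, 'genbank')
print(x.description)  # the SeqRecord attribute is lowercase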

#result:  'Encephalomyocarditis virus, complete genome'

What am I doing wrong with R?

Thanks


OK, I think I also need to use rettype = "gb" with R and then process the results.

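In the end I wrote this function:

library(rentrez)
library(xml2)

retrieve_title <- function(accession) {
  # Fetch the GenBank record as XML, then parse it
  handel <- entrez_fetch(db = "nucleotide", id = accession, rettype = "gb", retmode = "xml")
  xml_handel <- read_xml(handel)
  # GBSeq_organism is the organism name; //GBSeq_definition holds the full title
  xml_text(xml_find_all(xml_handel, "//GBSeq_organism"))
}

Example:
retrieve_title("NC_023021.1")
[1] "Formica exsecta virus 1"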

R Python Entrez • 1.6k views

Is this question resolved?


Not yet, I still need to find out how to extract the name from the result, or maybe there is a better solution than the one I wrote.


Using NCBI's Entrez Direct (EDirect) command-line utilities:

$ efetch -db nuccore -id "NC_001479.1" -format docsum | xtract -pattern DocumentSummary -element Organism
Encephalomyocarditis virus

$ efetch -db nuccore -id "NC_001479.1" -format docsum | xtract -pattern DocumentSummary -element Title
Encephalomyocarditis virus, complete genome

What you are asking for is stored under the Title element, not Organism. The two commands above show the difference.
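
For anyone who wants the same Title/Organism distinction in Python, here is a minimal sketch using only the standard library. The XML is a hand-trimmed stand-in for a docsum record (an assumption: real responses carry many more fields), and `docsum_field` is a hypothetical helper name, not part of any NCBI library.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed stand-in for `efetch -format docsum` output (assumption:
# a real record contains many more elements than the two shown here).
SAMPLE_DOCSUM = """\
<DocumentSummarySet>
  <DocumentSummary>
    <Title>Encephalomyocarditis virus, complete genome</Title>
    <Organism>Encephalomyocarditis virus</Organism>
  </DocumentSummary>
</DocumentSummarySet>
"""


def docsum_field(xml_text, element):
    """Return the text of `element` for every DocumentSummary in xml_text."""
    root = ET.fromstring(xml_text)
    return [
        node.findtext(element, default="")
        for node in root.iter("DocumentSummary")
    ]


print(docsum_field(SAMPLE_DOCSUM, "Organism"))
print(docsum_field(SAMPLE_DOCSUM, "Title"))
```

Passing the element name as an argument mirrors the `-element` switch of xtract, so the same helper covers both fields.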


Thanks, I will try to apply that in Python or R.


Unfortunately, that did not fix the main issue. Using

> handel <- entrez_fetch(db = "nucleotide", id = "NC_001479.1", rettype = "xml")

gives an empty list, while using

> handel <- entrez_fetch(db = "nucleotide", id = "NC_001479.1", rettype = "gb")

gives one bulk string that is not easy to handle (regardless of whether I want the title or the organism).


If you are going to use xml then:

$ efetch -db nuccore -id "NC_001479.1" -format xml | xtract -pattern OrgName -element OrgName_name_virus
Encephalomyocarditis virus

$ efetch -db nuccore -id "NC_001479.1" -format xml | xtract -pattern Bioseq_descr -element Seqdesc_title
Encephalomyocarditis virus, complete genome
hypothetical protein EMCVgp1 [Encephalomyocarditis virus]
protein 1A [Encephalomyocarditis virus]
protein 1B [Encephalomyocarditis virus]
protein 1C [Encephalomyocarditis virus]
protein 1D [Encephalomyocarditis virus]
protein 2A [Encephalomyocarditis virus]
protein 2B [Encephalomyocarditis virus]
protein 2C [Encephalomyocarditis virus]
protein 3AB [Encephalomyocarditis virus]
protein 3C [Encephalomyocarditis virus]
protein 3D [Encephalomyocarditis virus]
truncated polyprotein [Encephalomyocarditis virus]
2B* protein [Encephalomyocarditis virus]

With the gb (GenBank flat file) format:

$ efetch -db nuccore -id "NC_001479.1" -format gb | grep DEFINITION
DEFINITION  Encephalomyocarditis virus, complete genome
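
The grep approach can miss part of the title when a long DEFINITION wraps onto indented continuation lines. A rough Python sketch that joins those lines is below; `genbank_definition` is a hypothetical helper, the sample text is cut down to two fields, and the wrapped variant is artificial, made up only to exercise the continuation case.

```python
def genbank_definition(flat_text):
    """Return the DEFINITION field of a GenBank flat file as one string."""
    parts = []
    in_definition = False
    for line in flat_text.splitlines():
        if line.startswith("DEFINITION"):
            in_definition = True
            parts.append(line[len("DEFINITION"):].strip())
        elif in_definition and line.startswith(" "):
            # Continuation lines are indented under the keyword column.
            parts.append(line.strip())
        elif in_definition:
            break  # the next keyword (e.g. ACCESSION) ends the field
    return " ".join(parts)


# Cut-down record text; a real flat file has many more fields.
SAMPLE_GB = (
    "DEFINITION  Encephalomyocarditis virus, complete genome\n"
    "ACCESSION   NC_001479\n"
)
# Artificially wrapped copy of the same definition (illustration only).
WRAPPED_GB = (
    "DEFINITION  Encephalomyocarditis virus,\n"
    "            complete genome\n"
    "ACCESSION   NC_001479\n"
)

print(genbank_definition(SAMPLE_GB))
print(genbank_definition(WRAPPED_GB))
```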

Because I am working in R, I used the following (feel free to suggest something better):

library(rentrez)
library(xml2)
handel <- entrez_fetch(db = "nucleotide", id = "NC_001479.1", rettype = "gb", retmode = "xml")
x <- read_xml(handel)
xml_text(xml_find_all(x, "//GBSeq_organism"))

The output will be: [1] "Encephalomyocarditis virus"

I think there should be something easier than this.

$ wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_001479.1&retmode=xml" | xmllint --xpath '/GBSet/GBSeq/GBSeq_definition/text()'  -

Encephalomyocarditis virus, complete genome
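
The same GBSet/GBSeq lookup might be written in Python roughly as follows. This is a sketch under assumptions: the XML is a hand-trimmed stand-in for the efetch response (only the two elements mentioned in this thread are kept), and `gbseq_fields` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed stand-in for the GBSet XML returned by efetch (assumption:
# a real response carries many more GBSeq_* elements per record).
SAMPLE_GBSET = """\
<GBSet>
  <GBSeq>
    <GBSeq_definition>Encephalomyocarditis virus, complete genome</GBSeq_definition>
    <GBSeq_organism>Encephalomyocarditis virus</GBSeq_organism>
  </GBSeq>
</GBSet>
"""


def gbseq_fields(xml_text):
    """Return one dict per GBSeq record with its title and organism."""
    root = ET.fromstring(xml_text)
    return [
        {
            "definition": seq.findtext("GBSeq_definition"),
            "organism": seq.findtext("GBSeq_organism"),
        }
        for seq in root.iter("GBSeq")
    ]


for record in gbseq_fields(SAMPLE_GBSET):
    print(record["definition"])
    print(record["organism"])
```

To run it on live data, the body of the efetch URL above (read, for example, with `urllib.request.urlopen`) can be passed in place of the sample string.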