Question

Retrieve title by accession: R vs Python


5.6 years ago

Medhat 9.7k

I was working in R, trying to get the organism name (description):

> library(rentrez)
> handel <- entrez_fetch(db = "nucleotide", id = "NC_001479.1", rettype = "xml")
# I also tried entrez_search

With entrez_fetch I get an empty result; with entrez_search I only get an ID.

When I use Python (Biopython), it works:

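from Bio import Entrez, SeqIO
Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address
handle = Entrez.efetch(db="nucleotide", id="NC_001479.1", rettype="gb", retmode="text")
x = SeqIO.read(handle, "genbank")
handle.close()
print(x.description)  # SeqRecord attribute is lowercase "description"

# result: Encephalomyocarditis virus, complete genome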

What am I doing wrong in R?

Thanks

OK, I think I also need to use rettype = "gb" in R and then process the results.

In the end I wrote this function:

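library(rentrez)
library(xml2)

retrive_title <- function(gi){
  # rettype = "gb" with retmode = "xml" returns GBSeq XML
  handel <- entrez_fetch(db = "nucleotide", id = gi, rettype = "gb", retmode = "xml")
  xml_handel <- read_xml(handel)
  # GBSeq_organism is the organism; GBSeq_definition would be the title
  xml_text(xml_find_all(xml_handel, "//GBSeq_organism"))
}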

Example:
retrive_title("NC_023021.1")
[1] "Formica exsecta virus 1"
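The same lookup can be done offline once the XML is in hand. Below is a minimal Python sketch, using only the standard library's xml.etree.ElementTree, that reads a GBSet document (the XML returned for rettype = "gb", retmode = "xml") and pulls out both GBSeq_definition (the title) and GBSeq_organism. The sample record is a hand-trimmed illustration built from the values quoted in this thread for NC_001479.1, not a real NCBI response, and the function name `title_and_organism` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed illustrative GBSet document (not a full NCBI response);
# the values are the ones quoted elsewhere in this thread for NC_001479.1.
SAMPLE_GBSET = """<GBSet>
  <GBSeq>
    <GBSeq_locus>NC_001479</GBSeq_locus>
    <GBSeq_definition>Encephalomyocarditis virus, complete genome</GBSeq_definition>
    <GBSeq_organism>Encephalomyocarditis virus</GBSeq_organism>
  </GBSeq>
</GBSet>"""


def title_and_organism(gbset_xml):
    """Return (definition, organism) for every GBSeq record in a GBSet string."""
    root = ET.fromstring(gbset_xml)
    return [
        (seq.findtext("GBSeq_definition"), seq.findtext("GBSeq_organism"))
        for seq in root.iter("GBSeq")
    ]


for title, organism in title_and_organism(SAMPLE_GBSET):
    print("title:   ", title)
    print("organism:", organism)
```

Which tag you read is the whole difference between getting the title and getting the organism; in the R function above, that corresponds to changing the XPath from //GBSeq_organism to //GBSeq_definition.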




Is this question resolved?

5.6 years ago by GenoMax 141k


Not yet, I still need to find out how to extract the name from the result, or maybe there is a better solution than the one I wrote.

5.6 years ago by Medhat 9.7k


Using NCBI's Unix command-line utilities (Entrez Direct):

$ efetch -db nuccore -id "NC_001479.1" -format docsum | xtract -pattern DocumentSummary -element Organism
Encephalomyocarditis virus

$ efetch -db nuccore -id "NC_001479.1" -format docsum | xtract -pattern DocumentSummary -element Title
Encephalomyocarditis virus, complete genome

What you are asking for is under the Title heading, not Organism; the difference is demonstrated above.
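The docsum output that xtract walks holds one DocumentSummary per record, with separate Title and Organism children (the two -element calls above rely on exactly that). A minimal Python sketch of the same extraction with the standard library, assuming that layout; the sample is trimmed by hand to just the two fields shown above (real summaries carry many more), and `docsum_fields` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed illustration of docsum XML; real summaries have many more fields.
SAMPLE_DOCSUM = """<DocumentSummarySet>
  <DocumentSummary>
    <Title>Encephalomyocarditis virus, complete genome</Title>
    <Organism>Encephalomyocarditis virus</Organism>
  </DocumentSummary>
</DocumentSummarySet>"""


def docsum_fields(docsum_xml, *names):
    """Roughly like `xtract -pattern DocumentSummary -element NAME...`:
    one tuple of field values per DocumentSummary, in document order."""
    root = ET.fromstring(docsum_xml)
    return [tuple(summary.findtext(name) for name in names)
            for summary in root.iter("DocumentSummary")]


print(docsum_fields(SAMPLE_DOCSUM, "Organism"))
print(docsum_fields(SAMPLE_DOCSUM, "Title"))
```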

5.6 years ago by GenoMax 141k


Thanks, I will try to apply that in Python or R.

5.6 years ago by Medhat 9.7k


Unfortunately, that did not fix the main issue. Using

> handel <- entrez_fetch(db = "nucleotide", id = "NC_001479.1", rettype = "xml")

gives an empty result, while using

> handel <- entrez_fetch(db = "nucleotide", id = "NC_001479.1", rettype = "gb")

gives one large block of text that is not easy to parse (setting aside the question of title vs. organism).

5.6 years ago by Medhat 9.7k


If you are going to use XML, then:

$ efetch -db nuccore -id "NC_001479.1" -format xml | xtract -pattern OrgName -element OrgName_name_virus
Encephalomyocarditis virus

$ efetch -db nuccore -id "NC_001479.1" -format xml | xtract -pattern Bioseq_descr -element Seqdesc_title
Encephalomyocarditis virus, complete genome
hypothetical protein EMCVgp1 [Encephalomyocarditis virus]
protein 1A [Encephalomyocarditis virus]
protein 1B [Encephalomyocarditis virus]
protein 1C [Encephalomyocarditis virus]
protein 1D [Encephalomyocarditis virus]
protein 2A [Encephalomyocarditis virus]
protein 2B [Encephalomyocarditis virus]
protein 2C [Encephalomyocarditis virus]
protein 3AB [Encephalomyocarditis virus]
protein 3C [Encephalomyocarditis virus]
protein 3D [Encephalomyocarditis virus]
truncated polyprotein [Encephalomyocarditis virus]
2B* protein [Encephalomyocarditis virus]
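The second command returns many lines because the full XML record for this accession also contains the protein products, each with its own Seqdesc_title. A minimal Python sketch of the same idea, assuming a heavily simplified layout (the real record nests these elements much more deeply): iterating over every Seqdesc_title reproduces the list. In the output above the genome title happens to come first; relying on that order is an assumption, and `all_titles` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Heavily simplified stand-in for the full XML record: one nucleotide
# Bioseq followed by protein Bioseqs (only two of the proteins are shown).
SAMPLE_BIOSEQ_SET = """<Bioseq-set>
  <Bioseq><Bioseq_descr><Seq-descr><Seqdesc>
    <Seqdesc_title>Encephalomyocarditis virus, complete genome</Seqdesc_title>
  </Seqdesc></Seq-descr></Bioseq_descr></Bioseq>
  <Bioseq><Bioseq_descr><Seq-descr><Seqdesc>
    <Seqdesc_title>protein 1A [Encephalomyocarditis virus]</Seqdesc_title>
  </Seqdesc></Seq-descr></Bioseq_descr></Bioseq>
  <Bioseq><Bioseq_descr><Seq-descr><Seqdesc>
    <Seqdesc_title>protein 1B [Encephalomyocarditis virus]</Seqdesc_title>
  </Seqdesc></Seq-descr></Bioseq_descr></Bioseq>
</Bioseq-set>"""


def all_titles(bioseq_set_xml):
    """Every Seqdesc_title in document order, like the xtract command above."""
    root = ET.fromstring(bioseq_set_xml)
    return [el.text.strip() for el in root.iter("Seqdesc_title")]


titles = all_titles(SAMPLE_BIOSEQ_SET)
print(titles[0])    # first title in document order: the genome, in this sample
print(len(titles))  # one title per Bioseq in the sample
```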

With the GenBank flat-file format (gb):

$ efetch -db nuccore -id "NC_001479.1" -format gb | grep DEFINITION
DEFINITION  Encephalomyocarditis virus, complete genome
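grep DEFINITION prints only the first line of the field; in flat GenBank files a long definition wraps onto continuation lines whose first 12 columns (the keyword field) are blank. A minimal Python sketch, standard library only, that joins those continuation lines; the input is a short hand-made excerpt of the header, not a complete record, and `genbank_definition` is a hypothetical helper.

```python
# Short hand-made excerpt of a flat GenBank header (not a complete record).
SAMPLE_GB = """LOCUS       NC_001479
DEFINITION  Encephalomyocarditis virus, complete genome
ACCESSION   NC_001479
VERSION     NC_001479.1
"""


def genbank_definition(gb_text):
    """Return the DEFINITION value, joining any indented continuation lines."""
    parts = []
    in_definition = False
    for line in gb_text.splitlines():
        if line.startswith("DEFINITION"):
            in_definition = True
            parts.append(line[len("DEFINITION"):].strip())
        elif in_definition and line.startswith(" " * 12):
            # Continuation lines leave the 12-column keyword field blank.
            parts.append(line.strip())
        elif in_definition:
            break  # the next keyword (e.g. ACCESSION) ends the field
    return " ".join(parts)


print(genbank_definition(SAMPLE_GB))
```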

5.6 years ago by GenoMax 141k


Because I am working in R, I used the following (feel free to suggest something better):

library(rentrez)
library(xml2)
handel <- entrez_fetch(db = "nucleotide", id = "NC_001479.1", rettype = "gb", retmode = "xml")  # GBSeq XML, not flat text
x <- read_xml(handel)
xml_text(xml_find_all(x, "//GBSeq_organism"))

The output will be: [1] "Encephalomyocarditis virus"

I think there should be something simpler than this.

5.6 years ago by Medhat 9.7k


$ wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_001479.1&retmode=xml" | xmllint --xpath '/GBSet/GBSeq/GBSeq_definition/text()'  -

Encephalomyocarditis virus, complete genome
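A Python version of the same request without Biopython can be sketched with only urllib and xml.etree.ElementTree. Two things differ from the command above and are assumptions on my part: the URL adds rettype=gb, which is the pairing the E-utilities documentation lists (with retmode=xml) for GBSeq XML, and the function names are hypothetical. The path GBSeq/GBSeq_definition, evaluated from the GBSet root, is the ElementTree counterpart of the xmllint XPath for a single record.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore"):
    """Build the efetch URL; rettype=gb with retmode=xml requests GBSeq XML."""
    query = urllib.parse.urlencode(
        {"db": db, "id": accession, "rettype": "gb", "retmode": "xml"})
    return EFETCH + "?" + query


def definition_from_gbset(xml_text):
    """First GBSeq_definition under the GBSet root (None if absent)."""
    root = ET.fromstring(xml_text)  # accepts str or bytes; root is GBSet
    return root.findtext("GBSeq/GBSeq_definition")


def fetch_definition(accession):
    """Network call: download the record and return its title (definition)."""
    with urllib.request.urlopen(efetch_url(accession)) as response:
        return definition_from_gbset(response.read())
```

Calling fetch_definition("NC_001479.1") should return the same definition string that xmllint printed, assuming the service responds with GBSeq XML as described.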

5.6 years ago by Pierre Lindenbaum 161k