Different output fields from efetch
5.4 years ago
Medhat 9.7k

If I try to get the organism name with efetch on the command line:

efetch -db nuccore -id "NC_001422.1" -format docsum | xtract -pattern DocumentSummary -element Organism

the result is: Escherichia virus phiX174

But if I use:
handle = Entrez.efetch(db="nuccore", id="NC_001422.1", rettype="docsum")

there is no Organism element in the output. Should I use different parameters in Python,
or use subprocess to run efetch from the command line?

like:

 import subprocess
 filter_cmd = ['xtract', '-pattern', 'DocumentSummary', '-element', 'Organism']
 info_name_cmd = ['efetch', '-db', 'nuccore', '-id', 'NC_001422.1', '-format', 'docsum']
 # capture efetch's output as bytes, then pass it to xtract through input=
 ps = subprocess.run(info_name_cmd, stdout=subprocess.PIPE, check=True)
 output = subprocess.check_output(filter_cmd, input=ps.stdout)

Thanks.

python sequence efetch • 1.9k views

So with Python you can't get the organism name? It is definitely there in the command-line results.

$ efetch -db nuccore -id NC_001422 -format docsum | grep Organism
        <Organism>Escherichia virus phiX174</Organism>
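
The DocumentSummary and Organism elements seen on the command line look like the version 2.0 document-summary layout, which differs from the older DocSum/Item layout. Below is a minimal sketch, assuming that is where the difference comes from and that Biopython's Entrez.esummary passes a version="2.0" keyword through to the server. The helper names and the e-mail address are placeholders and not part of this thread.

```python
import xml.etree.ElementTree as ET


def organisms_from_docsum_v2(xml_text):
    """Return the text of every <Organism> found under a <DocumentSummary>."""
    root = ET.fromstring(xml_text)
    return [
        doc.findtext("Organism")
        for doc in root.iter("DocumentSummary")
        if doc.findtext("Organism") is not None
    ]


def fetch_organisms(accession, email="you@example.org"):
    """Hypothetical helper: request a version 2.0 summary (needs Biopython and network)."""
    from Bio import Entrez  # imported lazily so the parser works without Biopython

    Entrez.email = email  # placeholder address; NCBI asks for a real one
    handle = Entrez.esummary(db="nuccore", id=accession, version="2.0")
    try:
        xml_text = handle.read()
    finally:
        handle.close()
    return organisms_from_docsum_v2(xml_text)
```

The parsing step is kept separate from the network call, so it can be checked against saved XML before hitting the server.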

As I stated, I know that the Organism name is there when using efetch from the command line. But try the Python code I posted:
handle = Entrez.efetch(db="nuccore", id="NC_001422.1", rettype="docsum")
It gives only the same result as running:
wget -O - -q "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&rettype=docsum&retmode=xml&id=NC_001422.1"
as Pierre Lindenbaum suggested, which contains only the title and not the Organism (you can try it).

5.4 years ago

Your query only returns the TaxId, not the organism name:

$ wget -O - -q "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&rettype=docsum&retmode=xml&id=NC_001422.1" 

https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20041029/esummary-v1.dtd">
<eSummaryResult>

<DocSum>
    <Id>9626372</Id>
    <Item Name="Caption" Type="String">NC_001422</Item>
    <Item Name="Title" Type="String">Coliphage phi-X174, complete genome</Item>
    <Item Name="Extra" Type="String">gi|9626372|ref|NC_001422.1|[9626372]</Item>
    <Item Name="Gi" Type="Integer">9626372</Item>
    <Item Name="CreateDate" Type="String">1993/04/28</Item>
    <Item Name="UpdateDate" Type="String">2018/07/06</Item>
    <Item Name="Flags" Type="Integer">768</Item>
    <Item Name="TaxId" Type="Integer">10847</Item>
    <Item Name="Length" Type="Integer">5386</Item>
    <Item Name="Status" Type="String">live</Item>
    <Item Name="ReplacedBy" Type="String"></Item>
    <Item Name="Comment" Type="String"></Item>
    <Item Name="AccessionVersion" Type="String">NC_001422.1</Item>
</DocSum>
</eSummaryResult>
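
To check from Python which fields this older DocSum layout carries, the Item elements can be collected into a dictionary. This is a minimal sketch using only the standard library. The function name docsum_items is made up for illustration, and the sample is a shortened copy of the output above.

```python
import xml.etree.ElementTree as ET


def docsum_items(xml_text):
    """Map each <Item Name="..."> in the first <DocSum> to its text."""
    root = ET.fromstring(xml_text)
    docsum = root.find("DocSum")
    if docsum is None:
        return {}
    return {item.get("Name"): (item.text or "") for item in docsum.findall("Item")}


# Shortened copy of the response shown above.
sample = """<eSummaryResult>
<DocSum>
    <Id>9626372</Id>
    <Item Name="Title" Type="String">Coliphage phi-X174, complete genome</Item>
    <Item Name="TaxId" Type="Integer">10847</Item>
</DocSum>
</eSummaryResult>"""

fields = docsum_items(sample)
print(fields["TaxId"])       # 10847
print("Organism" in fields)  # False
```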

Using rettype=fasta (with retmode=xml) returns the organism name:

$ wget -O - -q "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&rettype=fasta&retmode=xml&id=NC_001422.1"  | grep -v TSeq_sequence

https://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
<TSeqSet>
<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_accver>NC_001422.1</TSeq_accver>
  <TSeq_taxid>10847</TSeq_taxid>
  <TSeq_orgname>Escherichia virus phiX174</TSeq_orgname>
  <TSeq_defline>Coliphage phi-X174, complete genome</TSeq_defline>
  <TSeq_length>5386</TSeq_length>
</TSeq>

</TSeqSet>
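
The same request can be made from Python and the TSeq_orgname element read out of the response. This is a rough sketch rather than a definitive recipe: the helper names and the e-mail address are placeholders, and the network call assumes Biopython is installed. Note that the full response also contains the sequence itself, which the grep above filtered out.

```python
import xml.etree.ElementTree as ET


def organisms_from_tseq(xml_text):
    """Map each record's TSeq_accver to its TSeq_orgname."""
    root = ET.fromstring(xml_text)
    return {
        tseq.findtext("TSeq_accver"): tseq.findtext("TSeq_orgname")
        for tseq in root.iter("TSeq")
    }


def fetch_organisms_tseq(accession, email="you@example.org"):
    """Hypothetical helper: fetch the TSeq XML with Biopython (needs network)."""
    from Bio import Entrez  # imported lazily so the parser works without Biopython

    Entrez.email = email  # placeholder address; NCBI asks for a real one
    handle = Entrez.efetch(db="nuccore", id=accession, rettype="fasta", retmode="xml")
    try:
        xml_text = handle.read()
    finally:
        handle.close()
    return organisms_from_tseq(xml_text)
```

For example, organisms_from_tseq applied to the XML shown above gives a dictionary with NC_001422.1 as key and Escherichia virus phiX174 as value.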