Entrez API returns results different from main search?
7.2 years ago

I'm having a baffling problem. I'm trying to fetch the records for a bunch of BioSample IDs via the Entrez API in Python, using Biopython v1.68. Here's a minimal example: I'm looking for BioSample SAMEA2467098, URL: https://www.ncbi.nlm.nih.gov/biosample/?term=SAMEA2467098

This should be Brassica napus, cv. Beluga

In Python:

from Bio import Entrez
Entrez.email = 'SNIP'
sample = 'SAMEA2467098'
handle = Entrez.efetch('BioSample', id=sample, retmode='text')
print(handle.readlines())
['1: Non-tumor DNA sample from blood of a human participant in the dbGaP study "An APOBEC Cytidine Deaminase Mutagenesis in Human Cancers"\n', 'Identifiers: BioSample: SAMN02467098\n', 'Organism: Homo sapiens\n', 'Attributes:\n', '    /submitter handle="NHGRI_APOBEC_CytidineDeaminase"\n', '    /biospecimen repository="NHGRI_APOBEC_CytidineDeaminase"\n', '    /study name="An APOBEC Cytidine Deaminase Mutagenesis in Human Cancers"\n', '    /study design="Cross-Sectional"\n', '    /biospecimen repository sample id="TCGA-BP-5199-11A-01D-1429-08"\n', '    /submitted sample id="TCGA-BP-5199-11A-01D-1429-08"\n', '    /submitted subject id="TCGA-BP-5199-11A-01D-1429-08"\n', '    /study disease="Neoplasms"\n', '    /tissue="blood"\n', '    /analyte type="DNA"\n', '    /is tumor="No"\n', '    /subject is affected="Yes"\n', '    /molecular data type="SNV (.MAF)"\n', 'Accession: SAMN02467098\tID: 2467098\n', '\n', '\n']

Suddenly it's human, and the accession has changed from SAMEA2467098 to SAMN02467098 (the "EA" became "N0"), which is indeed a human sample: https://www.ncbi.nlm.nih.gov/biosample/?term=SAMN02467098

What am I doing wrong here? It works for most of my other BioSample IDs!

Edit: It's not a Biopython problem, using the API manually gives the same result: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=biosample&id=SAMEA2467098&retmode=text
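
Since this only hits some IDs, one way to find the affected ones in a batch is to compare the accession on the trailing "Accession: ... ID: ..." line of the text output with the accession that was requested. The following is a minimal sketch, not part of Biopython: the helper names `returned_accession` and `is_mismatch` are made up for illustration, and it assumes the lines are `str` values laid out like the output shown above.

```python
import re

# Matches the trailing summary line of the efetch text output, e.g.
# 'Accession: SAMN02467098\tID: 2467098\n'
ACCESSION_LINE = re.compile(r"^Accession:\s*(\S+)\s+ID:\s*(\d+)")


def returned_accession(lines):
    """Return (accession, uid) parsed from efetch text lines, or None if absent."""
    for line in lines:
        match = ACCESSION_LINE.match(line)
        if match:
            return match.group(1), match.group(2)
    return None


def is_mismatch(requested, lines):
    """True when the record that came back is not the one that was asked for."""
    found = returned_accession(lines)
    return found is None or found[0] != requested
```

With the example above, `is_mismatch(sample, handle.readlines())` would flag SAMEA2467098, because the returned record reports SAMN02467098.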


May be worth creating a ticket at NCBI support. Looks like some sort of a bug in their code.


I dropped them a message late last night. It looks like they'll file it as a bug; I will update here.


Any update on this issue?

The bug doesn't seem to be fixed yet.


I haven't heard anything back, but the workaround of first looking up the record's numeric ID and then passing that ID to efetch works.


Hm, I think this has to do with the way Entrez handles IDs. If I use esearch first to get the BioSample's numeric ID, it works:

handle = Entrez.esearch('BioSample', term='SAMEA2467098')
record = Entrez.read(handle)
record
{u'Count': '1', u'RetMax': '1', u'IdList': ['3769633'], u'TranslationStack': [{u'Count': '1', u'Field': 'All Fields', u'Term': 'SAMEA2467098[All Fields]', u'Explode': 'N'}, 'GROUP'], u'TranslationSet': [], u'RetStart': '0', u'QueryTranslation': 'SAMEA2467098[All Fields]'}

Using the new ID now in efetch:

handle = Entrez.efetch('BioSample', id=3769633, retmode='text')
handle.readlines()
['1: Beluga_PBY013; Beluga\n', 'Identifiers: BioSample: SAMEA2467098; SRA: ERS436855\n', 'Organism: Brassica napus\n', 'Attributes:\n', '    /sample name="ERS436855"\n', 'Accession: SAMEA2467098\tID: 3769633\n', '\n', '\n']

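We're back at Beluga! I'm still not 100% sure why it behaves this way. Possibly the accession I'm entering clashes with the human entry, or gets shortened to its digits for the lookup: the human record's ID in the output in my question is 2467098, the same digits as in SAMEA2467098.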
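
The two steps above can also be sketched against the E-utilities URLs directly. This version swaps Biopython for the Python standard library so it runs without extra installs; the function names are invented for illustration, and it assumes the esearch XML lists hits under `IdList/Id`, matching the `IdList` key in the parsed record above. Treat it as a rough sketch rather than a reference implementation.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(accession, db="biosample"):
    """URL that looks the accession up as a search term, not as an ID."""
    query = urllib.parse.urlencode({"db": db, "term": accession})
    return f"{EUTILS}/esearch.fcgi?{query}"


def efetch_url(uid, db="biosample"):
    """URL that fetches one record by its numeric ID as plain text."""
    query = urllib.parse.urlencode({"db": db, "id": uid, "retmode": "text"})
    return f"{EUTILS}/efetch.fcgi?{query}"


def parse_uids(esearch_xml):
    """Extract the numeric IDs from an esearch XML response."""
    root = ET.fromstring(esearch_xml)
    return [node.text for node in root.findall("./IdList/Id")]


def fetch_biosample_text(accession):
    """Resolve the accession with esearch, then fetch the record by numeric ID."""
    with urllib.request.urlopen(esearch_url(accession)) as response:
        uids = parse_uids(response.read())
    if len(uids) != 1:
        # Refuse to guess when the search is ambiguous or empty.
        raise LookupError(f"expected one hit for {accession}, got {uids}")
    with urllib.request.urlopen(efetch_url(uids[0])) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_biosample_text('SAMEA2467098')` needs network access. Raising on anything other than exactly one hit is a deliberate choice here, so an ambiguous search never silently returns the wrong sample.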


The esearch command from the EDirect Unix utilities seems to work:

esearch -db biosample -query "SAMEA2467098" | esummary

Interesting, I can confirm this! I reckon it's because this one asks for a 'query', while I'm supplying an 'ID' above?


Yes, I think you're right, because if I supply the accession as an ID, efetch returns the human record:

efetch -db biosample -id "SAMEA2467098"

1: Non-tumor DNA sample from blood of a human participant in the dbGaP study "An APOBEC Cytidine Deaminase Mutagenesis in Human Cancers"
Identifiers: BioSample: SAMN02467098
Organism: Homo sapiens


I'd guess that NCBI Entrez Fetch is being 'helpful' by accepting the sample name when it really wants the numerical identifier, and this is going wrong. I'd email the E-utilities help team about this if I were you.
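
If that guess is right, a defensive option is to refuse to hand efetch anything that is not purely numeric, and to resolve accessions through esearch first. Here is a minimal sketch of such a guard; `require_uid` is a hypothetical helper name, not part of Biopython or E-utilities.

```python
def require_uid(value):
    """Return value as a string if it is a purely numeric ID, else raise ValueError."""
    text = str(value).strip()
    # isascii() rules out non-ASCII digit characters that isdigit() alone accepts.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(
            f"{value!r} is not a numeric ID; resolve it with esearch before efetch"
        )
    return text
```

For example, `require_uid(3769633)` returns `'3769633'`, while `require_uid('SAMEA2467098')` raises instead of risking a lookup that lands on a different record.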

DCGenomics ▴ 330

Two things:

NCBI Education Group has been offering webinars on this lately.

Check out some of the recent ones at:

https://www.ncbi.nlm.nih.gov/home/coursesandwebinars.shtml

Also, if you put in an issue here, we'll try to help you with a solution:

https://github.com/NCBI-Hackathons/EDirect_EUtils_API_Cookbook
