Difficult To Download Gene Sequences From NCBI
13 months ago
Norwich, UK

Hello everyone: I'm having a problem trying to download gene sequences from the Gene database on the NCBI website using Biopython. I started with a basic test search, limited to two hits, in the "gene" database for S. coelicolor (txid100226).

from Bio import Entrez
Entrez.email = "chief@marsstation.com"
handle = Entrez.esearch(db="gene",term="txid100226[Organism]",retmax=2)
record = Entrez.read(handle)

The ID of the first hit from this search is:

record_list = record["IdList"]
print(record_list[0])
1096915

I then used this first ID to try to download the gene of interest:

seq = Entrez.efetch(db="gene",id=record_list[0],rettype="fasta").read()

However the result stored in "seq" is the following:

<?xml version="1.0"?>
<!DOCTYPE Entrezgene-Set PUBLIC "-//NLM//DTD NCBI-Entrezgene, 21st January 2005//EN" "http://www.ncbi.nlm.nih.gov/data_specs/dtd/NCBI_Entrezgene.dtd">
<Entrezgene-Set><div><div class="rprt"><p class="title"><a href="/gene/1096915" ref="ordinalpos=1&amp;ncbi_uid=1096915&amp;link_uid=1096915">SCO1489 –DNA-binding protein [Streptomyces coelicolor A3(2)] </a></p><div class="supp"><p class="desc">DNA-binding protein</p><dl class="details"/><dl class="details">    <dt class="desig">Other Aliases: </dt><dd class="desig">SCO1489, SC9C5.13, bldD</dd></dl><dl class="details"/><dl class="details"><dt class="desig">Genomic context: </dt><dd class="desig">Chromosome</dd></dl><dl class="details"><dt class="desig"> Annotation: </dt><dd class="desig">NC_003888.3 (1592381..1592884)</dd></dl></div><div class="aux"><div class="resc"><dl class="rprtid"><dt>ID:</dt> <dd>1096915</dd> </dl></div><p class="links nohighlight"><span> </span></p></div></div></div></Entrezgene-Set>

If I put db="protein" instead of gene I get the correct protein sequence.

I realize that one way to get the DNA sequence would be to download it manually from the contig NC_003888.3 of S. coelicolor, at position 1592381..1592884 for this particular ID. That information is stored in "seq".
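The accession and coordinates in that annotation line can be pulled out mechanically rather than by eye. Here is a minimal sketch, assuming the annotation always looks like "ACCESSION (start..stop)" with an optional ", complement" suffix for minus-strand genes. The helper name parse_annotation and the regular expression are illustrative, not part of Biopython.

```python
import re

# Assumed layout of the annotation text, e.g.
#   "NC_003888.3 (1592381..1592884)"
#   "NC_003888.3 (1592381..1592884, complement)"
ANNOTATION_RE = re.compile(
    r"\b(?P<acc>[A-Z]{1,2}_?\d+(?:\.\d+)?)\s*"
    r"\((?P<start>\d+)\.\.(?P<stop>\d+)(?P<comp>,\s*complement)?\)"
)


def parse_annotation(text):
    """Return (accession, start, stop, strand) found in an annotation string.

    strand uses the EFetch convention: 1 = plus strand, 2 = minus strand.
    """
    match = ANNOTATION_RE.search(text)
    if match is None:
        raise ValueError("no 'ACCESSION (start..stop)' pattern in %r" % text)
    strand = 2 if match.group("comp") else 1
    return (
        match.group("acc"),
        int(match.group("start")),
        int(match.group("stop")),
        strand,
    )


# The annotation string from the record shown above.
print(parse_annotation("NC_003888.3 (1592381..1592884)"))
```

Because the function uses a search rather than a full match, it can also be pointed at the whole string held in "seq", provided the annotation appears there in the same layout.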

So here is the question: is there any method (or trick) to download that DNA sequence using Biopython? How can I solve this problem?

JFC

biopython entrez • 3.1k views
22 months ago
Neilfws 48k
Sydney, Australia

The short answer is that rettype="fasta" is not a valid return type for the Gene database. Please refer to Table 1 in the EFetch section of the NCBI EUtils documentation.

The longer answer - how to solve this problem - I'll edit this answer later, no time to write it just now.


Even if I change the rettype, it doesn't work. The gene sequence in this example lies within a contig sequence, so the GI for this sequence points to the whole contig. I don't know how to solve it, but thank you for your answer.


Well no, changing rettype won't work. The only valid rettype for db=Gene is gene_table; valid retmodes are asn.1, xml and text. In short: sequences cannot be retrieved from the Gene database.
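Although the Gene database will not hand back sequence, its records do carry the genomic location, which can then be used against a sequence database. Here is a rough sketch using ESummary. It assumes the Gene summary exposes a GenomicInfo entry with ChrAccVer, ChrStart and ChrStop fields, that those coordinates are 0-based, and that ChrStart greater than ChrStop marks the minus strand. Please check these assumptions against a real record. The function names are made up for illustration.

```python
def genomic_info_to_range(info):
    """Turn one GenomicInfo entry into 1-based EFetch range arguments.

    Assumptions (verify against a real record): ChrStart/ChrStop are
    0-based, and ChrStart > ChrStop means the gene is on the minus strand.
    """
    start = int(info["ChrStart"])
    stop = int(info["ChrStop"])
    strand = 1 if start <= stop else 2  # EFetch: 1 = plus, 2 = minus
    low, high = sorted((start, stop))
    return {
        "id": str(info["ChrAccVer"]),
        "seq_start": low + 1,  # shift from 0-based to 1-based
        "seq_stop": high + 1,
        "strand": strand,
    }


def gene_range(gene_id):
    """Look up a Gene ID with ESummary and return its EFetch range arguments.

    Needs network access; set Entrez.email before calling.
    """
    # Imported here so the pure helper above works without Biopython installed.
    from Bio import Entrez

    handle = Entrez.esummary(db="gene", id=gene_id)
    record = Entrez.read(handle)
    handle.close()
    summary = record["DocumentSummarySet"]["DocumentSummary"][0]
    return genomic_info_to_range(summary["GenomicInfo"][0])


# Hypothetical GenomicInfo entry, shaped like the assumed ESummary output.
example = {"ChrAccVer": "NC_003888.3", "ChrStart": "1592380", "ChrStop": "1592883"}
print(genomic_info_to_range(example))
```

The returned dictionary is meant to be passed straight into an EFetch call against a nucleotide database, e.g. Entrez.efetch(db="nuccore", rettype="fasta", retmode="text", **args).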

15 months ago
Philadelphia

Well I am not used to using Entrez gene but I think you are retrieving the Entrez gene page information instead of the sequence information. You should try either "genbank" or "nucleotide" instead of "gene" and see if it helps.


Thanks for your answer, but it didn't work :( If I use "gene bank" it displays an error and if I try with nucleotide database, what I get is the whole contig. Hmm, about using Entrez gene I'm sure that I'm not retrieving the information page, because I get a protein sequence.

12 months ago
Leandro Lima • 920
San Francisco, CA

Hello! I think this could help you.

problem when downloading large number of sequences from Genbank


Not really, since FASTA cannot be retrieved from the Gene database.


In this case, db="nuccore"
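To avoid pulling the whole contig from nuccore, EFetch accepts seq_start, seq_stop and strand parameters, and Biopython passes extra keyword arguments through to the request. Here is a minimal sketch using the accession and coordinates from the question. The helper names region_params and fetch_region are illustrative, and the e-mail address is a placeholder.

```python
def region_params(accession, start, stop, strand=1):
    """Build EFetch arguments for a 1-based, inclusive sub-region of a record."""
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    if strand not in (1, 2):
        raise ValueError("strand must be 1 (plus) or 2 (minus)")
    return {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
        "strand": strand,
    }


def fetch_region(accession, start, stop, strand=1, email="you@example.org"):
    """Download only the requested region as a SeqRecord (needs network access)."""
    # Imported here so region_params can be used without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # placeholder; use your own address
    handle = Entrez.efetch(**region_params(accession, start, stop, strand))
    try:
        return SeqIO.read(handle, "fasta")
    finally:
        handle.close()


# Coordinates taken from the Gene record in the question.
print(region_params("NC_003888.3", 1592381, 1592884))
```

Calling fetch_region("NC_003888.3", 1592381, 1592884) should return a record of 504 bases if the range is honoured, since 1592884 - 1592381 + 1 = 504. For a gene on the complementary strand, pass strand=2 to get the reverse complement.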
