Searching wgs (whole genome shotgun) database with biopython
1
2
Entering edit mode
7.3 years ago

I used to be able to use BioPython to blast the wgs database at NCBI using the following handle:

NCBIWWW.qblast(program="blastn", database= "wgs" sequence=fasta_string, entrez_query= organism + '[ORGN]')

This does not appear to work anymore, and returns an xml with no results. For searches in wgs the taxon must be specified, and I have previously used the entrez_query parameter to do this. In the web interface this is not available and there is a "limit by" field which has to be filled in instead. Does anybody know how to specify the "limit by" parameter in BioPython or to specify other parameters that may be required for the wgs blast? It is a bit frustrating when scripts stop working because of changes in the ncbi interface.

Thanks in advance

Daniel

blast genome • 2.6k views
ADD COMMENT
1
Entering edit mode
6.9 years ago
jbalberge ▴ 170

The list of WGS databases can be retrieved with taxids2wgs.pl: ftp://ftp.ncbi.nlm.nih.gov/blast/WGS_TOOLS

$ perl taxid2wgs.pl -url_api_ready 487 > wgs_dbs_tax_487

It contains the WGS accesses

WGS_VDB://MTKJ01
WGS_VDB://MTKM01
WGS_VDB://LXLB01
WGS_VDB://LXLA01
WGS_VDB://MJAV01
WGS_VDB://NBTU01

And in biopython

fasta_string = open("myfasta.fa").read() 

db = open("wgs_dbs_tax_487").read()

result_handle = NCBIWWW.qblast(program="tblastn",
                                   database=db,
                                   (...))

blast_record = NCBIXML.read(result_handle)

This is not convenient but gives you the hits:

APBS01000081
AEEF01000091
NBTU01000001
AEPJ01000146
LXLB01000009
LXLA01000006
MJAV01000001
CMWW01000023
ADD COMMENT
0
Entering edit mode

(credit to https://github.com/khyox/draftGenomes/issues/1)

IIUC, taxid2wgs doesn't report all wgs records. for example:

perl taxid2wgs.pl 90371 | grep AAJDPJ

would return nothing, even though https://www.ncbi.nlm.nih.gov/assembly/GCA_007903735.1 exists.

ADD REPLY

Login before adding your answer.

Traffic: 1578 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6