Downloading genomes from Bioproject ID
8.0 years ago

Hi guys,

I have a list of ~200 Bioproject IDs from NCBI. I want to download the associated genomes and then BLAST a particular sequence against this set of genomes.

I have used Entrez.elink in Biopython to obtain the NCBI Assembly IDs, but I am not sure how to download the genomes using this information.

Thanks for your help!

sequencing blast Assembly elink biopython • 4.0k views
8.0 years ago
natasha.sernova ★ 4.0k

This question has already been asked here:

Fetching Genbank Entries For List Of Accession Numbers.

You will find several Biopython scripts in that thread.
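For the Assembly IDs that ELink returns, one possible route is to ask ESummary for each assembly's FTP directory and fetch the genomic FASTA from there. The following is only a sketch under some assumptions: the helper names (genomic_fasta_url, assembly_ftp_path, download_genome) are made up for illustration, the e-mail address is a placeholder, and it assumes the assembly summary exposes FtpPath_RefSeq / FtpPath_GenBank fields and that the FASTA file follows the usual <directory name>_genomic.fna.gz naming.

```python
from urllib.request import urlretrieve


def genomic_fasta_url(ftp_path: str) -> str:
    """Build the URL of the *_genomic.fna.gz file from an assembly FTP directory."""
    ftp_path = ftp_path.rstrip("/")
    basename = ftp_path.rsplit("/", 1)[-1]
    # Assumption: the same directory tree is also served over HTTPS.
    if ftp_path.startswith("ftp://"):
        ftp_path = "https://" + ftp_path[len("ftp://"):]
    return f"{ftp_path}/{basename}_genomic.fna.gz"


def assembly_ftp_path(assembly_uid: str) -> str:
    """Look up the FTP directory for one Assembly UID (needs network access)."""
    from Bio import Entrez  # imported here so the URL helper works without Biopython

    Entrez.email = "you@example.org"  # placeholder; use your own address
    handle = Entrez.esummary(db="assembly", id=assembly_uid)
    record = Entrez.read(handle, validate=False)
    handle.close()
    summary = record["DocumentSummarySet"]["DocumentSummary"][0]
    # Prefer the RefSeq copy and fall back to GenBank if there is none.
    return summary["FtpPath_RefSeq"] or summary["FtpPath_GenBank"]


def download_genome(assembly_uid: str, out_dir: str = ".") -> str:
    """Download the gzipped genomic FASTA for one Assembly UID."""
    url = genomic_fasta_url(assembly_ftp_path(assembly_uid))
    target = f"{out_dir}/{url.rsplit('/', 1)[-1]}"
    urlretrieve(url, target)
    return target
```

Calling download_genome(uid) in a loop over your ~200 Assembly UIDs should leave one .fna.gz file per genome. It is probably wise to pause briefly between requests so you stay within NCBI's rate limits.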

Once you have your genomes, build a BLAST database from them and search it:

1) Build the database with makeblastdb:

makeblastdb -in input_file -dbtype nucl -out dbname

Here input_file is the nucleotide file in FASTA (*.fa) format, -dbtype nucl specifies a nucleotide database, and dbname is the name of the database to create.

2) Run the BLAST program:

blastn -query input -db dbname -out outname

Here input is the file with your gene sequence, dbname is the database created in step 1, and outname is the file the results are written to.

Use blastn to search a nucleotide query against this nucleotide database.
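If you prefer to drive both steps from Python, something along these lines might work. It is a sketch, not a tested pipeline: the function names and default file names are invented for illustration, it assumes makeblastdb and blastn are on your PATH, and the -outfmt 6 (tabular output) option is my addition rather than part of the commands above.

```python
import gzip
import shutil
import subprocess


def combine_fastas(gz_paths, combined_path):
    """Decompress and concatenate gzipped FASTA files into one plain FASTA file."""
    with open(combined_path, "wb") as out:
        for path in gz_paths:
            with gzip.open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return combined_path


def makeblastdb_cmd(fasta, db_name):
    """Argument list for building a nucleotide BLAST database."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query, db_name, out_file):
    """Argument list for a blastn search; -outfmt 6 requests tabular output."""
    return ["blastn", "-query", query, "-db", db_name,
            "-out", out_file, "-outfmt", "6"]


def run_search(gz_paths, query, db_name="genomes_db", out_file="hits.tsv"):
    """Combine the genomes, build the database, then run the search."""
    fasta = combine_fastas(gz_paths, db_name + ".fa")
    subprocess.run(makeblastdb_cmd(fasta, db_name), check=True)
    subprocess.run(blastn_cmd(query, db_name, out_file), check=True)
    return out_file
```

Putting all genomes into a single database means one blastn run covers the whole set, and the subject IDs in the output tell you which genome each hit came from.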
