How Can I Download GenBank Files With Just The Accession Number?
11.0 years ago
biohack92 ▴ 170

I've got an array full of accession numbers, and I'm wondering if there's a way to automatically save genbank files using BioPerl. I know you can grab sequence information, but I want the entire GenBank record.

#!/usr/bin/env perl
use strict;
use warnings;
use Bio::DB::GenBank;

my @accession;
open (REFINED, "./refine.txt") || die "Could not open: $!";

while(<REFINED>){
    if(/^(\D+)\|(.*?)\|/){
        push(@accession, $2);
    }
}
close REFINED;

foreach my $number(@accession){
    my $db_obj = Bio::DB::GenBank->new;
}
genbank bioperl • 7.8k views
11.0 years ago

I use this to fetch GenBank records for a text file of accession numbers:

#!/usr/bin/perl -w
# @author: joey
# usage: perl get_multi_seq_fromNCBI_by_acc.pl acc_file.txt
# Fetches sequences from NCBI by accession number.
# $ARGV[0] = acc.txt (one accession number per line)

use strict;
use Bio::DB::GenBank;
use Bio::SeqIO;
use Bio::Seq::RichSeq;

open(FILE, $ARGV[0]) || die("can not open file: $!");
my @acc = <FILE>;
chomp(@acc);    # strip the trailing newline from each accession

my $db = Bio::DB::GenBank->new();
my $allseq = $db->get_Stream_by_acc([@acc]);
while (my $seq = $allseq->next_seq) {
    # my $filename = $seq->accession;
    # Append each record to output.fasta in FASTA format.
    my $output = Bio::SeqIO->new(-file => ">>output.fasta", -format => "fasta");
    # To write full GenBank records instead (one file per accession), use the next line:
    # my $output = Bio::SeqIO->new(-file => ">$filename.gb", -format => "genbank");
    if ($seq) {
        $output->write_seq($seq);
    }
    else {
        print STDERR "cannot find sequence for accession number: @acc\n";
    }
    $output->close();
}
close(FILE);


Thanks Joey. I actually would like the entire GenBank file and not just the sequence. Is there any way to automate that?


I think that script does return the entire GenBank file.
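
For reference, here is a minimal BioPerl sketch that writes one GenBank-format file per accession. The helper names (gb_filename, write_genbank_files) and the .gb naming scheme are made up for illustration and are not from the script above. It assumes nucleotide accessions; for protein accessions, Bio::DB::GenPept would probably be the class to use instead. Note that Bio::SeqIO writes BioPerl's own rendering of the parsed record, so the output may not be byte-for-byte identical to the file served by NCBI.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: derive a safe output file name from an accession.
sub gb_filename {
    my ($acc) = @_;
    (my $safe = $acc) =~ s/[^\w.\-]/_/g;    # replace characters unsuitable for file names
    return "$safe.gb";
}

# Fetch each accession via BioPerl and write it out in GenBank format.
sub write_genbank_files {
    my (@accession) = @_;
    return unless @accession;

    # Loaded lazily so gb_filename can be used without BioPerl installed.
    require Bio::DB::GenBank;
    require Bio::SeqIO;

    my $db = Bio::DB::GenBank->new;
    for my $acc (@accession) {
        # eval guards against an exception on a failed lookup.
        my $seq = eval { $db->get_Seq_by_acc($acc) };
        unless ($seq) {
            warn "cannot find sequence for accession number: $acc\n";
            next;
        }
        my $out = Bio::SeqIO->new(
            -file   => '>' . gb_filename($acc),
            -format => 'genbank',
        );
        $out->write_seq($seq);
        $out->close;
    }
}

# Run only when executed as a script: perl write_gb.pl ACC1 ACC2 ...
write_genbank_files(@ARGV) unless caller;
```

In the original question's script, this could be called as write_genbank_files(@accession) once the array has been filled.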

11.0 years ago
qiyunzhu ▴ 430

Here's a very simple non-BioPerl solution. It connects to NCBI over HTTP and downloads the GenBank files.

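use LWP::Simple;
# Look up the GI numbers for the accessions (https URLs need LWP::Protocol::https installed).
$s = get "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=".join (",", @accession);
push (@gi, $1) while ($s =~ s/<Id>(\d+)<\/Id>//);
# Fetch the GenBank-format records for those GIs.
$s = get "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=gb&id=".join (",", @gi);
print $s;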

You don't need to run esearch when you already have an accession number (ACN). Just use efetch with the ACN instead of the GI.


Awesome! I didn't know that before. It worked! Thank you so much!

So the better version should be (I cannot believe how simple it is):

use LWP::Simple;
$s = get "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=gb&id=".join (",", @accession);
print $s;

But this trick does not work for esummary, I just realized.

Anyway, it's a good piece of information and I should apply it to my programs.
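
If you want one file per accession rather than everything printed to STDOUT, here is a minimal sketch built on the same efetch call. The helper names (efetch_url, save_genbank), the one-request-per-accession approach, the retmode=text parameter and the one-second pause are my own assumptions, not part of the snippet above. It keeps db=protein and rettype=gb from the thread, and it assumes LWP::Protocol::https is installed so that LWP::Simple can fetch https URLs.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build an efetch URL for one or more accession numbers.
# db and rettype default to the values used in the thread.
sub efetch_url {
    my ($accs, $db, $rettype) = @_;
    $db      //= 'protein';
    $rettype //= 'gb';
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';
    return "$base?db=$db&rettype=$rettype&retmode=text&id=" . join(',', @$accs);
}

# Fetch each accession separately and save it as <accession>.gb.
sub save_genbank {
    my (@accession) = @_;
    return unless @accession;

    require LWP::Simple;    # loaded lazily so efetch_url works without it

    for my $acc (@accession) {
        my $record = LWP::Simple::get(efetch_url([$acc]));
        unless (defined $record) {
            warn "Could not fetch $acc\n";
            next;
        }
        open(my $out, '>', "$acc.gb") or die "Could not write $acc.gb: $!";
        print {$out} $record;
        close($out);
        sleep 1;    # pause between requests to stay under NCBI's rate limit
    }
}

# Run only when executed as a script: perl fetch_gb.pl ACC1 ACC2 ...
save_genbank(@ARGV) unless caller;
```

Fetching one accession per request is slower than a single comma-joined call, but it makes it obvious which accession failed and avoids having to split a multi-record response.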
