How To Get Sequence Around Blast Result?
1
5
Entering edit mode
11.2 years ago
dustar1986 ▴ 380

Hi,

I blasted a serires of inquiry sequences against the a pre-built blast-database file using blastall and the xml result was parsed by biopython.

I got the genomic coordinates information of the inquiry sequences.

Now I also want to know the 500bp sequences upstream and downstream of each inquiry sequences.

I know biopython can achieve this by extracting sequences from fasta file of each chromosome or from online database.

But I don't want to do this because I don't have enough spaces to keep fasta file of each chromosome locally. And online fetching is too slow when the inquiry file is large.

Is that possible to fetch such sequences using genomic coordinates from the pre-built blast-database file (So I can just keep this database file on the disk for each species)?

biopython • 8.0k views
ADD COMMENT
21
Entering edit mode
11.2 years ago

Of course, it's very easy using BLAST+, so install it right away!

First, make sure you use the -parse_seqids parameter while creating blast database:

makeblastdb -in tair10.fa -dbtype prot -parse_seqids

Then, use blastdbcmd to fetch sequences with a specific range.

blastdbcmd -db tair10.fa -dbtype prot -entry AT1G50920.1 -range 1-10

The output is.

>AT1G50920.1:1-10 | Symbols:  | Nucleolar GTP-binding protein | chr1:18870555-18872570 FORWARD LENGTH=671
MVQYNFKRIT
ADD COMMENT
1
Entering edit mode

Thanks Zielezinski. This is really helpful.

ADD REPLY
0
Entering edit mode

I'm glad I could help!

ADD REPLY
0
Entering edit mode

What if we have a query with 100 sequences? Manual parsing may not be feasible.

ADD REPLY
1
Entering edit mode

If you have a query with multiple accession numbers, you provide them in a text file (e.g., query.txt):

AT1G50920.1
AT1G50930.1
AT1G50940.1
AT1G50950.1

and you run the command:

blastdbcmd -db tair10.fa -dbtype prot -entry_batch query.txt

You will get FASTA sequences for the accession numbers from the query.txt file. However, when you use -batch_entry argument, you can't use the -range argument.

ADD REPLY
0
Entering edit mode

Thank you so much. What if we want to parse 50 nt flanks with the aligned sequence?

ADD REPLY

Login before adding your answer.

Traffic: 2049 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6