Filter blastn results by kingdom
0
0
Entering edit mode
8.3 years ago
user230613 ▴ 360

I've a perl script which executes a blastn analysis. I want to parse the output of blast results attending to the kingdom of the best hit. For example, if I'm analysing arabidopsis data, I want to be able to extract those sequences with a plantae hit. And the same procedure for the other kingdoms (plantae, fungi, animalia, protista, bacteria, archaea). This is the blastn command line I'm using:

blastn -query sequences.fa -db nt -out blast_out -evalue 0.01 -outfmt '6 qseqid qlen sseqid slen length pident evalue sscinames stitle staxids sskingdoms' -max_target_seqs 1 -num_threads 25

What I've been doing until now is get hit species names, go to NCBI, extract complete taxID of those species, and in the case of "animalia", grep for "33208" number.

Any suggestion to do this automatically in perl or any script/api available for doing this task?


EDIT:


Maybe I'm not explaining correctly. Let's say I'm working with fungi X specie. I blast the sequences of that fungi against nt, and for example I get this hit for a given sequence:

seq_4459    408    gi|517322946|emb|HF679031.1|    2984819    411    85.40    8e-115    Fusarium fujikuroi IMI 58289    Fusarium fujikuroi IMI 58289 draft genome, chromosome FFUJ_chr09    1279085    Eukaryota

The hit is a fusarium hit, and as I'm working with fungi, I want to keep this line. How can I extract all possible fungi hits, but not other kindoms?

blastn perl bioperl • 3.6k views
ADD COMMENT
0
Entering edit mode

sort both files on the keys and join http://www.computerhope.com/unix/ujoin.htm

ADD REPLY
0
Entering edit mode

Thank you Pierre. I guess that I've not explained the issue correctly. Check the edit if you want, thanks.

ADD REPLY
0
Entering edit mode

Filter your script output to send each kingdom data to its own file. You can do this from within your script, see this.

ADD REPLY
0
Entering edit mode

Thank you Jean-Karim. I guess that I've not explained the issue correctly. Check the edit if you want, thanks.

ADD REPLY
1
Entering edit mode

If I understand correctly, the problem is that you don't get the relevant taxonomic division in the output so for each taxon, you need to find out where in the taxonomic tree it is. Maybe this older post could help you.

ADD REPLY
0
Entering edit mode

Yes, that post has the solution to the problem. Thank you a lot Jean. Another question, do you know the taxonomy ID for protista kingdom? I'm not able to find it in NCBI. Thanks again.

ADD REPLY
0
Entering edit mode

I believe that the kingdom protista is obsolete so you won't find it in a modern taxonomy tree.

ADD REPLY
0
Entering edit mode

OK. I found the solution in another post, sorry, I could not find it before: Does A Script Exist That Given A Species Name Will Give You Kingdom Which The Input Belongs To?

ADD REPLY
0
Entering edit mode

You will have to grab all the TaxID's from fungi and then filter on them. You should be able to get those from Taxdump file from NCBI (ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/). Take a look at the Taxdump Readme file.

ADD REPLY

Login before adding your answer.

Traffic: 2275 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6