How to add the genus and species name automatically to the headers of a multi-FASTA file after running BLASTn?
5.4 years ago

I need to add the genus and species name of the best BLASTn match, together with the percent identity, automatically to the headers of a multi-FASTA file. How can I do this? For example:

Before:

>xxxx|yyyy|zzzz
ATCG...
>xxxx|yxyx|zxzx
ATCG...

After:

>xxxx|yyyy|zzzz|*Genus_species*_99%
ATCG...
>xxxx|yxyx|zxzx|*Genus_species_2*_100%
ATCG...

Thanks!

BLASTN RENAME HEADER MULTI-FASTA SCRIPT • 2.1k views

Did you know that the singular form of the word species is species?


OK, thanks. I need to add the genus and species.


What does your BLAST output look like?


'6 qseqid sseqid stitle pident length evalue sstart send qlen slen'


What have you tried? If you give real workable examples, there is likely someone here that will do this for you.


What is the BLAST command you used?

Species names are not that easy to get from BLAST. Choose the fields you want in your BLAST output from this list, for example qseqid, pident and scomnames.

The species name you want could be under sscinames (Subject Scientific Name(s), separated by a ';'), scomnames (Subject Common Name(s), separated by a ';') or sblastnames (Subject Blast Name(s), separated by a ';').

Then, for each qseqid, keep only the line with the best pident.

You can now use a scripting language such as Perl or Python (you can even do it with Unix shell tools if you want):

  • Read your BLAST output file
  • Create a dictionary with qseqid as key and scomnames+pident as value
  • Read your FASTA file
  • For each record of your FASTA file
    • Check if the ID exists as a key in your dictionary; if yes, change the ID name
    • Write the record to a new file
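
The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a tested pipeline. It assumes a tab-separated BLAST output in which the query ID, percent identity and species name sit in columns 1, 4 and 11 (the format string given earlier in the thread with sscinames appended), and that qseqid equals the first whitespace-delimited token of each FASTA header. The function names `best_hits` and `annotate_fasta` are made up for illustration.

```python
def best_hits(blast_path, qcol=0, pcol=3, ncol=10):
    """Map each qseqid to (name, pident) of its highest-pident row.

    Column indices are 0-based and are assumptions about the output
    layout; adjust them to match your own -outfmt string.
    """
    best = {}
    with open(blast_path) as handle:
        for line in handle:
            row = line.rstrip("\n").split("\t")
            if len(row) <= max(qcol, pcol, ncol):
                continue  # skip blank or truncated lines
            qseqid, pident, name = row[qcol], float(row[pcol]), row[ncol]
            # Keep the first row seen with the highest percent identity.
            if qseqid not in best or pident > best[qseqid][1]:
                best[qseqid] = (name, pident)
    return best


def annotate_fasta(fasta_in, fasta_out, best):
    """Copy a FASTA file, appending |Genus_species_NN% to matched IDs."""
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # Split the header into the ID and an optional description.
                parts = line[1:].rstrip("\n").split(None, 1)
                if parts and parts[0] in best:
                    name, pident = best[parts[0]]
                    # Names may be ';'-separated; use the first one.
                    label = name.split(";")[0].strip().replace(" ", "_")
                    parts[0] = f"{parts[0]}|{label}_{pident:g}%"
                    line = ">" + " ".join(parts) + "\n"
            dst.write(line)
```

Called as `annotate_fasta("input.fasta", "renamed.fasta", best_hits("hits.tsv"))` (file names hypothetical), this writes headers without a BLAST hit unchanged.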

Thanks for answering! But I need help writing this script.


I could, but you have to help me by giving me the BLAST command line you used and the attribute you want as species (sscinames, scomnames, sblastnames).

If you don't know which attribute would be the best "species" for you, re-run your BLAST command with sscinames, scomnames and sblastnames added to the output format:

'6 qseqid sseqid stitle pident length evalue sstart send qlen slen sscinames scomnames sblastnames'

Then copy the first 10 lines of the BLAST output into your post.


It all depends on your reference database, so if you can add that to your question it will be easier to help. If you have taxon IDs in your database, it is "fairly easy" with Python: add staxid to the output and use the rankedlineage.dmp file. But like I said, we don't know your reference or where you want to get the species names from.
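
If staxid is in the output, the lookup in rankedlineage.dmp could look something like the sketch below. It assumes the layout described in the NCBI new_taxdump readme, with fields separated by `|` and the first columns being tax_id, tax_name, species and genus; check this against your own copy of the file. The function name `load_species_names` is hypothetical.

```python
def load_species_names(dmp_path, wanted):
    """Return {tax_id: species name} for the tax IDs in `wanted`.

    Assumed column order: tax_id | tax_name | species | genus | ...
    """
    wanted = set(wanted)
    names = {}
    with open(dmp_path) as handle:
        for line in handle:
            fields = [f.strip() for f in line.rstrip("\n").split("|")]
            if len(fields) < 3 or fields[0] not in wanted:
                continue
            tax_id, tax_name, species = fields[0], fields[1], fields[2]
            # The species column is empty when the tax ID is itself a
            # species (or a higher rank), so fall back to tax_name.
            names[tax_id] = species or tax_name
            if len(names) == len(wanted):
                break  # every requested ID has been found
    return names
```

The tax IDs passed in should be strings, as read from the staxid column. If a field holds several IDs separated by ';', one option is to look up only the first.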


Could you post a specific example of the input and the expected output? The description is too generic.
