How to extract gene sequences from the blastp output
9.1 years ago

I am working on a pipeline, as part of which I ran blastp. The output looks like this:

                                                                     Score       E
Sequences producing significant alignments:                          (Bits)    Value
 gnl|Amel_4.5|GB47895-PA                                               305     8e-95
 gnl|Amel_4.5|GB44317-PA                                               282     7e-86
 gnl|Amel_4.5|GB52230-PA                                               262     1e-78
 gnl|Amel_4.5|GB52041-PA                                               246     8e-74

I would like to retrieve the complete nucleotide sequences of the best hits, but I am lost as to how to do this. Can anyone give me some pointers?

blastp • 2.7k views

I don't usually deal with this kind of thing, but I think I can help. You have a file containing all the sequences, and you want to retrieve the nt sequences of the best hits by their IDs (e.g. gnl|Amel_4.5|GB47895-PA, which I assume is the sequence ID or header).

Is this what you want to do?


Yes, you're exactly right. I need to get the nt sequences of the best hits, given those sequence IDs.

9.1 years ago
venu 7.1k

The following might work to collect the hit IDs:

$ awk '{print$1}' blastOut.txt > blastIds.txt
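Note that the awk one-liner prints the first field of every line, so header lines such as "Score" and "Sequences producing significant alignments:" also end up in blastIds.txt. As a rough alternative, here is a minimal Python sketch that keeps only rows whose first field starts with a given prefix. The function name `extract_hit_ids` and the default prefix `gnl|` are assumptions based on the sample output in the question, not part of BLAST itself.

```python
def extract_hit_ids(lines, prefix="gnl|"):
    """Return the first field of each line that looks like a hit row.

    `prefix` is an assumption taken from the sample output in the question;
    change it if your database uses different identifiers.
    """
    ids = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and header lines ("Score", "Sequences producing ...").
        if fields and fields[0].startswith(prefix):
            ids.append(fields[0])
    return ids


# Two rows copied from the question, plus the two header lines.
sample = """\
                                                                     Score       E
Sequences producing significant alignments:                          (Bits)    Value
 gnl|Amel_4.5|GB47895-PA                                               305     8e-95
 gnl|Amel_4.5|GB44317-PA                                               282     7e-86
"""

hit_ids = extract_hit_ids(sample.splitlines())
print("\n".join(hit_ids))
```

For a real run you would pass an open file instead of the sample string, for example `extract_hit_ids(open("blastOut.txt"))`.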

Take a look at the post "Perl code to extract sequences from multi-line fasta works on all test files but not on research files".

If you want each hit in its own file, download faOneRecord from the link above and run it in a loop over the IDs.
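If you would rather not install an extra tool, the extraction step can also be sketched in plain Python. This is only an illustration under some assumptions: the nucleotide FASTA headers must carry the same IDs as the BLAST output (protein and transcript records may be named differently, so check the headers first), and the helper names `read_fasta`, `select_records` and `safe_filename` are made up for this example. The toy sequences and the ID GB00000-PA are invented test data, not real records.

```python
def read_fasta(lines):
    """Yield (id, sequence) pairs; the id is the header text up to the first space.

    Multi-line records are joined into a single sequence string.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            header, chunks = (parts[0] if parts else ""), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(fasta_lines, wanted_ids):
    """Return {id: sequence} for the records whose id is in wanted_ids."""
    wanted = set(wanted_ids)
    return {sid: seq for sid, seq in read_fasta(fasta_lines) if sid in wanted}


def safe_filename(seq_id):
    """Build a per-hit file name; '|' is replaced because some systems reject it."""
    return seq_id.replace("|", "_") + ".fa"


# Invented toy data: the first record spans two lines, the second is not a hit.
toy_fasta = """\
>gnl|Amel_4.5|GB47895-PA toy record
ATGGCC
TTAA
>gnl|Amel_4.5|GB00000-PA not a hit
GGGG
""".splitlines()

picked = select_records(toy_fasta, ["gnl|Amel_4.5|GB47895-PA"])
for seq_id, seq in picked.items():
    print(f">{seq_id}\n{seq}")
```

To get one file per hit, loop over `picked.items()` and write each record to `safe_filename(seq_id)`. For large FASTA files, pass the open file handle to `select_records` so the file is read line by line rather than loaded at once.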
