Question

How to extract gene sequences from the blastp output

9.1 years ago

vineelagangalapudi • 0

I am working on a pipeline, as part of which I ran blastp. The output looks like this:

                                                                     Score       E
Sequences producing significant alignments:                          (Bits)    Value
 gnl|Amel_4.5|GB47895-PA                                               305     8e-95
 gnl|Amel_4.5|GB44317-PA                                               282     7e-86
 gnl|Amel_4.5|GB52230-PA                                               262     1e-78
 gnl|Amel_4.5|GB52041-PA                                               246     8e-74

I would like to retrieve the full nucleotide sequences of the best hits, but I am lost as to how to do this. Can anyone offer some pointers?

blastp • 2.7k views

Updated 23 months ago by Ram 43k • written 9.1 years ago by vineelagangalapudi • 0

I don't usually deal with this kind of thing, but it seems I could help. You have a file containing all the sequences, and you want to retrieve the nt sequences of the best hits by their IDs (gnl|Amel_4.5|GB47895-PA; I hope this is the sequence ID or header).

Is this what you want to do?

Reply • 9.1 years ago by venu 7.1k

Yes, you're exactly correct. I need to get the nt sequences of the best hits given those sequence IDs.

Reply • 9.1 years ago by vineelagangalapudi • 0

Ram · Answer 1 · 2015-04-04

9.1 years ago

venu 7.1k

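The following might work. It writes the first column of each hit line (the lines whose first field starts with gnl|) to a file, skipping the header and blank lines of the BLAST report:

$ awk '$1 ~ /^gnl[|]/ {print $1}' blastOut.txt > blastIds.txt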

Then take a look at the post "perl code to extract sequences from multi-line fasta works on all test files but not on research files" for ways to pull the sequences for those IDs out of a FASTA file.
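
As a rough illustration of ID-based extraction from a multi-line FASTA file, here is a minimal awk sketch. It is not the code from the linked post. It assumes the first word after '>' in each FASTA header matches a hit ID exactly (for example gnl|Amel_4.5|GB47895-PA); if your nucleotide file names its records differently from the protein hits, you would need to map the IDs first. The function name extract_fasta_by_ids is made up for this example.

```shell
# Sketch: print every FASTA record whose ID appears in an ID list.
# Assumption: the first whitespace-delimited word after '>' in a header
# is the record ID and matches a line of the ID file exactly.
# extract_fasta_by_ids is a hypothetical helper name.
extract_fasta_by_ids() {
    ids_file=$1
    fasta_file=$2
    awk '
        # While reading the ID file: remember every non-empty ID.
        FILENAME == ARGV[1] { if ($1 != "") wanted[$1] = 1; next }
        # On a header line: decide whether this record is wanted.
        /^>/ { keep = (substr($1, 2) in wanted) }
        # Print the header and all sequence lines of wanted records.
        keep
    ' "$ids_file" "$fasta_file"
}
```

It could be called as extract_fasta_by_ids blastIds.txt sequences.fa > bestHits.fa, where sequences.fa and bestHits.fa are placeholder file names. Because the decision is made per header line, sequences wrapped over several lines are kept whole.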

If you want each hit in its own file, download faOneRecord from the link above and run it in a loop over the IDs.
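
The one-file-per-hit loop could look roughly like the sketch below. Since the exact faOneRecord invocation is not given in this thread, the sketch uses plain awk as a stand-in for it. It assumes one ID per line in the ID file and that the first word after '>' in each FASTA header is the record ID. The function name split_hits_to_files and the choice to replace '|' with '_' in file names are both just for illustration.

```shell
# Sketch: write each listed hit to its own FASTA file.
# awk is used here as a stand-in for faOneRecord.
# split_hits_to_files is a hypothetical helper name.
split_hits_to_files() {
    ids_file=$1
    fasta_file=$2
    out_dir=$3
    mkdir -p "$out_dir"
    while IFS= read -r id; do
        # Skip blank lines in the ID list.
        [ -n "$id" ] || continue
        # '|' is awkward in file names, so replace it with '_'.
        safe_name=$(printf '%s' "$id" | tr '|' '_')
        # Copy only the record whose header ID equals this ID.
        # An ID that is not found yields an empty output file.
        awk -v id="$id" '
            /^>/ { keep = (substr($1, 2) == id) }
            keep
        ' "$fasta_file" > "$out_dir/$safe_name.fa"
    done < "$ids_file"
}
```

For example, split_hits_to_files blastIds.txt sequences.fa hits would create files such as hits/gnl_Amel_4.5_GB47895-PA.fa (sequences.fa and hits are placeholder names). This rereads the FASTA file once per ID, which is fine for a handful of best hits but slow for long ID lists.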

Updated 23 months ago by Ram 43k • written 9.1 years ago by venu 7.1k