How To Add Specific Word To Fasta Header
12.7 years ago
Palu ▴ 290

I have more than 5000 FASTA sequences in a file and want to add a word, for instance "phosphate", to the header of every sequence. Please suggest a Perl solution for that.

fasta • 17k views
12.7 years ago
brentp 24k

In place:

    perl -pi -e 's/^>/>phosphate-/' your.fasta

Or to a new file:

    perl -p -e 's/^>/>phosphate-/' your.fasta > phosphate.fasta

To add it to the end of the header instead, use this regexp:

    's/^(>.*)$/$1-phosphate/'

Thanks brentp, but I want to add it at the end of my header. Is there a trick for that?

@palu I edited my answer, see the last line.

Thank you very much, sir. For a layman like me, the final code will look like this:

    perl -p -e "s/^(>.*)$/$1-phosphate/g" your.fasta > phosphate.fasta

Except you should use single quotes:

    perl -p -e 's/^(>.*)$/$1-phosphate/g' in.fasta > out.fasta

palu, if you like Brent's answer the best, you should select it as such (hover over the votes to do that).

@newlife Thanks, I will do that.

12.7 years ago
Daniel ★ 4.0k

An easy way with sed:

    sed 's/>.*/&_phosphate/' foo.in >bar.out
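
In the sed replacement, `&` stands for the entire matched text, so the header is kept and the suffix is appended to it. A minimal sketch on hypothetical sample data, with a `^` anchor added (an assumption, not part of the original command) so that only lines beginning with `>` can match:

```shell
# Hypothetical two-record sample file.
printf '>seq1\nACGT\n>seq2\nGGCC\n' > foo.in

# "&" re-inserts the matched header; "_phosphate" is appended after it.
sed 's/^>.*/&_phosphate/' foo.in > bar.out

cat bar.out
```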
8.9 years ago

A faster option, from the BBMap package:

    bbrename.sh in=file.fasta out=renamed.fasta prefix=phosphate addprefix=t

Is there a way to get this to work with protein fasta?

8.9 years ago
arnstrm ★ 1.8k

I know there are lots of options, and it can easily be done with many Unix one-liners, but here is another alternative (my favorite):

    bioawk -c fastx '{ print ">PREFIX"$name; $seq }' input.fasta
    bioawk -c fastx '{ print ">"$name"|SUFFIX"; $seq }' input.fasta

Hi, I'm not sure I understand the (bio)awk syntax, but your command did not work for me (it did not print the sequences). I replaced the semicolon with a newline:

    bioawk -c fastx '{ print ">PREFIX" $name "\n" $seq }' input.txt > output.txt

which seems to work. Anyway thanks for pointing me towards the solution.
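
For readers without bioawk installed, a similar result can be sketched in plain awk. This is an alternative technique rather than the bioawk approach above: it works line by line, so it keeps any description after the ID and leaves multi-line sequences wrapped as they were. The file names and sample records are hypothetical.

```shell
# Hypothetical sample file with a description and a two-line sequence.
printf '>seq1 desc\nACGT\nTTGA\n>seq2\nGGCC\n' > input.fasta

# Header lines: print ">PREFIX" followed by everything after the ">".
# All other lines pass through unchanged.
awk '/^>/ { print ">PREFIX" substr($0, 2); next } { print }' input.fasta > output.fasta

cat output.fasta
```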
