How do I add ">" symbol to FASTA headers?
15 months ago

Dear community,

How do I add the ">" symbol to FASTA headers? I searched for similar posts, but none of the solutions worked for me. Can I add the symbol with sed or awk? What would the command be? I want to add ">" to all the headers. Thanks in advance!

Input file (example):

Proteus_mirabilis_ARLG2970_2781 
atggagacaggtacagtaaagtggttcaataatgctaagggctttggttttattaccccagcaaacggtg
gcgaagatatttttgcccactattcaacaattagaatggaaggctaccgcacacttaaagcggggcagaa
agttaattatagcacgataaaagggcctaaaggtgaccatactgaccttatcattcctatcattgaatag
Proteus_mirabilis_ARLG2970_0131 
atgtctgacaaaatgaaaggtcaagttaagtggttcaacgagtctaaaggctttggttttattactccag
cagacggaagcaaagacgtattcgttcacttttctgccattcaaggtaacggtttcaaaactctggctga
aggtcagaacgtagaattcacaattgaaaacggtgcaaaaggtccagcagcagctaacgtaacagctctg
taa 
Proteus_penneri_ATCC35198_1543  
ttacagagcagttacgttagcagctgctggaccttttgcaccgttttcaattgtgaattctacgttctga
ccttcagccagagttttgaaaccgttaccttgaatggcagaaaagtgaacgaatacgtctttgcttccgt
ctgctggagtaataaaaccaaagcctttagactcgttgaaccacttaacttgacctttcattttgtcaga
cat 
Proteus_vulgaris_FDAARGOS366_2819   
ttagagagccaccacgttgcctgctgctgggcctttcataccattttccatggtgaatgaaacttgttgc
ccttcagctaatgttttgaagctatcactttggattgcagagaaatgtacgaatacatctttgctgccat
cagctggagtaataaaaccaaaacctttaccttcatcgaaccattttactgtaccagtcattgtattaga
cat 
Proteus_mirabilis_ARLG2970_2695 
ttacagagcgattacgttcgctgctgcagggcctttagcgccattttcaatagaaaatgaaacttcttgg
ccttctttcagtgacttgaagctttcactttggatcgctgaaaagtgtacgaatacgtctttgctaccgt
ctttaggagtgataaaaccgaagcctttatcatcgttaaaccattttactgtaccagtcattgtattaga
cat

Desired output: the ">" symbol added at the start of each header line (each line beginning with Proteus_).

FASTA sed awk • 818 views

Can you confirm that each header (the line starting with Proteus) is on its own line? It did not look like that before a mod possibly edited the post.

If the headers are on separate lines, then a simple sed 's/^Proteus/>Proteus/' your_file > new_file will work.


Thank you for your concern, kind sir! The headers are indeed each on a new line, as they should be in a FASTA file. It's just that I'm new to Biostars and don't really know how to edit the text I post.

16 months ago
Seattle, WA USA
$ awk '{ if ($0 ~ /_/) { printf ">"; } print $0; }' in.fa > out.fa
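The one-liner above treats any line containing an underscore as a header, which fits the example input. If some headers had no underscore, one alternative is to test for "not a pure nucleotide line" instead. This is only a sketch under assumptions: the file names and the tiny sample input are made up for illustration, and the sequence alphabet is assumed to be A, C, G, T and N in either case.

```shell
# Hypothetical sample input (placeholder file names, shortened sequences).
printf '%s\n' \
  'Proteus_mirabilis_ARLG2970_2781 ' \
  'atggagacaggt' \
  'Proteus_penneri_ATCC35198_1543' \
  'ttacagagcagt' \
  'cat ' > in.fa

# Prefix ">" to every non-empty line that is not purely nucleotide letters
# (trailing spaces or tabs allowed) and does not already start with ">".
awk '!/^[ACGTNacgtn]+[ \t]*$/ && !/^>/ && NF { $0 = ">" $0 } { print }' in.fa > out.fa

cat out.fa
```

Because lines already starting with ">" are skipped, running it twice on the same file should not produce ">>" headers.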

Thank you sir! This worked perfectly.

14 months ago
Ahill ♦ 1.5k
United States
sed 's/^\([^acgt]\)/>\1/' your_input_file > your_output_file
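This sed command prefixes ">" to any line whose first character is not a lowercase a, c, g or t. Two things to watch for: uppercase sequence lines would also get a ">", and a header that happens to start with a, c, g or t would be missed. A slightly more defensive sketch, assuming an A/C/G/T/N alphabet in either case and using a made-up sample file, might look like this:

```shell
# Hypothetical mixed input: one header already marked, one uppercase sequence line.
printf '%s\n' \
  '>Already_marked_1' \
  'ACGTTGCA' \
  'Proteus_mirabilis_ARLG2970_0131' \
  'atgtctgacaaa' \
  'taa' > mixed.fa

# Add ">" only when the first character is neither a nucleotide letter nor ">".
# "&" in the replacement stands for the matched character.
sed 's/^[^ACGTNacgtn>]/>&/' mixed.fa > mixed_out.fa

cat mixed_out.fa
```

Headers beginning with one of the nucleotide letters would still be skipped by this variant, so it is worth checking the output.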
4 months ago
genomax 68k
United States
sed 's/Proteus/\
>Proteus/g' your_file > new_file

Yes, the command has to be typed on two lines as shown to get the newline before >.

Edit: See my note above. I will leave this here in case your sequences don't have the header starting on a fresh line.
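Whichever command you end up using, a quick sanity check on the result may be worthwhile. The sketch below is only an illustration: new_file.fa and its contents are made-up stand-ins for your converted file, and the sequence alphabet is assumed to be A, C, G, T and N.

```shell
# Hypothetical converted file, standing in for new_file above.
printf '%s\n' \
  '>Proteus_mirabilis_ARLG2970_2781' \
  'atggagacaggt' \
  '>Proteus_penneri_ATCC35198_1543' \
  'ttacagagcagt' \
  'cat' > new_file.fa

# Count header lines; this should equal the number of sequences you expect.
grep -c '^>' new_file.fa

# Show lines that are neither headers nor plain nucleotide lines.
# Ideally this prints nothing; "|| true" hides grep's non-zero exit on no output.
grep -vE '^(>.*|[ACGTNacgtn]*[[:space:]]*)$' new_file.fa || true
```

If the second command prints anything, those lines are probably headers that were missed or sequence lines with unexpected characters.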
