Changing names of Fasta headers
14 months ago
tpaisie • 70
University of Florida

I have a directory full of FASTA files, and I want to change the FASTA header in each one to the name of its corresponding file. For example, before:

HC1993.fa

> X58834
CCTGCATCTGCAA

After, HC1993.fa should be:

> HC1993
CCTGCATCTGCAA

I have about 50 FASTA files like that in a directory that I want to do the same thing to. This sed command works for a single file:

sed 's/>.*/>HC1993/' HC1993.fa > new/HC1993.fa

Now I want to loop this command over the directory, and this is the command I have been using:

for i in $(ls *.fa | rev | cut -c 4- | rev | uniq)
do
    sed 's/>.*/>${i}/' ${i}.fa > new/${i}.fa
done

This command gives me the following header in every new FASTA file:

HC1993.fa

>${i}
CCTGCATCTGCAA

I know there are a bunch of ways to do this, but could someone help me fix the bash loop I wrote? I want to understand why my command is wrong and how to fix it. Thanks!

5 months ago
Joe 12k
United Kingdom

As I understand it, you just want to replace the header in each file with the file's name (minus the extension)?

e.g. given:

~/test/seqs$ ls
seq1.fasta  seq2.fasta  seq3.fasta
~/test/seqs$ cat seq*
>tpg|Magnaporthiopsis_incrustans|JF414846
 ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGC
>tpg|Pyricularia_pennisetigena|AB818016
GCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAA
>tpg|Inocybe_sororia|EU525947
AACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGC

So:

for file in *.fasta; do sed -i "s/^>.*/>${file%.*}/" "$file"; done

Yields:

~/test/seqs$ for file in *.fasta; do sed -i "s/^>.*/>${file%.*}/" "$file"; done; cat seq*
>seq1
ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGC
>seq2
GCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAA
>seq3
AACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGC

Yes, your interpretation of what I want is correct. However, when I ran your command I got this error:

sed: 1: "HC1993.fa": extra characters at the end of H command

It also does not create the new FASTA files with the new headers.


Are you using Mac OS?
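
The question matters because the sed shipped with macOS (BSD sed) requires an argument after -i (a backup suffix, which may be empty). Called GNU-style, it treats the substitution as the suffix and then tries to run the file name as the script, which would explain the "H command" error for a file starting with "H". One way to avoid the difference entirely is to skip -i and write through a temporary file. The following is only a sketch: rename_header is a hypothetical helper name, and it assumes file names contain no characters that are special to sed, such as / or &.

```shell
# rename_header FILE
# Replace every header line in FILE with ">" plus the file's base name,
# without using sed -i (whose syntax differs between GNU and BSD sed).
rename_header() {
    src=$1
    base=${src##*/}     # drop any leading directory
    base=${base%.*}     # drop the extension
    tmp=$(mktemp) || return 1
    if sed "s/^>.*/>${base}/" "$src" > "$tmp"; then
        cat "$tmp" > "$src"   # overwrite contents, keep original permissions
    fi
    rm -f "$tmp"
}
```

It could then be called in a loop such as: for file in *.fa; do rename_header "$file"; done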

14 months ago
India

Example fasta:

$ cat HC1993.fa 
>X58834 
CCTGCATCTGCAA

Expected output (assuming the first line of each FASTA file is the header):

$ cat HC1993.fa 
>HC1993
CCTGCATCTGCAA

in bash:

$ for i in *.fa; do sed "1s/.*/>${i%.fa}/" "$i"; done
>HC1993
CCTGCATCTGCAA
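
Note that the loop above only prints the result to the terminal. To save the relabelled files in a separate directory, as in the original question's new/ folder, something like the following sketch might work. The function name relabel_all and its two arguments are made up for illustration, and it assumes the first line of every file is the header.

```shell
# relabel_all SRC_DIR OUT_DIR
# For each .fa file in SRC_DIR, write a copy to OUT_DIR whose first line
# is ">" followed by the file name without its .fa extension.
relabel_all() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*.fa; do
        [ -e "$f" ] || continue      # nothing matched the glob
        base=${f##*/}                # file name without the directory
        sed "1s/.*/>${base%.fa}/" "$f" > "$out_dir/$base"
    done
}
```

Running relabel_all . new from inside the directory of FASTA files would leave the originals untouched and put the renamed copies in new/.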

using GNU-parallel:

$  parallel  'sed "1s/.*/>{.}/" {}' ::: *.fa
>HC1993
CCTGCATCTGCAA

Ohh thank you so much that worked!!!!


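For future reference, the code can be shortened further with the c (change line) command. The replacement text has to include the ">" itself, and this one-line form of c needs GNU sed:

$ parallel  'sed "/^>/ c >{.}" {}' ::: *.fa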
14 months ago
India

Try changing sed 's/>.*/>${i}/' to sed "s/>.*/>${i}/". Inside single quotes the shell does not expand ${i}, so sed receives it as literal text.
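
A small sketch of the difference, using the header from the question as made-up input piped through sed (the variable names single and double are only for illustration):

```shell
i=HC1993

# Single quotes: the shell does not expand ${i}, so sed receives it literally.
single=$(printf '>X58834\nCCTGCATCTGCAA\n' | sed 's/>.*/>${i}/')

# Double quotes: the shell substitutes the value of i before sed runs.
double=$(printf '>X58834\nCCTGCATCTGCAA\n' | sed "s/>.*/>${i}/")

printf '%s\n' "$single"
printf '%s\n' "$double"
```

The first result keeps the literal >${i} header seen in the question, while the second has >HC1993.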
