Biostar Beta. Not for public use.
Split multifasta file using awk command
16 months ago
fec2 • 20

Hi,

I have a FASTA file and need to split it into multiple FASTA files, one gene per file. Following the post "Splitting A Fasta File", I tried the command below:

awk -F "|" '/^>/ {close(F) ; F = $1".fasta"} {print >> F}' yourfile.fa

However, every output file name contains the symbol ">", for example ">my_contig_name.fasta".

How can I avoid having ">" in the output file names? Thanks.

Hi,

Actually, I have tried several commands from those posts, but only the command above works for me. However, it puts ">" in the output file names.

16 months ago
SMK ♦ 1.3k
Ghent, Belgium

Try changing the command to:

awk -F "|" '/^>/ {close(F); ID=$1; gsub("^>", "", ID); F=ID".fasta"} {print >> F}' yourfile.fa

If you are not limited to awk, you can also use: seqkit split --by-id yourfile.fa
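A quick way to check the corrected awk command is to run it on a tiny input. The sketch below is a minimal demo, assuming a POSIX-like shell with awk and mktemp available; the file name toy.fa and its headers are hypothetical example data. It runs the command from the answer unchanged and lists the resulting files.

```shell
#!/bin/sh
# Minimal demo of the corrected awk command on a made-up two-record FASTA.
# toy.fa and its headers are hypothetical example data.
set -e

# Work in a scratch directory so the demo does not touch existing files.
workdir=$(mktemp -d)
cd "$workdir"

# Headers use "|" as a separator, matching the -F "|" in the command.
printf '>contig1|sample A\nACGT\nTTGA\n>contig2|sample B\nGGCC\n' > toy.fa

# Corrected command: copy field 1 into ID, strip the leading ">" from ID,
# and append every line to ID.fasta. $0 itself is not modified, so the
# header line is written to the output file unchanged.
awk -F "|" '/^>/ {close(F); ID=$1; gsub("^>", "", ID); F=ID".fasta"} {print >> F}' toy.fa

ls *.fasta          # contig1.fasta and contig2.fasta, no ">" in the names
cat contig1.fasta   # >contig1|sample A, then ACGT and TTGA
```

Two caveats: ">>" appends, so rerunning the command in the same directory duplicates records in files that already exist; and -F "|" splits only on "|", so a header such as ">contig1 sample A" keeps the space and the description in the file name.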


Thank you very much!

15 months ago
EMBL Heidelberg, Germany

Try

awk -F "|" '/^>/ {close(F) ; F = substr($1,2,length($1)-1)".fasta"} {print >> F}' yourfile.fa
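To see why the substr() version works: for a header such as ">gene42|some description" (a made-up example), $1 is ">gene42", and substr($1, 2, length($1)-1) takes length($1)-1 characters starting at position 2, which drops exactly the leading ">". A one-line check, assuming any POSIX awk:

```shell
#!/bin/sh
# $1 is ">gene42" (7 characters); substr starts at position 2 and takes 6.
echo '>gene42|some description' |
  awk -F '|' '{ print substr($1, 2, length($1) - 1) }'
# prints: gene42
```

Omitting the third argument, substr($1, 2), gives the same result, because substr then runs to the end of the string.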

Thank you very much!
