Split a multi-FASTA file using an awk command
4.9 years ago
fec2 ▴ 50

Hi,

I have a FASTA file and need to split it into multiple FASTA files, one gene per file. Following the post "Splitting A Fasta File", I tried the command below:

awk -F "|" '/^>/ {close(F) ; F = $1".fasta"} {print >> F}' yourfile.fa

However, every output file name contains the ">" symbol, for example ">my_contig_name.fasta".

How can I avoid having ">" in the output file names? Thanks.
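A quick check with a made-up header (">gene1|some description" is purely illustrative) shows where the ">" comes from: with -F "|", awk splits only on "|", so the first field $1 of a header line still begins with ">", and $1".fasta" inherits it.

```shell
#!/bin/sh
# Made-up example header; -F "|" splits the line on "|" only,
# so the leading ">" stays attached to the first field
printf '>gene1|some description\nACGT\n' |
  awk -F "|" '/^>/ {print "first field: " $1; print "file name: " $1 ".fasta"}'
# prints:
# first field: >gene1
# file name: >gene1.fasta
```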


Hi,

Actually, I have tried several commands from those posts, but only the command above works for me. However, it puts ">" in the output file names.

4.9 years ago
AK ★ 2.2k

Try changing the command to:

awk -F "|" '/^>/ {close(F); ID=$1; gsub("^>", "", ID); F=ID".fasta"} {print >> F}' yourfile.fa
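A minimal end-to-end sketch of the command above on a small hypothetical input (the IDs gene1 and gene2 are made up). It runs in a fresh temporary directory because print >> F appends: re-running the command in a directory that already holds the .fasta files would add duplicate records to them. Only the file name is changed; the header line written into each file keeps its ">" and description.

```shell
#!/bin/sh
set -e
# Work in a fresh temporary directory so no existing .fasta files get appended to
cd "$(mktemp -d)"

# Hypothetical two-record input file
printf '>gene1|desc A\nACGT\nTTGA\n>gene2|desc B\nGGCC\n' > yourfile.fa

# For each header: close the previous output file, strip the leading ">"
# from the first "|"-separated field, and use it as the new file name
awk -F "|" '/^>/ {close(F); ID=$1; gsub("^>", "", ID); F=ID".fasta"} {print >> F}' yourfile.fa

ls *.fasta        # lists gene1.fasta and gene2.fasta
cat gene1.fasta   # the full gene1 record, header line unchanged
```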

If you are not limited to awk, you can use: seqkit split --by-id yourfile.fa


Thank you very much!

4.9 years ago

Try

awk -F "|" '/^>/ {close(F) ; F = substr($1,2,length($1)-1)".fasta"} {print >> F}' yourfile.fa
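This version drops the first character of $1, which on a header line is always ">". As a small check (again with a made-up header), awk's two-argument form substr($1, 2) returns the same suffix, so the length argument can be left out.

```shell
#!/bin/sh
# substr(s, 2, length(s)-1) and substr(s, 2) both return s without its first character
printf '>gene1|desc\nACGT\n' |
  awk -F "|" '/^>/ {print substr($1, 2, length($1)-1), substr($1, 2)}'
# prints: gene1 gene1
```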

Thank you very much!
