How to split a FASTA file at each '>' into one file per sequence, with each file named after the sequence ID?
6.3 years ago
SaltedPork ▴ 170

So far I have this:

awk '/^>/{s=++d".fasta"} {print > s}' file.fasta

This splits the file just as I want it, but it produces new files called 1.fasta, 2.fasta, 3.fasta and so on. Is there a method of splitting it that has the new file name as the ID of the sequence inside?

Or, failing that, is there a quick way of renaming FASTA files based on their IDs?

fasta bash split • 7.5k views
6.3 years ago
GenoMax 141k

Use the faSplit utility ( http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/faSplit ) by Jim Kent from UCSC:

faSplit byname your_file.fa outRoot/

Thanks, this is good. I want to use this in combination with a find command. Could you tell me why this isn't working?

for files in `find . -type f -name '*.consensus.fasta' -not -path "*/temp/*"`
do
    faSplit byname $files outRoot
done

What is not working? Did you make a real directory to replace outRoot?


Hi, yes, I did make a more suitable directory! I just didn't include it because the name is sensitive. I meant just looking at the loop: it's so simple, but it just doesn't work.


You need to include the trailing / after the directory name for this to work right. Try this.

for files in `find . -type f -name '*.consensus.fasta' -not -path "*/temp/*"`
do
    faSplit byname $files outRoot/
done
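
If any of the paths contain spaces, the backtick form of the loop above will split them into pieces. A minimal sketch of a more defensive variant follows; `split_all` is a hypothetical helper name, it assumes `faSplit` is on your PATH, and it keeps the trailing slash on the output directory as the reply above advises. Reading line by line copes with spaces in paths, though not with newlines in file names.

```shell
# split_all ROOT OUTDIR
# Hypothetical helper: runs faSplit on every *.consensus.fasta file
# under ROOT, skipping anything inside a temp/ directory.
split_all() {
    root=$1
    outdir=$2
    mkdir -p "$outdir"
    # Read one path per line; IFS= and -r keep spaces and backslashes intact.
    find "$root" -type f -name '*.consensus.fasta' -not -path '*/temp/*' |
    while IFS= read -r file; do
        # Keep the trailing slash on the output directory.
        faSplit byname "$file" "$outdir/"
    done
}

# Example call (directory names are placeholders):
# split_all . outRoot
```

Quoting "$file" and "$outdir/" is what keeps each path as a single argument to faSplit.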
6.3 years ago
h.mon 35k

The Perl script found here does what you want. From its description:

When creating this multi-entry FASTA file, one should take care to make the first word after the > symbol a unique value, as it will be used as the file name for that sequence.

6.3 years ago

Create the filename with sprintf:

echo -e ">hello\nAAA\n>world\nATGCA" |
    awk '/^>/ {fout=sprintf("%s.fasta",substr($0,2));}{print >> fout;}'
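
The one-liner above uses the whole header line as the file name, so a header such as ">hello first record" would produce a name containing spaces. A possible variant, sketched here under the assumption that the ID is the first whitespace-delimited word after ">", names each file after that word only and closes each output file before opening the next, which avoids running out of open file handles on large inputs. The file example.fasta and its contents are made-up sample data.

```shell
# Work in a scratch directory so no existing file gets appended to.
cd "$(mktemp -d)" || exit 1

# Sample input (made up): two records, the first with a description.
printf '>hello first record\nAAA\n>world\nATG\nCA\n' > example.fasta

awk '
    /^>/ {
        # New record: close the previous output file, if any,
        # then name the next one after the first word of the header.
        if (fout) close(fout)
        fout = substr($1, 2) ".fasta"
    }
    # Append every line to the current file; lines that appear
    # before the first header are skipped.
    fout { print >> fout }
' example.fasta
```

This writes hello.fasta and world.fasta next to example.fasta. Because the output is opened with ">>", running it twice in the same directory appends to the existing files, and IDs containing "/" would need sanitising first.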