How to keep FASTA records based on a pattern in the header
Asked 7.2 years ago by yreynaud

Hi, I have a FASTA file like this:

>NZ_CP012501.1 Escherichia coli strain 08-00022 plasmid pCFSAN004179G, complete sequence
GTTCTGGACGTACTGTGTCAGTGTGTCGATACCCCGGCGCATATCGACGGGTTTTACGACCAGGAATACG
TTATCAGGCGTCAGCATGGCGAAGAGCCCGGAAAACATCGGTTAACTGAGAAGGCTGGCAGCACATCCGG
ATACCTCCGGGAAGGAAAAGTGTGACAGGCTCATCCGACAATGGTCTGCCATCAGCCATACCGGGAGCGC
CAGACACTGAAACTGGAATAATTTCAGGTGCTCTGGCTCGTTTTTCGGCTTTTGCGACATCCTGCGGCCA

> mus musculus
TTTAAAAAGATATTATATATTA

> or whatever in the header
GGGGATATATTATATATATATAT

I want to keep only the sequences belonging to E. coli in the output multi-FASTA file. I tried several approaches using Biopython's SeqIO and awk, but each one failed. Any ideas? Thanks!

Answer, 7.2 years ago:
 awk '/^>/ {ok=index($0,"Escherichia coli");} {if(ok) print;}' in.fasta
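
A small self-contained run of the one-liner above, assuming a POSIX shell and any awk. It builds in.fasta from the three records in the question (the E. coli sequence shortened to two lines) and should leave only the E. coli header and its sequence lines in coli.fa. The ok variable is recomputed on every header line, so it stays non-zero exactly for the lines of a matching record.

```shell
# Sample input built from the records in the question (sequences shortened).
cat > in.fasta <<'EOF'
>NZ_CP012501.1 Escherichia coli strain 08-00022 plasmid pCFSAN004179G, complete sequence
GTTCTGGACGTACTGTGTCAGTGTGTCGATACCCCGGCGCATATCGACGGGTTTTACGACCAGGAATACG
TTATCAGGCGTCAGCATGGCGAAGAGCCCGGAAAACATCGGTTAACTGAGAAGGCTGGCAGCACATCCGG
> mus musculus
TTTAAAAAGATATTATATATTA
> or whatever in the header
GGGGATATATTATATATATATAT
EOF

# On header lines, ok = position of "Escherichia coli" in the line (0 if absent).
# Every line, header or sequence, is printed while ok is non-zero.
awk '/^>/ {ok=index($0,"Escherichia coli");} {if(ok) print;}' in.fasta > coli.fa

cat coli.fa
```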

Hehe, we submitted nearly identical solutions at the same moment.

Great minds think alike :)

Thanks, guys, it works perfectly!

Answer, 7.2 years ago:

You can do it with an awk one-liner if you like:

awk '/^>/{x = /Escherichia coli/;}(x)' in.fasta
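
The same toggle idea, sketched with the search text passed in through awk's -v option instead of being written into the program (the names pattern, pat and keep are only illustrative). On a header line, keep becomes 1 or 0; a bare pattern with no action prints the current line whenever it is true. Because index() does a plain substring search, characters such as "." or "(" in the search text are taken literally rather than as regex syntax.

```shell
# Sample input built from the records in the question (sequences shortened).
cat > in.fasta <<'EOF'
>NZ_CP012501.1 Escherichia coli strain 08-00022 plasmid pCFSAN004179G, complete sequence
GTTCTGGACGTACTGTGTCAGTGTGTCGATACCCCGGCGCATATCGACGGGTTTTACGACCAGGAATACG
TTATCAGGCGTCAGCATGGCGAAGAGCCCGGAAAACATCGGTTAACTGAGAAGGCTGGCAGCACATCCGG
> mus musculus
TTTAAAAAGATATTATATATTA
> or whatever in the header
GGGGATATATTATATATATATAT
EOF

pattern="Escherichia coli"   # illustrative search text, matched as a fixed string

awk -v pat="$pattern" '
  /^>/ { keep = (index($0, pat) > 0) }  # on header lines only: 1 if pat occurs, else 0
  keep                                  # bare pattern: default action prints the line
' in.fasta > coli.fa

cat coli.fa
```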

Mine is smaller ;-) (Things you won't hear T say)

:-)

Answer, 7.2 years ago:

Set awk's record separator RS to ">" so that each FASTA record (header plus sequence lines) is read as a single awk record. For example:

$ awk 'BEGIN{ RS = ">"; } { if ($0 ~ /coli/) { printf ">%s", $0; } }' input.fa > coli.fa

Or:

$ awk 'BEGIN{ RS = ">"; } { if ($0 ~ /mus/) { printf ">%s", $0; } }' input.fa > mus.fa

Etc.
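
A sketch of the RS approach that tests only the header line, assuming a POSIX awk where RS may be a single character. With RS = ">", the first record is the empty text before the first ">", so it is skipped with NR > 1. Matching $0 against /coli/ looks at the whole record, so a short pattern could also hit sequence text; here the header is cut out first. printf ">%s" is used so that a "%" inside a record is printed literally rather than read as a format directive.

```shell
# Sample input built from the records in the question (sequences shortened).
cat > in.fasta <<'EOF'
>NZ_CP012501.1 Escherichia coli strain 08-00022 plasmid pCFSAN004179G, complete sequence
GTTCTGGACGTACTGTGTCAGTGTGTCGATACCCCGGCGCATATCGACGGGTTTTACGACCAGGAATACG
TTATCAGGCGTCAGCATGGCGAAGAGCCCGGAAAACATCGGTTAACTGAGAAGGCTGGCAGCACATCCGG
> mus musculus
TTTAAAAAGATATTATATATTA
> or whatever in the header
GGGGATATATTATATATATATAT
EOF

awk 'BEGIN { RS = ">" }
NR > 1 {
  nl = index($0, "\n")                        # position of the end of the header line
  header = nl ? substr($0, 1, nl - 1) : $0    # header text only, no sequence
  if (header ~ /coli/) printf ">%s", $0       # restore the ">" consumed by RS
}' in.fasta > coli.fa

cat coli.fa
```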

If you want to automate this with a shell variable:

$ export NEEDLE="mus"
$ awk -v needle="$NEEDLE" 'BEGIN{ RS = ">"; } { if ($0 ~ needle) { printf ">%s", $0; } }' input.fa > needle.fa
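
To automate this over several organisms at once, a loop along these lines should work; the list of names and the derived output file names (Escherichia_coli.fa, mus_musculus.fa) are only examples. Like the commands above, it matches against the whole record. Quoting "$needle" matters: without the quotes, a multi-word name like "Escherichia coli" would be split by the shell into separate arguments and awk would misread the command line.

```shell
# Sample input built from the records in the question (sequences shortened).
cat > in.fasta <<'EOF'
>NZ_CP012501.1 Escherichia coli strain 08-00022 plasmid pCFSAN004179G, complete sequence
GTTCTGGACGTACTGTGTCAGTGTGTCGATACCCCGGCGCATATCGACGGGTTTTACGACCAGGAATACG
TTATCAGGCGTCAGCATGGCGAAGAGCCCGGAAAACATCGGTTAACTGAGAAGGCTGGCAGCACATCCGG
> mus musculus
TTTAAAAAGATATTATATATTA
> or whatever in the header
GGGGATATATTATATATATATAT
EOF

# Illustrative list of search patterns; each one gets its own output file.
for needle in "Escherichia coli" "mus musculus"; do
  out="$(printf '%s' "$needle" | tr ' ' '_').fa"   # e.g. Escherichia_coli.fa
  awk -v needle="$needle" 'BEGIN { RS = ">" }
       NR > 1 && $0 ~ needle { printf ">%s", $0 }' in.fasta > "$out"
done
```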

Thanks for your help!

You're very welcome!
