Cut a FASTA file into two columns: position and single base
Asked 5.0 years ago by mittjohns

What is the most efficient way (fast, with simple code) to convert a FASTA file of a chromosome (either part of the sequence or the whole of it) into a two-column, tab-delimited format of position and base? For example:

FASTA file:

>chr2
ATGCATTC...

Converted position/base file:

1 A
2 T
3 G
4 C

I know I could write a script for this, but it seems like a job for a one-liner or an existing tool. Thanks!

Tags: fasta, sequence, base, position
Answer (5.0 years ago):
grep -v '^>' in.fasta | tr -d '\n' | grep -o . | cat -n
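A possible variant of the same idea, sketched with POSIX awk under a few assumptions: the input is a single-record FASTA, the output should be exactly "pos<TAB>base" (cat -n right-aligns the numbers with leading spaces), and an optional region is selected with the START and END environment variables (1-based, inclusive; END=0 means "to the end"). The function name fasta_pos_base and the START/END variables are hypothetical, not part of any existing tool.

```shell
# Sketch: print "pos<TAB>base" for every base of a single-record FASTA.
# Reads the files given as arguments, or stdin if there are none.
# Hypothetical knobs: START (default 1) and END (default 0 = whole sequence).
fasta_pos_base() {
  awk -v first="${START:-1}" -v last="${END:-0}" '
    { sub(/\r$/, "") }                   # tolerate Windows line endings
    /^>/ { next }                        # skip the header line
    {
      n = length($0)
      for (i = 1; i <= n; i++) {
        pos++                            # running 1-based position
        if (pos < first) continue        # before the requested region
        if (last > 0 && pos > last) exit # past the requested region
        printf "%d\t%s\n", pos, substr($0, i, 1)
      }
    }
  ' "$@"
}
```

Usage: `fasta_pos_base in.fasta > pos_base.tsv` for the whole sequence, or `START=101 END=200 fasta_pos_base in.fasta` for part of it. Doing the work in a single awk process avoids the intermediate one-character-per-line stream and lets awk stop reading as soon as the region end is reached.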

With a multi-FASTA file, the positions will run consecutively across all records instead of restarting at 1 for each sequence, correct? The OP should keep that in mind.
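If the numbering should restart for each record, one option is to add the record name as a first column, giving a name/position/base table. This is a sketch assuming POSIX awk and that the record name is the first word of each header line; the function name fasta_pos_base_multi and the file name genome.fa in the usage line are hypothetical.

```shell
# Sketch: print "name<TAB>pos<TAB>base" for every base of a (multi-)FASTA,
# restarting the position at 1 for each record.
# Assumption: the record name is the first word of the ">" header line.
fasta_pos_base_multi() {
  awk '
    { sub(/\r$/, "") }          # tolerate Windows line endings
    /^>/ {
      name = substr($1, 2)      # first header word without the ">"
      pos = 0                   # restart numbering for the new record
      next
    }
    {
      n = length($0)
      for (i = 1; i <= n; i++)
        printf "%s\t%d\t%s\n", name, ++pos, substr($0, i, 1)
    }
  ' "$@"
}
```

Usage: `fasta_pos_base_multi genome.fa > pos_base.tsv`. Keeping the name column means rows from different chromosomes stay distinguishable even though their positions overlap.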


Thanks, Pierre. A brilliant use of grep -o and cat -n.
