Question

Can You Improve This Erlang Code? (Was "Average Length Of The Sequences In A Fasta File")

13.8 years ago

Pierre Lindenbaum 161k

In a previous question "Code golf: mean length of fasta sequences", Eric asked for some solutions to get the average length of the sequences in a fasta file.

I tried to answer this question using the following Erlang code:

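-module(golf).
-export([test/0]).

%% Fold one input line into the {Sequences,Total} accumulator.
line([],{Sequences,Total}) ->  {Sequences,Total};
%% A header line (starts with ">") begins a new sequence.
line(">" ++ _Rest,{Sequences,Total}) -> {Sequences+1,Total};
%% Note: string:strip/1 removes only spaces, so the trailing newline is still counted.
line(L,{Sequences,Total}) -> {Sequences,Total+string:len(string:strip(L))}.

%% Read lines from S until eof (or a read error), accumulating the counts.
scanLines(S,Sequences,Total)->
        case io:get_line(S,'') of
            eof -> {Sequences,Total};
            {error,_} ->{Sequences,Total};
            Line -> {S2,T2}=line(Line,{Sequences,Total}), scanLines(S,S2,T2)
        end.

%% Print the mean sequence length; "/" always yields a float in Erlang.
test()->
    {Sequences,Total}=scanLines(standard_io,0,0),
    io:format("~p\n",[Total/Sequences]),
    halt().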

Compilation/Execution:

erlc golf.erl
erl -noshell -s golf test < sequence.fasta
563.16

This code seems to work fine for a small FASTA file, but it takes hours to parse uniprot_sprot.fasta (in fact, I pressed Ctrl-C). Why? I'm an Erlang newbie; can you improve this code?

code fasta sequence functional • 3.0k views

updated 5.6 years ago by Ram 43k • written 13.8 years ago by Pierre Lindenbaum 161k

Pierre, in the meantime you should maybe post your question on Stack Overflow as well. Just a suggestion.

13.8 years ago by Fred Fleche 4.3k

Fred, I'll do that if I don't get an answer here :-)

13.8 years ago by Pierre Lindenbaum 161k

Posted on SO: http://stackoverflow.com/questions/3296855

13.8 years ago by Pierre Lindenbaum 161k

13.8 years ago

Istvan Albert 100k

A few years ago I read a blog post series on Erlang's text processing performance. Back then it seemed that the language did not have an efficient string representation.

updated 5.6 years ago by Ram 43k • written 13.8 years ago by Istvan Albert 100k

Ram · Accepted Answer · 2010-07-22

13.8 years ago

Pierre Lindenbaum 161k

The answer is here

updated 5.6 years ago by Ram 43k • written 13.8 years ago by Pierre Lindenbaum 161k