Use Perl To Edit Each Line Of A File
13 months ago
liupfskygre • 190
United States

Hi, all

I have a file, a.txt, that contains lines like the ones below:

mez:Mtc_0001 glycosyltransferase

mez:Mtc_0002 feoA; Iron dependent transcriptional regulator; K03709 DtxR family transcriptional regulator, Mn-dep

mez:Mtc_0003 feoB; ferrous iron transporter FeoB; K04759 ferrous iron transport protein B

(There are multiple spaces between Mtc_000x and the text that follows.)

I want to use Perl to do the following, but I have only just begun to learn it:

1) delete the "mez:" at the beginning of every line;

2) for each line, separate the fields with a tab ("\t") instead of the multiple spaces or ";";

3) print and store the results in a new file, b.txt, and keep a.txt unchanged.

Could you give me some suggestions on this?

Thanks!


Why Perl? A single sed command would do.
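For comparison, a minimal sketch of what the sed route might look like. The sample line and its three-space gap are assumptions, and the tab is built with printf because "\t" in a sed replacement is not portable across sed implementations:

```shell
# Hypothetical one-line a.txt; the three-space gap stands in for the "multi-space".
printf '%s\n' 'mez:Mtc_0003   feoB; ferrous iron transporter FeoB' > a.txt

# Hold a literal tab character in a shell variable.
tab=$(printf '\t')

# First expression strips the leading "mez:"; the second turns each ";"
# plus the whitespace after it into a tab. a.txt itself is left untouched.
sed -e 's/^mez://' -e "s/;[[:space:]][[:space:]]*/${tab}/g" a.txt > b.txt
```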


Thanks. I am also trying to learn Perl, but I could not figure this out yet. Maybe it will make sense after I go through the regular expression chapter.

10 months ago
Deutschland
perl -ne '$_=~s/^mez://;$_=~s/;\s+/\t/g;print $_;' a.txt > b.txt

Use Perl from the command line (-n loops over each input line, -e supplies the code):

perl -ne 'your perl code'

delete all "mez:" at the beginning:

$_=~s/^mez://;

change all ";" followed by spaces to "\t":

$_=~s/;\s+/\t/g;

write result to new file without changing the input file:

> b.txt

Well done, and nice explanation of the parts (+1)! You can, however, do the following:

perl -pe 's/^mez://;s/;\s+/\t/g' a.txt > b.txt

As you likely know, s/// operates on $_ by default, so it's not necessary to use $_ explicitly; -p prints each line automatically after the code has run.

Nit: _"... followed by spaces..."_ -> s/(?=spaces)/white/


Thanks! It worked well!


You're most welcome!


Thanks.

There are multiple spaces between Mtc_000x and the text that follows. How can I change those spaces into a tab ("\t") too? Something like $_=~s/#multi-spaces#/\t/, right? I reviewed the book and now understand how s/// is used, but what do the "=~" symbol and "^" mean?


David Langenberger's s/;\s+/\t/g substitutes a tab for a semicolon followed by one or more whitespace characters. However, you also want the same tab substitution for the whitespace after the Mtc_000x pattern. You can use the above, with just a couple of changes, right after the first substitution:

s/\s+/\t/

Because there is no g modifier, this replaces only the first run of whitespace with a tab. Given the above, the final one-liner could be:

perl -pe 's/^mez://;s/\s+/\t/;s/;\s+/\t/g' a.txt > b.txt

The =~ symbol is Perl's regex binding operator: it applies the match or substitution on its right to the variable on its left. The ^ above is an _anchor_ that means "at the beginning of the line."
