Biostar Beta. Not for public use.
Question: Use Perl To Edit Each Line Of A File

Hi all,

I have a file, a.txt, containing lines like the following:

mez:Mtc_0001 glycosyltransferase

mez:Mtc_0002 feoA; Iron dependent transcriptional regulator; K03709 DtxR family transcriptional regulator, Mn-dep

mez:Mtc_0003 feoB; ferrous iron transporter FeoB; K04759 ferrous iron transport protein B

(There are multiple spaces between Mtc_000x and the text that follows.)

I want to use Perl to do the following, but I have only just started learning Perl:

1) delete "mez:" from the beginning of every line;

2) for each line, separate the fields with "\t" instead of multiple spaces or ";";

3) print the results and store them in a new file, b.txt, keeping a.txt unchanged.

Could you give me some suggestions?

Thanks!

asked 6.1 years ago by liupfskygre • 190 • updated 6.1 years ago by Kenosis ♦ 1.2k

Why Perl? A single sed command would do.

reply by Pierre Lindenbaum • 120k • 6.1 years ago

Thanks. I am also trying to learn Perl but can't figure it out yet. Maybe I will once I get through the regular expressions chapter.

reply by liupfskygre • 190 • 6.1 years ago

perl -ne '$_=~s/^mez://;$_=~s/;\s+/\t/g;print $_;' a.txt > b.txt

run Perl code on each line of the input (-n loops over the lines, putting each one in $_):

perl -ne 'your perl code'

delete all "mez:" at the beginning:

$_=~s/^mez://;

change all ";" followed by spaces to "\t":

$_=~s/;\s+/\t/g;

write result to new file without changing the input file:

> b.txt
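For anyone who prefers a file over a one-liner, here is a minimal sketch of the same two substitutions as a standalone script. The script name convert.pl, the helper name convert_line and the two-argument command line (perl convert.pl a.txt b.txt) are hypothetical choices for illustration. Opening a.txt read-only and writing to a separate b.txt is what keeps the input unchanged, just like the > b.txt redirect.

```perl
#!/usr/bin/perl
# Sketch only: hypothetical convert.pl, run as  perl convert.pl a.txt b.txt
use strict;
use warnings;

# Apply the answer's two substitutions to a single line.
# convert_line is a hypothetical helper name.
sub convert_line {
    my ($line) = @_;
    $line =~ s/^mez://;     # drop "mez:" only at the start of the line
    $line =~ s/;\s+/\t/g;   # every ";" plus the whitespace after it becomes a tab
    return $line;
}

# Only touch files when both paths are given on the command line.
if (@ARGV == 2) {
    my ($in_path, $out_path) = @ARGV;
    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    while (my $line = <$in>) {
        # $line still ends in its own "\n", so no extra newline is printed
        print {$out} convert_line($line);
    }
    close $out or die "Cannot close $out_path: $!";
    close $in;
}
```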
answered 6.1 years ago by David Langenberger • 8.9k

Well done, and nice explanation of the parts (+1)! You can, however, do the following:

perl -pe 's/^mez://;s/;\s+/\t/g' a.txt > b.txt

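As you likely know, your s/// implicitly operates on $_, so there's no need to write $_ =~ explicitly; -p prints each line after your code runs, and since it also does the line looping that -n does, -n isn't needed alongside it.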

Nit: "... followed by spaces..." -> s/(?=spaces)/white/ (\s matches any whitespace, including tabs and newlines, not just spaces)
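To make the -p behaviour concrete: perlrun describes -p as wrapping your code in a while (<>) loop with a continue { print } block. The sketch below writes that loop out by hand. Reading from and writing to in-memory strings (instead of a.txt and b.txt) and the name run_like_dash_p are assumptions made so the example runs without any files.

```perl
use strict;
use warnings;

# Roughly what  perl -pe 's/^mez://;s/;\s+/\t/g'  does, spelled out.
# run_like_dash_p is a hypothetical name; input and output are strings
# opened as in-memory filehandles rather than real files.
sub run_like_dash_p {
    my ($text) = @_;
    my $result = '';
    open my $in,  '<', \$text   or die "in-memory read: $!";
    open my $out, '>', \$result or die "in-memory write: $!";
    local $_;                  # don't clobber the caller's $_
    while (<$in>) {            # each line (with its "\n") lands in $_
        # the -e code: s/// works on $_ without "$_ =~"
        s/^mez://;
        s/;\s+/\t/g;
    }
    continue {
        print {$out} $_;       # -p prints $_ after every line, changed or not
    }
    close $out;
    return $result;
}
```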

reply by Kenosis ♦ 1.2k • 6.1 years ago

Thanks! It worked well!

reply by liupfskygre • 190 • 6.1 years ago

You're most welcome!

reply by Kenosis ♦ 1.2k • 6.1 years ago

Thanks.

There are multiple spaces between Mtc_000x and the text that follows. How can I change those spaces into a tab ("\t") too? Something like $_=~s/#multi-spaces#/\t/, right? I reviewed the book, and now I understand s///, but what do the "=~" and "^" symbols mean?

reply by liupfskygre • 190 • 6.1 years ago

David Langenberger's s/;\s+/\t/g substitutes a tab for a semicolon followed by one or more whitespace characters. However, you also want the same tab substitution for the whitespace after the Mtc_000x pattern. You can use the above, with just a couple of changes, right after the first substitution:

s/\s+/\t/

Because there is no g modifier, this replaces only the first run of whitespace with a tab. Given the above, the final one-liner could be:

perl -pe 's/^mez://;s/\s+/\t/;s/;\s+/\t/g' a.txt > b.txt
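Here is a sketch of the final one-liner's three substitutions as a function (the name to_tsv is hypothetical). It makes one deliberate change, stated as an assumption: it uses [ \t]+ instead of \s+. Because \s also matches "\n", with \s+ a line that has nothing after the ID (or a blank line) would lose its newline and be joined to the next line. In the usual case, where every line has a description after the ID and no trailing ";", both versions give the same result.

```perl
use strict;
use warnings;

# to_tsv is a hypothetical helper name.
# Assumption: fields are separated only by spaces/tabs, so [ \t]+ is used
# instead of \s+ to avoid swallowing the line's "\n".
sub to_tsv {
    my ($line) = @_;
    $line =~ s/^mez://;        # 1) strip the "mez:" prefix
    $line =~ s/[ \t]+/\t/;     # 2) first run of spaces (after Mtc_000x) -> tab
    $line =~ s/;[ \t]+/\t/g;   # 3) each ";" separator -> tab
    return $line;
}

my $sample = "mez:Mtc_0003    feoB; ferrous iron transporter FeoB; K04759 ferrous iron transport protein B\n";
print to_tsv($sample);
# prints Mtc_0003<TAB>feoB<TAB>ferrous iron transporter FeoB<TAB>K04759 ferrous iron transport protein B
```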

The =~ symbol is Perl's binding operator: it applies a match or substitution to the variable on its left instead of to $_. The ^ above is an anchor, which means "at the beginning of the line."
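A small sketch of both symbols side by side; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# =~ binds a substitution to a named variable.
my $id = "mez:Mtc_0001";
$id =~ s/^mez://;                  # edits $id in place
print "$id\n";                     # prints Mtc_0001

# Without =~, s/// works on $_, which is why the one-liners can
# leave out "$_ =~" (-n and -p put each input line into $_).
$_ = "mez:Mtc_0002";
s/^mez://;
print "$_\n";                      # prints Mtc_0002

# ^ anchors the pattern to the start of the string, so "mez:"
# elsewhere in the line is left alone unless the anchor is dropped.
my $anchored   = "Mtc_0003 see mez:Mtc_0002";
my $unanchored = $anchored;
$anchored   =~ s/^mez://;          # no match, string unchanged
$unanchored =~ s/mez://;           # first "mez:" anywhere is removed
print "$anchored\n";               # prints Mtc_0003 see mez:Mtc_0002
print "$unanchored\n";             # prints Mtc_0003 see Mtc_0002
```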

reply by Kenosis ♦ 1.2k • 6.1 years ago
