Biostar Test Site

How do I write the regex expression for the line?
12 days ago
Inayat • 0

How do I write a regex to check whether a line starts with YBR069C followed by 620? Note that there are some spaces between YBR069C and 620.

The line looks like this:

YBR069C 620 
AUGGACGAUAGUGUCAGUUUCAUUGCCAAAGAGGCCAGUCCAGCACAAUAUUCGCACAGUUUGCAUGAAAGAACACACAGUG

Thank you

regex • 145 views

Please post a few more example lines, or example input and expected output. It's not clear (to me) what you want to achieve here.


I have a list of genes and their sizes that I already know. What I want is the sequence of each of these genes from a file containing several thousand genes, as shown in the image. That is, I want to match the gene names and sizes I already know against that file to retrieve their sequences.

11 days ago

/^YBR069C\s+620/ is a regex that matches lines starting with YBR069C followed by 620, with whitespace in between. However, your example is not one line but two, and the image you posted shows that the "starting with YBR069C" requirement is not well defined. Unless you describe your input and your requirements precisely, you won't get the results you need.
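
For readers who prefer Python, here is a minimal sketch of the same anchored pattern using the standard re module. The function name starts_with_gene_and_size is hypothetical and the sample lines are made up for illustration. It also adds a word boundary after the size, which the pattern above does not have, so that 620 does not match the start of 6200.

```python
import re

def starts_with_gene_and_size(line: str, gene: str, size: int) -> bool:
    """Return True if line starts with gene, then whitespace, then size."""
    # re.escape guards against regex metacharacters in the gene name.
    # \b (an addition, not in the original pattern) stops 620 from
    # matching the beginning of a longer number such as 6200.
    pattern = rf"^{re.escape(gene)}\s+{size}\b"
    return re.search(pattern, line) is not None

print(starts_with_gene_and_size("YBR069C 620 ", "YBR069C", 620))    # True
print(starts_with_gene_and_size("YBR069C    620", "YBR069C", 620))  # True
print(starts_with_gene_and_size("1 YBR069C 620", "YBR069C", 620))   # False
```

The last call returns False because the line starts with a number rather than the gene name, which is exactly the ambiguity in the requirement mentioned above.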

Since you seem to be interested in sequences whose header lines have the structure "number label number", I would use a print flag: print every line under the desired header until the next unwanted header, like this:

perl -ne '/^\d/ and $p = 0; /^\d+\s+YBR069C\s+620$/ and $p = 1; print if $p' input.txt
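
The same print-flag idea can be sketched in Python and extended to a whole list of known gene/size pairs, which is what the question ultimately asks for. This is only a sketch under assumptions: the function name extract_records is hypothetical, the sample records are made up, and it assumes (as the one-liner does) that header lines look like "number label number" and that sequence lines never start with a digit. Unlike the one-liner, it tolerates trailing whitespace on header lines.

```python
import re

def extract_records(lines, wanted):
    """Yield header and sequence lines for every (gene, size) pair in wanted.

    Assumes headers look like '<number> <gene> <size>' and that sequence
    lines never start with a digit, as in the Perl one-liner.
    """
    header = re.compile(r"^\d+\s+(\S+)\s+(\d+)\s*$")
    printing = False
    for line in lines:
        if line[:1].isdigit():  # any header line re-evaluates the flag
            m = header.match(line)
            printing = bool(m) and (m.group(1), int(m.group(2))) in wanted
        if printing:
            yield line.rstrip("\n")

# Made-up sample input; the real file layout is only known from the image.
sample = [
    "1 YAL001C 3483",
    "AUGGUACUGACG",
    "2 YBR069C 620",
    "AUGGACGAUAGU",
    "GUCAGUUUCAUU",
    "3 YCR012W 1251",
    "AUGUCUUUAUCU",
]
wanted = {("YBR069C", 620)}
for out in extract_records(sample, wanted):
    print(out)
```

To run it on a real file, pass an open file object instead of the list, for example extract_records(open("input.txt"), wanted), and put every known gene/size pair into the wanted set.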

Thank you, it worked.
