Key and value parse with regex
3.0 years ago
enes ▴ 40

Hi guys, I want to extract only one specific value from each row of a dataframe column in Python.

For example:

MONDO:MONDO:0014405,MedGen:C4014722,OMIM:615934,Orphanet:ORPHA425120|MedGen:CN169374

I want to extract only the OMIM:615934 part from this row.

This is my first attempt:

for i in my_data.ClinVar_CLNDISDB:
    if "OMIM:" in i:
        select = re.compile(r'^(.+?),')
        print(select.findall(str(i)))

But the output contains everything up to the first comma, like this:

MONDO:MONDO:0014405

How can I change my code to achieve this? Thank you!

python regex • 744 views

Why do you think the regex '^(.+?),' only gives you that result?

What do you suppose you may need to change about it? More generally, think about what the distinguishing characteristics of the string and its immediate environment are that would allow you to extract it.

Are you certain you even need to use a regex here? Your string is comma-delimited, in particular around the field you want, so there will be simpler ways to achieve what you want.
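As a rough illustration of the non-regex route, here is a minimal sketch that splits the field on its delimiters and keeps the entries starting with OMIM:. The helper name omim_ids is made up for this example, and it assumes that | separates groups of comma-delimited entries, as the sample row suggests.

```python
# Hypothetical helper; the name omim_ids is not from the original post.
def omim_ids(field):
    """Return every entry that starts with 'OMIM:' in a delimited string."""
    # Assumption: '|' separates groups, ',' separates entries within a group.
    # Replacing '|' with ',' lets a single split handle both delimiters.
    entries = field.replace("|", ",").split(",")
    return [entry for entry in entries if entry.startswith("OMIM:")]


# Sample row taken from the question.
row = ("MONDO:MONDO:0014405,MedGen:C4014722,OMIM:615934,"
       "Orphanet:ORPHA425120|MedGen:CN169374")

print(omim_ids(row))  # ['OMIM:615934']
```

The function returns a list because a row could, in principle, carry more than one OMIM entry or none at all; an empty list signals that nothing matched.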

3.0 years ago
JC 13k

First, check how your regex works. ^(.+?), means: anchor at the beginning of the string with ^, then capture one or more characters, as few as possible, with .+? until you find the first ,. That is why you only get MONDO:MONDO:0014405.

The regex you are looking for is (OMIM:\d+), which means: match the literal word OMIM, then :, then one or more digits \d+.
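A minimal sketch of how that pattern could be applied with Python's re module, using the sample row from the question. The variable names are illustrative only, and the capture group is dropped here because the whole match is what is wanted.

```python
import re

# Literal 'OMIM:' followed by one or more digits.
OMIM_RE = re.compile(r"OMIM:\d+")

# Sample row taken from the question.
row = ("MONDO:MONDO:0014405,MedGen:C4014722,OMIM:615934,"
       "Orphanet:ORPHA425120|MedGen:CN169374")

# search() scans the whole string, so no ^ anchor is needed.
match = OMIM_RE.search(row)
first_id = match.group(0) if match else None  # None when the row has no OMIM entry

# findall() returns every match, in case a row holds several OMIM entries.
all_ids = OMIM_RE.findall(row)

print(first_id)  # OMIM:615934
print(all_ids)   # ['OMIM:615934']
```

If my_data is a pandas DataFrame, as the attribute access in the question suggests, something like my_data["ClinVar_CLNDISDB"].str.extract(r"(OMIM:\d+)", expand=False) should apply the same pattern to the whole column without an explicit loop, giving NaN for rows that have no match.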
