python
3.0 years ago
FadyNabil ▴ 20

I have this DNA sequence:

dna_seq = "CUGAACUSCACUGECAUUCA"

and I want to cut all the letters before "S" in one line, using a loop and an if condition.

I wrote this script:

Before_S = True
for i in range(len(dna_seq)):
    if dna_seq[i] == 'S':
        Before_S = False
        continue
    if Before_S:
        print(dna_seq[i])

But I want to write this whole script as a comprehension, e.g.:

firstexon = [my script]

How can I do that?

3.0 years ago

Hello, could you please update the title of the post to make it more explicit, for example "Removing characters in a string in python" or similar?

This question has been answered many times, and there are many ways to solve it.

You do not have to reinvent the wheel: you can replace all characters before S with nothing, using a regular expression (regex).

See the docs

Assuming you only have one S in your string, something like this (not tested):

import re
s = "CUGAACUSCACUGECAUUCA"
# '.*S' matches everything up to and including the S; replace it with nothing
replaced = re.sub('.*S', '', s)
print(replaced)

Another solution, again if you only have one S, is to split your string on the S character. The following splits your string in two; you want the second part.

s = "CUGAACUSCACUGECAUUCA"
replaced = s.split('S')[1]
print(replaced)
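
Both snippets above rely on the string containing exactly one S. If that is not guaranteed, str.partition is one way to make the behaviour explicit: it splits on the first occurrence only and reports whether the separator was found. A minimal sketch, where the helper name after_first_s is hypothetical and returning the string unchanged when there is no S is an assumption about what you would want:

```python
def after_first_s(seq, marker="S"):
    """Return the part of seq after the first marker.

    Assumption: if the marker is absent, the sequence is returned unchanged.
    """
    before, sep, after = seq.partition(marker)
    # sep is an empty string when the marker was not found
    return after if sep else seq


print(after_first_s("CUGAACUSCACUGECAUUCA"))  # CACUGECAUUCA
```

With s.split('S')[1], a string without an S would raise an IndexError, and a string with several S characters would return only the piece between the first two.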

But if you want to stick to your approach for learning purposes, you can use a for loop over the characters of your string, skipping each character until you find an S, and from then on appending the downstream characters to another list, your edited sequence. Good luck.
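
The loop described above could be sketched roughly as follows. The function and variable names (drop_up_to_s, edited, found_s) are made up for illustration, and the sketch assumes the S itself should be dropped along with everything before it:

```python
def drop_up_to_s(seq, marker="S"):
    """Collect the characters that come after the first marker."""
    edited = []       # the "edited array" holding downstream characters
    found_s = False   # flips to True once the marker has been seen
    for ch in seq:
        if found_s:
            edited.append(ch)
        elif ch == marker:
            found_s = True
    return "".join(edited)


print(drop_up_to_s("CUGAACUSCACUGECAUUCA"))  # CACUGECAUUCA
```

Note that, unlike the split approach, this returns an empty string when the sequence contains no S.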

3.0 years ago
Dunois ★ 2.5k

Your phrasing is unclear. If a list comprehension solution is what you're after, are you looking for something like this?

#Python 3.9.1

#Generating some random strings containing 'A', 'U', 'C', 'G' separated by a single 'S'.
import random
def strgen():
    return(''.join(random.choices(['A', 'U', 'C', 'G'], k = random.choice([4,5,6,7,8]))))
seqs = [strgen()+'S'+strgen() for i in range(10)]

print(seqs)
# ['UCGCSCGACGAUA', 'CCUGGGSAGAA', 'GUGCSCCUUA', 'GUGUACASUUCUGUG', 'UCGUSGCCUG', 'GACASAACGC', 'UAGUSGUCCC', 'GCGUSGCGGCACA', 'UAGCUAUSCUGUAUA', 'AGUUSCCCGGCGU']

#Splitting using list comprehension.
seqs_split = [x.split('S') for x in seqs]

print(seqs_split)
# [['UCGC', 'CGACGAUA'], ['CCUGGG', 'AGAA'], ['GUGC', 'CCUUA'], ['GUGUACA', 'UUCUGUG'], ['UCGU', 'GCCUG'], ['GACA', 'AACGC'], ['UAGU', 'GUCCC'], ['GCGU', 'GCGGCACA'], ['UAGCUAU', 'CUGUAUA'], ['AGUU', 'CCCGGCGU']]

Modifying [x.split('S') for x in seqs] to [x.split('S')[0] for x in seqs] or [x.split('S')[1] for x in seqs] will yield just the former or latter portions of the strings w.r.t. S respectively.
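
For the single sequence in the question, a character-level comprehension that mirrors the original loop (keeping only the letters before the first S) might look like the sketch below. The use of itertools.takewhile is just one option and was not proposed in the thread; the index-based variant is included because it stays closer to the original range(len(...)) loop.

```python
from itertools import takewhile

dna_seq = "CUGAACUSCACUGECAUUCA"

# Keep characters until the first 'S' is reached.
firstexon = [base for base in takewhile(lambda b: b != "S", dna_seq)]

# Index-based variant: keep position i only if no 'S' occurs up to and including i.
firstexon_idx = [dna_seq[i] for i in range(len(dna_seq)) if "S" not in dna_seq[: i + 1]]

print("".join(firstexon))  # CUGAACU
```

Both lists hold the individual characters; join them if a string is needed. For a plain string result, dna_seq.split('S')[0] is shorter and does the same job.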
