Biostar Beta. Not for public use.
Extract the targeted text using Python
2.9 years ago
horsedog • 30

Hi, I'm a beginner in Python and have a very basic question about extracting targeted text. I have thousands of strings like this:

>ref|WP_070076791.1| iron-sulfur protein [Acinetobacter proteolyticus]

I only need WP_070076791.1, so I wrote a script in Python:

data = open("data.fasta").read()
import re
for line in data:
    start = line.startswith(">ref|")
    end = line.endswith("| ")
    number = re.search(r'start(.*?)end',line)
print(number)

But it prints "None". Does anybody have an idea what is wrong?

python • 551 views

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.

20 months ago
st.ph.n ♦ 2.5k
Philadelphia, PA

Do you only need the field of each FASTA header that sits in the same position as WP_070076791.1?

with open('data.fasta', 'r') as f:
    for line in f:
        if line.startswith('>'):
            # the accession is the second '|'-separated field of the header
            print(line.strip().split('|')[1])

If this isn't an assignment and you can use other tools:

grep -e '>' data.fasta | cut -f 2 -d '|'
20 months ago
Seattle, WA USA

If you don't need to use Python, you can use grep with awk:

$ grep '^>' data.fasta | awk -v FS="|" '{ print $2; }' > result.txt

If you have to use Python:

#!/usr/bin/env python

import sys

for line in sys.stdin:
    if line.startswith('>'):
        line = line.strip()
        elems = line.split('|')
        sys.stdout.write("%s\n" % (elems[1]))

You could use it like so:

$ ./filter.py < data.fasta > result.txt
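
One caveat with indexing elems[1] directly: a header line that contains no '|' would raise an IndexError. Below is a minimal sketch of one way to guard against that, assuming such headers should simply be skipped. The function name iter_accessions and the demo records are invented for this illustration and are not part of the original answer.

```python
import io

def iter_accessions(lines):
    """Yield the second '|'-separated field of each FASTA header line."""
    for line in lines:
        if not line.startswith(">"):
            continue                      # sequence line, not a header
        fields = line.strip().split("|")
        if len(fields) > 1:               # skip headers that have no '|' field
            yield fields[1]

# Small in-memory stand-in for data.fasta (contents invented for the demo).
demo = io.StringIO(
    ">ref|WP_070076791.1| iron-sulfur protein [Acinetobacter proteolyticus]\n"
    "MKTAYIAKQR\n"
    ">header_without_pipes\n"
    "ACGT\n"
)
print(list(iter_accessions(demo)))
```

Because the function accepts any iterable of lines, it should work equally with an open file handle or with sys.stdin.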
18 months ago
France/Nantes/Institut du Thorax - INSE…
start = line.startswith(">ref|")

startswith returns a boolean, not an index/integer: https://docs.python.org/2/library/stdtypes.html

I think you're looking for find.
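
To make this concrete, here is a minimal sketch of two possible fixes: slicing with str.find, and a regular expression that matches the literal delimiters. In the script from the question, the pattern r'start(.*?)end' searches for the literal words "start" and "end" rather than using the variables, and looping over the result of read() visits one character at a time, which is why None is printed. The function names extract_with_find and extract_with_regex are made up for this example, and it assumes headers shaped like >ref|ACCESSION| ... as in the question.

```python
import re

HEADER = ">ref|WP_070076791.1| iron-sulfur protein [Acinetobacter proteolyticus]"

def extract_with_find(line):
    """Return the text between '>ref|' and the next '|', or None."""
    prefix = ">ref|"
    if not line.startswith(prefix):   # startswith only answers True/False
        return None
    begin = len(prefix)
    end = line.find("|", begin)       # find returns an index, or -1 if absent
    if end == -1:
        return None
    return line[begin:end]

def extract_with_regex(line):
    """Same extraction with a regex; '|' is escaped because it is a metacharacter."""
    match = re.search(r"^>ref\|(.*?)\|", line)
    return match.group(1) if match else None

print(extract_with_find(HEADER))
print(extract_with_regex(HEADER))
```

Either function can be called on each line while iterating over the open file, as the other answers do, instead of iterating over the string returned by read().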
