Biostar Beta. Not for public use.
Python: count how many lines have a specific word
19 months ago
Illinu • 90
Belgium

I don't know what's wrong with this code. I want to count, from a BLAST report, how many hits correspond to a specific species (the keyword). My idea is to loop through each line and, when the keyword is found, add 1 to count and go to the next line, because sometimes the species name appears more than once in the subject name.

#!/usr/bin/python
import sys
      ##usage: python filterbyword.py file keyword
file = open(sys.argv[1],'r')
keyword = sys.argv[2]
count = 0
for line in file:
    while True:
        if keyword not in line:
            continue
        else:
            break
    count = count + 1
print count

What's wrong with grep?


I was gonna ask that (along with why not awk for more advanced grepping), but maybe OP wishes to add this functionality as a module to existing code?


OMG you are right! How did I not think of grep? Shame on me!!

19 months ago
Seattle, WA USA

The problem is, specifically, that your while loop is inside the for loop.

So if the keyword is not in the line, continue jumps back to the top of the while loop with the same line, the test fails again, and the script loops forever, so it appears to "hang".

All you have to do is take out the while loop and test directly:

for line in file:
    if keyword in line:
        count = count + 1
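
For reference, the corrected loop could be wrapped up as in the sketch below. It assumes Python 3 (the code in this thread is Python 2), and the function name count_matching_lines is made up for illustration, not taken from the original script.

```python
def count_matching_lines(path, keyword):
    """Count the lines of a file that contain keyword (hypothetical helper)."""
    count = 0
    with open(path) as handle:
        for line in handle:
            # One test per line: a line adds 1 at most, however many
            # times the keyword occurs in it.
            if keyword in line:
                count += 1
    return count
```

Calling count_matching_lines(sys.argv[1], sys.argv[2]) after import sys would reproduce the command-line usage of the original script.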

OK, thanks, I did this, but I don't know whether it adds 1 each time the word occurs in the line or only once per line before moving on to the next line.


The test should only be done once per line, but you can verify this by trying it out with test input that contains multiple instances of a keyword on a line, and seeing if the final count matches what you expect.
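
One way to run that check without a real BLAST report is to feed the loop a few made-up lines. The sketch below assumes Python 3 and uses io.StringIO as a stand-in for the open file; the sample lines are invented for the test.

```python
import io

# Invented test input: the first line mentions the keyword twice.
sample = io.StringIO(
    "hit1\tHomo sapiens gene (Homo sapiens)\n"
    "hit2\tMus musculus gene\n"
    "hit3\tHomo sapiens gene\n"
)
keyword = "Homo sapiens"

count = 0
for line in sample:
    if keyword in line:
        count = count + 1

# Two lines contain the keyword, although it occurs three times in total.
print(count)
```

If the loop counted every occurrence, this would print 3 rather than 2.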


Exactly as Alex says. Your whole mini-script can even just be compacted into:

with open(sys.argv[1]) as f:
    count = sum(sys.argv[2] in line for line in f)
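
The one-liner works because the test keyword in line evaluates to True or False, and Python treats those as 1 and 0 when summing. A small sketch, assuming Python 3 and an invented list of lines in place of the file:

```python
lines = ["abc abc\n", "xyz\n", "abc\n"]  # invented stand-in for the file
keyword = "abc"

flags = [keyword in line for line in lines]  # one boolean per line
total = sum(flags)  # True counts as 1, False as 0

print(flags)
print(total)
```

Each line contributes a single boolean, so a line with two occurrences still adds only 1 to the total.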
16 months ago
geek_y 9.7k
Barcelona/CRG/London/Imperial

What is happening with this script? If it is not looping again for the next search, try using the seek() method, something like file.seek(0).


(Haven't tested it) but for line in file should iterate through the file.


Why would you seek to the start of a file on each iteration?
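
To illustrate when seek(0) does matter: a file object is used up by one pass of the for loop, so rewinding is only needed before a second pass over the same file, not on each iteration. The sketch below assumes Python 3 and uses io.StringIO as a stand-in for an open file; the helper name count_lines is hypothetical.

```python
import io

handle = io.StringIO("abc\nxyz\nabc\n")  # invented stand-in for an open file


def count_lines(f, keyword):
    # Hypothetical helper: counts the lines of f that contain keyword.
    return sum(keyword in line for line in f)


first = count_lines(handle, "abc")   # reads the file to the end
second = count_lines(handle, "xyz")  # nothing left to read, so no lines are seen
handle.seek(0)                       # rewind once, before another pass
third = count_lines(handle, "xyz")

print(first, second, third)
```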

17 months ago
WCIP | Glasgow | UK

The continue inside while True restarts the while loop, not the for loop, so the first line without the keyword is tested forever and count is never printed. Maybe you want:

#!/usr/bin/python
import sys
# usage: python filterbyword.py file keyword
file = open(sys.argv[1], 'r')
keyword = sys.argv[2]
count = 0
for line in file:
    if keyword not in line:
        continue  # skip to the next line of the file
    count += 1
print(count)

Or even simpler:

for line in file:
    if keyword in line:
        count += 1
print(count)

The simpler one is what I was using but then I got confused about whether there would be an addition for each time the word appears in the line. I don't want that, I want to have one count for each line with the word, not a count for each time the word is in the file.


Just try it. The test if keyword in line only checks whether the keyword occurs in the line; it doesn't count how many times the keyword is found.
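
The difference is easy to see on a single string. A minimal sketch, assuming Python 3 and an invented example line; str.count is shown only for contrast, since it is what you would use if you did want every occurrence.

```python
line = "gi|1|Homo sapiens protein [Homo sapiens]\n"  # invented example line
keyword = "Homo sapiens"

found = keyword in line            # membership test: True or False
occurrences = line.count(keyword)  # number of non-overlapping occurrences

print(found)
print(occurrences)
```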
