How to remove empty lines from the output using Python
6.3 years ago
horsedog ▴ 60

Hi, I'm trying to rename all my sequences. My goal is to append the taxonomy to each accession number in the query file.
The original ones look like this:

>YP_003612801.1   
MTDYLLLFVGTVLVNNFVLVKFLGLCPFMGVSKKLETAMGMGLATTFVMTMASICAWLIDTWILIPLGLV
YLRTLAFILVIAVVVQFTEMVVRKTSPALYRLLGIFLPLITTNCAVLGVALLNINLGHNFMQSALYGFSA
AVGFSLVMVLFASIRERLAAADIPAPFRGNAIALVTAGLMSLAFMGFSGLVKL

After I run my script, it looks like this:

>YP_003612801.1  
_Firmicutes_Clostridia_Clostridiales

MTDYLLLFVGTVLVNNFVLVKFLGLCPFMGVSKKLETAMGMGLATTFVMTMASICAWLIDTWILIPLGLV

YLRTLAFILVIAVVVQFTEMVVRKTSPALYRLLGIFLPLITTNCAVLGVALLNINLGHNFMQSALYGFSA

AVGFSLVMVLFASIRERLAAADIPAPFRGNAIALVTAGLMSLAFMGFSGLVKL

I don't know why there are empty lines between the lines. I also want the taxonomy appended on the same line as the accession number instead of on a new line, so this is what I want:

>YP_003612801.1_Firmicutes_Clostridia_Clostridiales     
MTDYLLLFVGTVLVNNFVLVKFLGLCPFMGVSKKLETAMGMGLATTFVMTMASICAWLIDTWILIPLGLV   
YLRTLAFILVIAVVVQFTEMVVRKTSPALYRLLGIFLPLITTNCAVLGVALLNINLGHNFMQSALYGFSA 
AVGFSLVMVLFASIRERLAAADIPAPFRGNAIALVTAGLMSLAFMGFSGLVKL

Does anyone know how to do this in Python?


Your script is doing something wonky, and without looking at your script, we can't help you. Also, please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.


Thanks, I just formatted it!

6.3 years ago
jomo018 ▴ 720

The lines you read include the end-of-line (EOL) character from the input file. The print command adds its own end-of-line. So you end up with two EOLs, hence one blank line after each line. You can fix this by calling strip() on each line you read. For example, line.strip() will discard the EOL from line.
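To make the doubled end-of-line visible, here is a minimal sketch that captures what print() emits. The two-line input and the helper name `printed` are made up for illustration; an in-memory `io.StringIO` stands in for the real file.

```python
import io
from contextlib import redirect_stdout

# Hypothetical two-line input standing in for the FASTA file.
fake_file = io.StringIO(">YP_003612801.1\nMTDYLLLFVG\n")


def printed(lines, strip):
    """Return exactly what print() emits for each line."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        for line in lines:
            # Each line read from a file still ends with "\n";
            # print() then appends a second "\n" of its own.
            print(line.strip() if strip else line)
    return buffer.getvalue()


raw = printed(fake_file, strip=False)   # blank line after every line
fake_file.seek(0)                       # rewind before the second pass
clean = printed(fake_file, strip=True)  # one newline per line
```

Inspecting `repr(raw)` and `repr(clean)` shows the difference: the unstripped version contains `\n\n` after each line.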


rstrip() is a better choice than strip() here, because it avoids unwanted trimming at the start of the line.
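A small comparison of the three variants, using a made-up line with leading and trailing spaces:

```python
# Hypothetical line with an indent, trailing spaces and a newline.
line = "  indented sequence line  \n"

both_sides = line.strip()         # removes leading and trailing whitespace
right_only = line.rstrip()        # removes all trailing whitespace, keeps the indent
newline_only = line.rstrip("\n")  # removes only the trailing newline
```

For the headers in the question, which appear to carry trailing spaces after the accession number, `rstrip()` with no argument also removes those spaces, so the taxonomy would attach directly to the accession.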

6.3 years ago
chen ★ 2.5k

This is my guess:
1. You use readline() to get lines from the original file.
2. When you use write() to write lines to the new file, you append a \n to the end of each line.

I can take a look at your code if you post it.

with open("sequence.fasta") as file:
    with open("taxonomy") as name:
        for line in taxonomy.readlines():
            for i in file.readlines():
                if i.startswith(">"):
                    print(i+"_"+line)
                else:
                    print(i)

Use print with end=""

print(i+"_"+line, end="")

In addition, you can combine your with statements:

with open("sequence.fasta") as file, open("taxonomy") as name:

and your for loops:

for line, i in zip(name, file):

Your code has taxonomy.readlines(), but I assume that should be name.readlines(). There is also no reason to call .readlines() since you are simply iterating over the file. You don't need to load it entirely in memory.
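One caveat worth keeping in mind: zip() pairs the k-th line of one file with the k-th line of the other, so it only lines up if every FASTA line has its own taxonomy line. Likewise, in the nested loops the inner file is exhausted after the first pass. A possible alternative, sketched under the assumption that the taxonomy file holds one label per sequence in the same order as the FASTA headers, is to take the next label only when a header line is seen. The function name `annotate_headers` is made up for this example.

```python
def annotate_headers(fasta_lines, taxonomy_lines):
    """Yield FASTA lines with one taxonomy label appended to each header.

    Assumes one taxonomy line per sequence, in the same order as the headers.
    """
    labels = iter(taxonomy_lines)
    for raw in fasta_lines:
        line = raw.rstrip()  # drop the newline and any trailing spaces
        if line.startswith(">"):
            # Consume the next label only when a header is seen.
            label = next(labels, None)
            if label is None:
                raise ValueError("fewer taxonomy lines than FASTA headers")
            yield f"{line}_{label.strip()}"
        elif line:
            # Sequence lines pass through; blank lines are skipped.
            yield line
```

With the file names from the question, it could be used as `with open("sequence.fasta") as fasta, open("taxonomy") as taxonomy:` followed by `for out in annotate_headers(fasta, taxonomy): print(out)`. Because every line is stripped first, the default print() adds exactly one newline per line.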


Then what should I use instead of readlines()? How do I read lines one by one?


Just, like I wrote, iterate over the opened file.

for line in file:

OK, thank you, but I still get output like this:

>YP_003612801.1  
_Firmicutes_Clostridia_Clostridiales

I have already removed readlines(). What do you think could cause this?


You probably need to strip the newline character off:

print(i.strip('\n')+"_"+line, end="")

print() automatically adds a line break at the end of its output.

