Biopython script to change file formats and headers
8.3 years ago
skbrimer ▴ 740

Hi group,

I'm trying to write a Python script that converts files from FASTQ to FASTA, which I have:

from Bio import SeqIO
import sys 

# grabbing the file and the name 
seq_file = sys.argv[1]
labels = seq_file.split(".")

# converting the file from fastq to fasta
SeqIO.convert(seq_file,"fastq",labels[0]+".fasta","fasta")

That works with no problem, but now I would like to change the header of the FASTA file in the same script, and I'm stuck. When I add the SeqIO.parse function like this:

for seq_record in SeqIO.parse(labels[0]+".fasta","fasta"):
    seq_record.id = labels[0] # renaming the pseudogene with the lab id
    SeqIO.write(seq_record,labels[0]+".fasta","fasta")

I get an error saying seq_record is not defined, which I thought I had done, and the script fails. I expected the script to convert the file first, creating the new FASTA file (which it does when the parse loop is not there), and then parse that file.

So now I'm wondering whether the file is actually being produced, since the conversion is no longer the last step of the script. Do I need to make a temp file in order to do both actions in one script?

EDIT

Well, it works now, so if anyone would like to do something similar, here is my solution:

# this script is used to convert fastq files to fasta files
# then to rename the fasta ID with the sample ID from the lab

from Bio import SeqIO
import sys 

# grabbing the file and the name 
seq_file = sys.argv[1]
labels = seq_file.split(".")

# converting the file from fastq to fasta
SeqIO.convert(seq_file,"fastq",labels[0]+".fasta","fasta")

# taking the converted file and then changing the fasta header
for seq_record in SeqIO.parse(labels[0]+".fasta","fasta"):
    seq_record.id = labels[0] # renaming the pseudogene with the lab id
    SeqIO.write(seq_record, labels[0]+".fasta","fasta")
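One fragile spot in scripts like this is `seq_file.split(".")`: it keeps everything before the first dot, so an input such as `./sample.fastq` gives an empty label. A minimal sketch of an alternative using the standard library's `pathlib`; the helper name `label_and_output` is hypothetical and not part of the original script. Note that it keeps inner dots, so `sample.v2.fastq` becomes `sample.v2`, which differs from what `split(".")[0]` returns.

```python
from pathlib import Path


def label_and_output(seq_file):
    """Return (label, fasta_path) for an input FASTQ path.

    Hypothetical helper: the label is the file name without its final
    extension, and the FASTA file is placed next to the input file.
    """
    path = Path(seq_file)
    label = path.stem                        # file name minus the last suffix
    fasta_path = path.with_suffix(".fasta")  # same folder, new extension
    return label, str(fasta_path)


label, fasta_path = label_and_output("run1/sample.fastq")
print(label, fasta_path)
```

The returned label could then be used wherever the script uses `labels[0]`.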
biopython processing • 4.1k views
8.3 years ago
skbrimer ▴ 740

Here is the final script :D

# this script is used to convert fastq files to fasta files 
# then to rename the fasta ID with the sample ID from the lab

from Bio import SeqIO
import sys 

# grabbing the file and the name 
seq_file = sys.argv[1]
labels = seq_file.split(".")

# converting the file from fastq to fasta
SeqIO.convert(seq_file,"fastq",labels[0]+".fasta","fasta")

# taking the converted file and then changing the fasta header
fasta_file = labels[0] + ".fasta"
records = []

with open(fasta_file, "r") as handle:  # "rU" mode was removed in Python 3.11
    for seq_record in SeqIO.parse(handle, "fasta"):
        old_header = seq_record.id
        new_header = labels[0]
        # renaming the pseudogene with the lab id and the reference used
        seq_record.id = new_header + "_" + old_header
        seq_record.description = ""  # this strips the old header out
        records.append(seq_record)

# write once, after reading has finished, so no records are overwritten
SeqIO.write(records, fasta_file, "fasta")
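The conversion and the renaming can probably also be done in a single pass, without re-reading the converted file. This is only a sketch under a few assumptions: the function names `renamed` and `fastq_to_renamed_fasta` are made up for illustration, the label is taken from the file name with `pathlib` rather than `split(".")`, and the new header is `<label>_<old id>` as in the script in this post.

```python
from pathlib import Path

from Bio import SeqIO


def renamed(records, label):
    """Yield each record with its header reduced to '<label>_<old id>'."""
    for record in records:
        record.id = f"{label}_{record.id}"
        record.description = ""  # otherwise the old title text is written too
        yield record


def fastq_to_renamed_fasta(fastq_path):
    """Write a renamed FASTA file next to fastq_path; return the record count."""
    path = Path(fastq_path)
    out_path = path.with_suffix(".fasta")
    # SeqIO.write accepts an iterator of records, so the reads are streamed
    # one at a time and the output file is opened only once.
    records = renamed(SeqIO.parse(str(path), "fastq"), path.stem)
    return SeqIO.write(records, str(out_path), "fasta")
```

Calling `fastq_to_renamed_fasta("sample1.fastq")` would write `sample1.fasta` with headers such as `>sample1_read1`. Because the input is never the same file as the output, there is no risk of reading a file while it is being rewritten.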
8.3 years ago

Hello,

Your second code block looks a little odd to me. Normally you need to define a file handle for the SeqIO parser, as shown below:

handle = open(labels[0] + ".fasta", "r")  # "rU" mode was removed in Python 3.11
for seq_record in SeqIO.parse(handle, "fasta"):
    seq_record.id = labels[0] # renaming the pseudogene with the lab id
    SeqIO.write(seq_record,labels[0]+".fasta","fasta")

handle.close()

I am not completely sure whether the SeqIO parser can work without a file handle, but you could try it and see whether my version above fixes your problem.

EDIT

I didn't see your edit, so if it works then just ignore my post :D


No worries! Thank you for your help. You are correct if you are using an older version; the latest version of Biopython, at least for the SeqIO.parse function, doesn't require the handle anymore.

Now I'm just trying to figure out why it's renaming the header and keeping the old header as well.

Thank you again.
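On the header keeping the old text: a likely explanation is that Biopython's FASTA writer builds the title line from both `record.id` and `record.description`, and for a parsed record the description holds the whole original title line. Changing only the id therefore leaves the old title behind. A small sketch with a hand-made record (the sequence, id, and description here are invented for illustration):

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# A record shaped like one returned by a parser: the description repeats the id.
record = SeqRecord(Seq("ACGT"), id="read1", description="read1 length=4")

record.id = "sampleA"          # rename only the id
print(record.format("fasta"))  # header still carries the old title text

record.description = ""        # clear the old title as well
print(record.format("fasta"))  # header is now just the new id
```

The first print shows the new id followed by the old title, and the second shows the new id alone.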


I also added your suggestion; it makes for better file control. I'm really bad about remembering to use the open and close file commands since my scripts are small. I just need to be more vigilant, and the handle helps with that.

I also found a post that said that, to completely remove the old header, I need to edit the old description. I will paste my final code below.

Thank you again.
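On forgetting to close files: a `with` block closes the handle automatically, and `SeqIO.parse` should accept either a filename or an open handle in current Biopython releases. A short sketch comparing the two; the temporary file and its contents are made up so the example is self-contained:

```python
import os
import tempfile

from Bio import SeqIO

# Write a tiny FASTA file so the example does not depend on real data.
tmp_dir = tempfile.mkdtemp()
fasta_path = os.path.join(tmp_dir, "example.fasta")
with open(fasta_path, "w") as out:
    out.write(">seq1\nACGT\n>seq2\nGGCC\n")

# Option 1: pass the filename and let Biopython open the file.
ids_from_name = [rec.id for rec in SeqIO.parse(fasta_path, "fasta")]

# Option 2: pass a handle; the with block closes it even if an error occurs.
with open(fasta_path) as handle:
    ids_from_handle = [rec.id for rec in SeqIO.parse(handle, "fasta")]

print(ids_from_name, ids_from_handle)
```

Both options read the same records, so the choice is mostly about how explicit you want the file handling to be.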
