How To Differentiate Files With One Record From Files With Multiple Records?
10.3 years ago

I'm working with biopython, python, and gtk to create a program to load files of bioinformatic interest.

These files contain multiple sequences:

http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.gbk

http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.fasta

but these contain only one (long) sequence:

http://biopython.org/SRC/biopython/Tests/GenBank/NC_005816.gb

http://biopython.org/SRC/biopython/Tests/GenBank/NC_005816.fna

Is there any way to know this before processing the file? How can I tell files with one sequence apart from files with multiple sequences? I want to know exactly when to use Bio.SeqIO.read() and when to use Bio.SeqIO.parse().

Thanks for your time. I searched for answers but didn't find anything similar to this.

fasta python genbank biopython • 2.7k views

"Is there any way to know this before processing the file?"

You'd have to process it somehow to determine whether the file contains one or multiple sequences. Given this, consider using Bio.SeqIO.parse(), since it handles both cases.
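As a rough illustration of "processing it somehow", here is a minimal sketch for FASTA input only. It scans for header lines starting with ">" and stops as soon as a second one appears. The function name `fasta_has_multiple_records` is hypothetical and not part of Biopython, and the sketch assumes well-formed FASTA text.

```python
import io


def fasta_has_multiple_records(handle):
    """Return True as soon as a second '>' header line is seen.

    Hypothetical helper: it only looks at FASTA header lines and
    does not validate the sequences themselves.
    """
    headers = 0
    for line in handle:
        if line.startswith(">"):
            headers += 1
            if headers > 1:
                return True  # stop early, no need to read the rest
    return False


# Small in-memory examples standing in for real files.
single = io.StringIO(">seq1\nACGT\nACGT\n")
multi = io.StringIO(">seq1\nACGT\n>seq2\nGGCC\n")
print(fasta_has_multiple_records(single))
print(fasta_has_multiple_records(multi))
```

For a single long sequence this still reads the whole file, so it is unlikely to save time compared with simply calling Bio.SeqIO.parse().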

10.3 years ago
Peter 6.0k

If you don't know how many records there are, assume at least one, and use Bio.SeqIO.parse() with a for loop. If the file happens to have only one record, your code will just do the for loop once. Easy :)
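A minimal sketch of that loop, assuming Biopython is installed. The helper name `summarize` and the in-memory FASTA strings are made up for illustration; in practice you would pass a filename such as ls_orchid.fasta or NC_005816.fna instead.

```python
from io import StringIO

from Bio import SeqIO


def summarize(handle, fmt):
    """Collect (id, length) for every record, however many there are."""
    summaries = []
    for record in SeqIO.parse(handle, fmt):
        summaries.append((record.id, len(record.seq)))
    return summaries


# The same code handles a one-record file and a multi-record file.
one = StringIO(">only\nACGTACGT\n")
many = StringIO(">first\nACGT\n>second\nGGCCTT\n")
print(summarize(one, "fasta"))   # one tuple
print(summarize(many, "fasta"))  # two tuples
```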


Thanks. I'm testing the loading times for different files and just want to go with the most optimized code.


Well, internally Bio.SeqIO.read() calls Bio.SeqIO.parse() anyway and checks that there was exactly one record.
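To make that concrete, here is a rough sketch of the "exactly one record" check. This is not Biopython's actual source; `read_single` is a hypothetical helper that works on any iterator, including the one returned by Bio.SeqIO.parse(). It raises ValueError for zero or for more than one record, which is how Bio.SeqIO.read() reports those cases.

```python
def read_single(records):
    """Return the only item from an iterator, or raise ValueError.

    Hypothetical stand-in for the check described above.
    """
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("No records found") from None
    try:
        next(iterator)
    except StopIteration:
        return first  # nothing after the first record, as required
    raise ValueError("More than one record found")


# With Biopython this could be used as, for example:
#   record = read_single(SeqIO.parse("NC_005816.gb", "genbank"))
print(read_single(["only record"]))
```

So there is little to gain in speed from choosing one function over the other; the difference is whether you want an error when the file does not contain exactly one record.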
