Fast way to extract hg19 sequences with Biopython?
7.8 years ago
nchuang ▴ 260

I am trying to extract a list of 5 kb sequences from the hg19 genome. However, it takes a very long time for Bio.SeqIO.parse() to read the whole genome into memory, and Bio.SeqIO.index() also takes a long time.

Is there a fast way to do this, or is this a limitation of Python?

I'm waiting for my admin to install pyfaidx for me and I will see how that one does too.

python biopython • 2.5k views

Wow, that looks so simple. I will try calling it through subprocess. Thanks!

Try pip install --user pyfaidx. Then you should have the faidx script in $HOME/.local/bin.
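
If installing pyfaidx is not possible yet, the idea behind a faidx-style index can be sketched with the standard library alone. For each sequence the index records its length, the byte offset of its first base, and the bases and bytes per line, so a region can be read with one seek instead of parsing the whole genome. This is only a minimal sketch, not pyfaidx's API: the function names build_index and fetch are hypothetical, coordinates are assumed to be 0-based and half-open, and every line within a record (except the last) is assumed to have the same length. The small temporary file stands in for hg19.fa.

```python
import os
import tempfile


def build_index(fasta_path):
    """Scan the FASTA once and record, per sequence name:
    (length, byte offset of first base, bases per line, bytes per line).
    Hypothetical helper; assumes uniform line lengths within a record."""
    index = {}
    name = None
    with open(fasta_path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                name = line[1:].split()[0].decode()
                # The first base starts right after the header line.
                index[name] = [0, handle.tell(), 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:
                    entry[2] = bases      # bases per line
                    entry[3] = len(line)  # bytes per line, newline included
                entry[0] += bases
    return {key: tuple(value) for key, value in index.items()}


def fetch(fasta_path, index, name, start, end):
    """Return bases [start, end) of sequence `name` (0-based, half-open)."""
    length, offset, line_bases, line_bytes = index[name]
    start = max(0, start)
    end = min(end, length)
    if start >= end:
        return ""
    # Translate base coordinates into byte positions, skipping newlines.
    first = offset + (start // line_bases) * line_bytes + start % line_bases
    last = offset + (end // line_bases) * line_bytes + end % line_bases
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        chunk = handle.read(last - first)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


# Tiny demonstration file standing in for hg19.fa (made-up content).
with tempfile.NamedTemporaryFile("wb", suffix=".fa", delete=False) as tmp:
    tmp.write(b">chr1 demo\nACGTACGTAC\nGTTTGACCAA\nGG\n>chr2\nTTTTCCCC\n")
    demo_path = tmp.name

demo_index = build_index(demo_path)
print(fetch(demo_path, demo_index, "chr1", 8, 13))  # prints ACGTT
os.remove(demo_path)
```

The index only has to be built once per genome; after that, each 5 kb lookup costs one seek and one short read, which is why this approach avoids loading whole chromosomes.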

I don't have root, and pip is not even installed for the default Python version, 2.4. They did install 3.4 for me, but I had to add it to my PATH. I don't think I have pip for that install either.
