Pysam cannot find index
0
1
Entering edit mode
8.7 years ago
eric.kern13 ▴ 240

Hi Biostars,

This is my first question on the site, so thanks for all the help I've gotten from lurking, and thanks for the help I'm about to get. I'll try to share my research and follow all the etiquette. Can anyone help me with the following situation?

I'm working with the python packages Pysam and Pysamstats. I am using BAM and FASTA files from the 1000genomes project. I would like to provide a 50 base-pair interval on chromosome 22 in HG18 coordinates and in turn receive the average depth of coverage from that interval. My attempts so far have opened the files first, then fed them into pysamstats.stat_coverage_binned, like this.

fasta_file = pysam.Fastafile(<fasta_ftp_path>)
bamfile = pysam.AlignmentFile(<alignment_ftp_path>)
pysamstats.stat_coverage_binned(bamfile, fasta_file, <chrom>, <start_pos>, <end_pos>).

From the first line, I encounter this:

ValueError: could not locate index file

The same error appears on the third line if I try to feed in the ftp addresses rather than file objects. My understanding is that these packages look for index files by appending ".bai" or ".fai" to the filename I feed them, which should work (you can look at the files available further down my post). This is supported when I RTFS at line 142 of this file, which has my error message. I can't quite read cython, but given "filename.fa" I think it says to look for "filename.fa.fai". Here's TFS: https://github.com/pysam-developers/pysam/blob/master/pysam/cfaidx.pyx

FASTA-associated files: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome

BAM-associated files: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/other_exome_alignments/HG00096/exome_alignment/

A minimal example to generate the error (sorry about the formatting; can't figure out how this will render):

import os
import pysam
import pysamstats
alignment_ftp_path = "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/" \
                     "other_exome_alignments/HG00096/exome_alignment/"
alignment_filename = "HG00096.mapped.illumina.mosaik.GBR.exome.20111114.bam"
alignment_ftp_path = os.path.join(alignment_ftp_path, alignment_filename)
fasta_ftp_path = "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/" \
                 "GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa"

fasta_file = pysam.Fastafile(fasta_ftp_path)
bamfile2 = pysam.AlignmentFile(filename = alignment_ftp_path)
ex_bin_read_depth = pysamstats.stat_coverage_binned(bamfile2, fasta_ftp_path,\
                                                    chrom="22",              \
                                                    start=14481425,          \
                                                    end=14481475)
Pysam samtools • 5.5k views
ADD COMMENT
1
Entering edit mode

I think the problem stems from the fact that you're trying to access FTP links like local files. Try downloading the data first.

If space is an issue, you could try linking the FTP data to a named pipe, though that's a little more tricky.

ADD REPLY
0
Entering edit mode

That worked; thanks. Is there an "accept" button like on SE?

I notice that os.path.isfile(<address>) is False even for correct ftp addresses.

ADD REPLY
0
Entering edit mode

os.path.isfile returns True for files that exist on your current filesystem. It does not work for remote URLs.

ADD REPLY
0
Entering edit mode

There is an accept button on Biostars, but only for actual answers. I was guessing at the problem so I posted a comment instead. :)

ADD REPLY
0
Entering edit mode

I think you found a bug in pysam. I reported it here. You'll probably have to download the FASTA file for now, as Dan D says. The remote BAM file should work, though.

ADD REPLY
0
Entering edit mode

Indeed, the remote BAM works. Thanks.

ADD REPLY

Login before adding your answer.

Traffic: 2699 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6