Read depths from 1000G exome sequencing are almost all zero.
0
0
Entering edit mode
8.7 years ago
eric.kern13 ▴ 240

Hi Biostars community,

I recently built a script around the commands from this question in order to obtain read depth data for about 22,000 small regions (<=50bp each). The BAM file was from whole-exome NGS at the 1000 Genomes Project:

ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/other_exome_alignments/HG00096/exome_alignment/HG00096.mapped.illumina.mosaik.GBR.exome.20111114.bam

To choose my 22,000ish regions, I used 1000 Genomes' list of pulldown targets, and I used liftover to convert them to hg18 coordinates because that's what they align reads to in this paper (see supplement). I looked only at chromosome 22. For any region larger than 50 bp, I split it up.

My question: why are there only 878 regions with nonzero read depth? Is that to be expected?

Thanks for your advice.

next-gen-sequencing • 2.1k views
ADD COMMENT
2
Entering edit mode

The paper you are citing is the "pilot" paper written six years ago. The more recent the phase1 paper. 1000g is using hg19/GRCh37 throughout right now.

ADD REPLY
0
Entering edit mode

Phase 3 paper is coming out. The data is up on the FTP though.

ADD REPLY
0
Entering edit mode

I switched to HG19 coordinates, but the problem persists. It would help to have a fast, well-tested method to obtain read-depth from BAM files. Currently, it takes me hours to process a single chromosome, and both of the existing APIs that I tried (Pysam and Pybedtools) either crashed my Python kernel or crashed my entire computer.

ADD REPLY
0
Entering edit mode

No need to use pysam/pybedtools. Just: samtools depth -b interval.bed -r 22 input.bam.

ADD REPLY
0
Entering edit mode

In fact, switching to HG19 coordinates has resolved the problem. And thank you for the advice on samtools depth. I will try that.

ADD REPLY

Login before adding your answer.

Traffic: 2383 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6