[samtools depth] Calculate percent of the genome covered
1
1
Entering edit mode
9.9 years ago
madkitty ▴ 690

I received a BAM of one chromosome and need to determine the percent of the genome covered. Our genome size is 16,600 bp and we did paired-end sequencing each reads being 150 bp long. To my knowledge it should be calculated as follow:

  • Genome size : 16,670 bp
  • No of base covered: 16,639 bp
  • Percent of the genome covered = [(No. of base covered) / (Genome size) ] 100 = (16,639/16,670) 100 = 99.81%

I did the following with samtools depth:

samtools depth -q0 -Q0 my.bam > qry.depth
wc -l qry.depth
16639 qry.depth

Though I don't fully understand the output of samtools depth. To my knowledge the output should be as follow:

  • Column 1: ID
  • Column 2: Genomic position
  • Column 3: No of reads covering that position
  • Number of line in qry.depth represents the number of bases covered in the reference

Here is a sample of my qry.depth file:

chrM 1 3
chrM 2 6
chrM 3 16
chrM 4 32
chrM 5 34
chrM 7783 0
chrM 7915 0

The line count of qry.depth is 16,639 but I still have 2 positions with no reads covering them, then what is the percent of the genome covered? Maybe I misunderstood something in the output file format.

samtools bam genome coverage • 14k views
ADD COMMENT
0
Entering edit mode
9.9 years ago

It'll output a 0 if it would have otherwise produced a different number, had you not specified -q. Yes, it would be nice if this were documented.

ADD COMMENT
0
Entering edit mode

Hi Devon,

Sorry. I cannot understand completely. I also encounter the question:

The reference length: 134962

Line count of *.depth is : 134184

No. of positions have only 0: 26, as below:

Triodia_melvillei    67472    0    0    0
Triodia_melvillei    67473    0    0    0
Triodia_melvillei    67474    0    0    0
Triodia_melvillei    67475    0    0    0
Triodia_melvillei    67476    0    0    0
Triodia_melvillei    110232    0    0    0
Triodia_melvillei    110233    0    0    0
Triodia_melvillei    110234    0    0    0
Triodia_melvillei    110235    0    0    0
Triodia_melvillei    110236    0    0    0
Triodia_melvillei    110237    0    0    0
Triodia_melvillei    110238    0    0    0
Triodia_melvillei    110239    0    0    0

My question is: why there are 0? If there are reads to support these positions, why samtools output them?

Thanks!

ADD REPLY
0
Entering edit mode

Normally -a -a will output absolutely every position, or at least it should. If there are reads supporting coverage of a positions then they should get counted unless they're above the maximus depth (-d) or being filtered out (play around with -q and -Q).

ADD REPLY

Login before adding your answer.

Traffic: 2640 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6