Hello. I want to know what percentage of the genome ( assembly hg19) is covered by the intervals in a bed file. is there a way you know? I saw some functions in bedtools but they dont do it directly. Thanks!
Hello. I want to know what percentage of the genome ( assembly hg19) is covered by the intervals in a bed file. is there a way you know? I saw some functions in bedtools but they dont do it directly. Thanks!
First, generate intervals for hg19
(perhaps stripping out non-nuclear and mitochondrial chromosomes):
$ fetchChromSizes hg19 | awk -v FS="\t" -v OFS="\t" '{ print $1, "0", $2; }' | grep -v "[_*_|MT]" | sort-bed - > hg19.nuc.bed
To calculate coverage:
$ bedmap --skip-unmapped --delim '\t' --echo --bases-uniq --echo-ref-size --bases-uniq-f hg19.nuc.bed <(sort-bed peaks.bed) > coverage.bed
The last three columns of coverage.bed
are per-chromosome calculations:
The order of the specified bedmap
operands --bases-uniq
, --echo-ref-size
, and --bases-uniq-f
write column values in that same order.
If you want a whole-genome percentage, you could sum the number of bases, sum the chromosome lengths, and divide the two sums at the end:
$ bedmap --skip-unmapped --delim '\t' --echo --bases-uniq --echo-ref-size hg19.nuc.bed <(sort-bed peaks.bed) | awk -v FS="\t" -v OFS="\t" '{ b += $(NF-1); l += $NF; } END { print b/l; }'
Using a toolkit like bedops lets you do this calculation more easily with more complex inputs. For example, you might want GC content over all of the mappable genome, i.e. everything except for blacklisted regions. This is a bit more difficult without a set-operations based approach.
A solution with awk
and the bc
(basic calculator):
#/ Get chromsizes hg19:
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes
#/ Dummy BED file:
$ cat test.bed
chr1 100 100000
chr2 20 2000000
#/ Calculate percent of genome represented by test.bed:
bc <<< "scale=10; 100 * $(awk '{sum=$3-$2}END{print sum}' test.bed) / $(awk '{sum+=$2}END{print sum}' hg19.chrom.sizes)"
Answer here would be .0637512652%
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Thank you , I will try it this way too!