Whole-genome percentile values in bigWig files
1
2
Entering edit mode
4.4 years ago
rachel ▴ 30

Hi all,

I'm looking to compute the k-th percentile value of some bigWig files (ChIP-seq) across the whole genome. For example, I'd like to know the 99.99th percentile value of a bigWig file across the whole genome.

I'm aware of wiggletools histogram (unfortunately does not provide exact percentile values, which I need) and the BEDOPS answer suggested in this question. However, the BEDOPS answer only computes percentile values on a per-chromosome basis, whereas I need whole-genome percentile values. I have tried to modify it to compute percentile values on a whole-genome basis to no avail.

If anyone has any ideas, I would appreciate them!

Thanks,

Rachel

ChIP-Seq bigwig wiggle statistics • 1.1k views
ADD COMMENT
2
Entering edit mode
4.4 years ago

Unless I'm missing something about the question, here are a couple possibilities:

Perhaps use a pseudochromosome name for all records:

$ bigWigToBedGraph signal.bw signal.bg
$ awk -vFS="\t" -vOFS="\t" '{ $5 = $4; $4 = $1; $1 = "chrPseudo"; print $0; }' signal.bg | sort-bed - > signal.pseudoChr.bed
$ echo -e "chrPseudo\t0\t248956422" > chrPseudo.extents.bed
$ bedmap --echo --kth 0.9999 chrPseudo.extents.bed signal.pseudoChr.bed > answer.bed

I'm assuming that the ChIP-seq input comes from hg38, which is why I create an "extents" file that extends over the longest chromosome in the real hg38 assembly, i.e., chr1. Whatever intervals come out of that awk statement, they will never extend past the longest, real chromosome. If you're not using hg38, you could change 248956422 to match the longest chromosome in your assembly.

That's a pretty convoluted approach.

Another option might be to just strip all the signal from signal.bg into its own text file, and run that file through R.

$ bigWigToBedGraph signal.bw signal.bg
$ cut -f4 signal.bg > signal.txt
$ R
> d <- read.table("signal.txt")
> quantile(d$V1, c(0.9999))
...some answer...

Running ?quantile in R should give more detail about that command.

Even though I'd love for everyone to use BEDOPS, using R or even Python numpy.quantile would probably be faster than rejigging a bigWig into a resorted BED file with a fake chromosome name.

Hope this helps!

ADD COMMENT
1
Entering edit mode

I just wanted to say that your first option worked beautifully! (I already had everything set up to use bedops so it was easy to adapt to). Thank you so much for responding to my question :)

ADD REPLY

Login before adding your answer.

Traffic: 2725 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6