Converting CpG methylation calls to average genome coverage
15 months ago
biostart • 290
Germany

Hello,

So I processed bisulphite sequencing data and have a BED file with the coordinates of all CpGs and their methylation levels. Now I want to convert it to average coverage (say, using a 100 bp sliding window). Is there a simple tool that does this? (For example, I see on the HOMER website that it might support BS-Seq/MethylC-Seq analysis, but this feature is not documented, so I am looking for something like this.)

Thanks!
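If the goal is the average methylation level per window (not read coverage), a minimal Python sketch could look like the one below. It assumes the BED file has chrom, start, end and a methylation fraction (0 to 1) in the fourth column; that layout and the names `parse_cpg_bed` and `window_means` are hypothetical. On the command line, `bedtools makewindows` followed by `bedtools map -o mean` is a common route to the same kind of per-window mean.

```python
from bisect import bisect_left
from collections import defaultdict


def parse_cpg_bed(lines):
    """Parse BED-like lines into {chrom: sorted [(start, fraction), ...]}.

    Assumed (hypothetical) layout: chrom, start, end, methylation fraction 0-1.
    """
    per_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        fields = line.split()
        per_chrom[fields[0]].append((int(fields[1]), float(fields[3])))
    return {chrom: sorted(cpgs) for chrom, cpgs in per_chrom.items()}


def window_means(cpgs_by_chrom, window=100, step=100):
    """Yield (chrom, win_start, win_end, n_cpgs, mean_fraction) for every
    half-open window [win_start, win_end) that contains at least one CpG.
    step < window gives overlapping (sliding) windows; step == window tiles."""
    for chrom in sorted(cpgs_by_chrom):
        positions = [pos for pos, _ in cpgs_by_chrom[chrom]]
        values = [frac for _, frac in cpgs_by_chrom[chrom]]
        win_start = 0
        while win_start <= positions[-1]:
            win_end = win_start + window
            lo = bisect_left(positions, win_start)
            hi = bisect_left(positions, win_end)
            if hi > lo:  # at least one CpG starts inside this window
                mean = sum(values[lo:hi]) / (hi - lo)
                yield chrom, win_start, win_end, hi - lo, mean
            win_start += step


example = ["chr1 10 12 1.0", "chr1 50 52 0.0", "chr1 150 152 0.5"]
for row in window_means(parse_cpg_bed(example)):
    print(row)
# ('chr1', 0, 100, 2, 0.5)
# ('chr1', 100, 200, 1, 0.5)
```

The CpG count is reported next to each mean on purpose: a window whose mean comes from a single CpG is much less trustworthy than one built from dozens.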

If you only have CpG methylation as a percentage or fraction, then you're not going to be able to reconstruct coverage. Can you give us some idea of what your data look like?
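To make the point concrete: coverage at a CpG is the number of methylated plus unmethylated read calls, and a percentage throws that number away. The sketch assumes a Bismark-style coverage file (chromosome, start, end, methylation percentage, count methylated, count unmethylated); `coverage_from_cov_line` is a hypothetical helper. If the BED file only has the percentage column, the counts cannot be recovered.

```python
def coverage_from_cov_line(line):
    """Return (chrom, start, coverage, percent_methylated) from a Bismark-style
    .cov line: chrom, start, end, % methylated, count methylated, count unmethylated."""
    chrom, start, _end, pct, n_meth, n_unmeth = line.split()[:6]
    coverage = int(n_meth) + int(n_unmeth)  # reads covering this CpG
    return chrom, int(start), coverage, float(pct)


# Same methylation percentage, very different read depth:
print(coverage_from_cov_line("chr1 100 100 50 1 1"))    # ('chr1', 100, 2, 50.0)
print(coverage_from_cov_line("chr1 200 200 50 40 40"))  # ('chr1', 200, 80, 50.0)
```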

I'm not sure windowing and WGBS really go together. A sliding average only makes sense when the signal has an equal chance of appearing anywhere on the continuous scale you're averaging over. For example, let's say I lived in a country where fireworks could only be bought on Christmas Eve (24th) and New Year's Eve (31st). If I computed a sliding-window average of firework sales over the year, with a window wide enough to span both dates, the 27th and 28th would have the highest sales of any day of the year, but you can't actually buy fireworks on those days. My average no longer reflects the underlying data; in fact, it misleads me.
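The fireworks analogy can be checked numerically. This sketch uses made-up sales (100 units on each of the two days) and a centred 9-day window, an illustrative choice wide enough to cover both dates; `centred_moving_average` is a hypothetical helper.

```python
def centred_moving_average(daily_sales, day, half_width=4):
    """Mean sales over the 2*half_width + 1 days centred on `day`;
    days with no entry in daily_sales count as zero sales."""
    window = range(day - half_width, day + half_width + 1)
    return sum(daily_sales.get(d, 0) for d in window) / len(window)


# Fireworks can only be sold on 24 and 31 December (made-up numbers).
sales = {24: 100, 31: 100}
smoothed = {day: centred_moving_average(sales, day) for day in range(15, 32)}
peak = max(smoothed.values())
peak_days = [day for day, value in smoothed.items() if value == peak]
print(peak_days)  # [27, 28]: days on which nothing can be bought
```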

In short, I think you will need to pin down exactly how you want to do your averaging to smooth out the signal, and it will probably not be simple at all. As Matt says, you'll probably need more data, and Alex gives a really nice breakdown of how you can use that extra data here: C: Digitize methylation data throughout hg19

The fireworks example is kind of OK, but if I am comparing two experimental conditions and just want to see the big picture, the sliding-window approach might still be appropriate. The underlying sequence is the same in both cell conditions, so differences in methylation will still be reflected after sliding-window averaging.
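A sketch of the two-condition comparison described here, assuming per-CpG positions and methylation fractions for both conditions are already loaded from the same genome build. Bins are non-overlapping (step equal to bin size) for simplicity, the values are made up, and `bin_means` and `bin_differences` are hypothetical names. Keeping the per-condition CpG counts next to the difference shows which bin differences rest on only one or two CpGs.

```python
from collections import defaultdict


def bin_means(cpgs, bin_size=100):
    """cpgs: iterable of (position, methylation fraction).
    Returns {bin_start: (n_cpgs, mean_fraction)} for non-overlapping bins."""
    sums, counts = defaultdict(float), defaultdict(int)
    for pos, frac in cpgs:
        start = (pos // bin_size) * bin_size
        sums[start] += frac
        counts[start] += 1
    return {start: (counts[start], sums[start] / counts[start]) for start in sums}


def bin_differences(cond_a, cond_b, bin_size=100):
    """{bin_start: (n_a, n_b, mean_b - mean_a)} for bins present in both conditions."""
    a, b = bin_means(cond_a, bin_size), bin_means(cond_b, bin_size)
    return {
        start: (a[start][0], b[start][0], b[start][1] - a[start][1])
        for start in sorted(a.keys() & b.keys())
    }


control = [(10, 1.0), (60, 1.0), (150, 0.5)]
treated = [(10, 0.0), (60, 0.5), (150, 0.5)]
print(bin_differences(control, treated))
# {0: (2, 2, -0.75), 100: (1, 1, 0.0)}
```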

My argument against averaging WGBS is that CpGs are not randomly distributed, which leads to two issues:

  1. Bins with very few CpGs in them will tend to have extreme hyper- or hypomethylated averages, because that is the nature of a single CpG (most are either fully methylated or fully unmethylated), whereas bins with a large number of CpGs will tend to average out to whatever the overall average is, perhaps 50% methylation for the bin, which is very uncommon at the individual CpG level.

  2. Your data become very difficult to interpret, since a 1000 bp bin containing a single 70%-methylated CpG will be indistinguishable from a 1000 bp bin containing 70 fully methylated and 30 unmethylated CpGs.
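Both points can be checked with a toy example (made-up values). The two bins below have the same mean methylation, 0.7; only the number of CpGs and their spread tell them apart, and in the sparse bin one site sets the value for the whole bin. Reporting the CpG count (and perhaps the spread) next to each bin mean is one way to keep that information. This is an illustrative choice, not a standard.

```python
from statistics import mean, pstdev

sparse_bin = [0.7]                   # one CpG, 70% methylated
dense_bin = [1.0] * 70 + [0.0] * 30  # 70 fully methylated + 30 unmethylated CpGs

for name, values in [("sparse", sparse_bin), ("dense", dense_bin)]:
    # name, number of CpGs, bin mean, population standard deviation
    print(name, len(values), round(mean(values), 3), round(pstdev(values), 3))
# sparse 1 0.7 0.0
# dense 100 0.7 0.458
```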

This is not an issue for other 'omic' data, where every base in the genome has an equal chance of generating a signal in whatever the assay measures; that is what makes WGBS data unique.

I see your argument for averaging WGBS: any bias introduced by averaging should be the same for all datasets. That is a fair assumption, but it's not 100% guaranteed. Perhaps the cell is not changing the amount of methylation in a CpG bin, but rather the ability of CpGs in that bin to become methylated or demethylated, so that two samples from the same cell type end up with different potential methylation sites. However, even assuming that the potential CpG sites are exactly the same between samples, you still leave yourself open to missing big differences because they're hidden behind the average. You are also liable to overstate differences in bins where the underlying data are sparse. In short, binning is going to reduce your statistical power.
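A toy illustration of both failure modes, with made-up values. `compare_bin` is a hypothetical helper, and the mean absolute per-CpG change is just one possible comparison, chosen for illustration. In the first bin every CpG flips between the samples, yet the bin means are identical; in the second, a single CpG produces the largest possible bin-level difference.

```python
def mean(values):
    return sum(values) / len(values)


def compare_bin(sample_a, sample_b):
    """Compare one bin between two samples, assuming the same CpGs in the same order.
    Returns (difference of bin means, mean absolute per-CpG change)."""
    bin_diff = mean(sample_b) - mean(sample_a)
    per_cpg_change = mean([abs(b - a) for a, b in zip(sample_a, sample_b)])
    return bin_diff, per_cpg_change


print(compare_bin([1, 1, 0, 0], [0, 0, 1, 1]))  # (0.0, 1.0): change hidden by the average
print(compare_bin([0.0], [1.0]))                # (1.0, 1.0): resting on a single CpG
```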

The only way binning helps is by reducing the computational workload, since a genome binned at 1000 bp has 1000x fewer positions than an unbinned, per-base genome. And who doesn't like a linear speed-up? But be under no illusion: this is a lossy transformation that will not improve the analysis at all. Binning does not give you better data.
