Beta-values, M-values and thresholds on effect size
2.0 years ago
gbayon • 160
Spain

Hi everybody,

I am currently working on several projects using Illumina 450k DNA methylation microarrays. To detect Differentially Methylated Probes (DMPs), I usually employ the empirical Bayes method in the _limma_ package. Following the paper by Pan Du et al., I stick with M-values to fit the statistical model, and use beta-values only in graphs and reports that are going to be read by fellow biologists.

To retain only results with some biological relevance, we usually apply a threshold on effect size, keeping only the probes that are both significant and have a large effect size. Nothing new or fancy up to this point. However, it is not uncommon to argue with colleagues at the lab about the benefits and drawbacks of M-values versus beta-values when trying to set a coherent effect size threshold.

The mathematical properties of M-values let us set a fixed threshold on effect size, while this is not that easy for betas. However, setting a threshold on M-value differences (say 1.4, as stated in the paper above) usually results in a set of probes where many of them show very small differences in beta-values (say, for example, 0.01), especially those near the minimum and maximum. This is very counterintuitive for a biologist, who is going to argue against it on the grounds that such a small difference means nothing from a biological point of view.
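
To make the effect above concrete, here is a minimal sketch in Python. It assumes the usual logit relationship between the two scales, M = log2(beta / (1 - beta)), and ignores any intensity offset used when M-values are computed from raw signals. The function names are made up for illustration and do not come from _limma_ or any other package.

```python
import math

def beta_to_m(beta):
    """Convert a beta-value (0 < beta < 1) to an M-value: log2(beta / (1 - beta))."""
    return math.log2(beta / (1.0 - beta))

def m_to_beta(m):
    """Inverse transform: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)

def beta_shift_for_m_shift(beta, delta_m):
    """Beta difference obtained by moving delta_m M-units up from a starting beta."""
    return m_to_beta(beta_to_m(beta) + delta_m) - beta

# The same M-value difference (1.4) applied at three starting methylation levels.
for start in (0.02, 0.50, 0.98):
    shift = beta_shift_for_m_shift(start, 1.4)
    print(f"start beta = {start:.2f} -> beta difference = {shift:.3f}")
```

Under this assumption, the same M difference of 1.4 corresponds to a beta difference of a bit more than 0.2 around beta = 0.5, but only about 0.01 when starting from beta = 0.98, which is the pattern described above.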

My mental picture of what is happening there is related to the technical bias of the 450k. That is, I try to convince people that a small difference in that region is as credible as a bigger difference in the middle region (around 0.5 in beta), due to the array design, but I do not know if I am right, or if I am even seeing correctly what is going on.

What do you usually do in your pipelines? Use M-values for the fit, and beta-value differences as thresholds for effect size? A first threshold on M-values and a second filtering step based on betas? Everything with betas? Nothing at all?

Any hint will be much appreciated.

Hi, what is the difference between the 450k microarray and next-generation sequencing?

Hi, what is the difference between the 450k microarray and next-generation sequencing?

Please take a few minutes to review this post: How To Ask Good Questions On Technical And Scientific Forums. It will help you formulate a proper question with sufficient detail.

Those are two completely different technologies. You should be able to search the web with "microarray" and "next generation sequencing" to find enough information.

Hi all,

Can we use M-values with next-generation sequencing data? Any help, please.

Thanks

Yes, my answer from 4 years ago applies equally to WGBS and similar NGS experiments.

Thank you Devon Ryan

Do not add answers unless you're answering the top level question. This should be a reply to Devon's comment. Could you make the appropriate change please? That would involve the following steps:

  1. Copy the contents of your reply from this answer (you can edit this answer - Ctrl/Cmd + click the link to open it in a new tab and do a Select All -> Copy there).
  2. Click on "Add Reply" on Devon's comment here: C: Beta-values, M-values and thresholds on effect size
  3. Paste the copied text
  4. Click on the green "Add Comment" button
  5. Click on moderate back in your answer here: A: Beta-values, M-values and thresholds on effect size
  6. Choose Delete Post
  7. Click on the blue Submit button.

Thank you!

11 months ago
Freiburg, Germany

Using M-values for statistics is by far the best way to go.

Regarding the utility of beta values, it's good to filter by those as well. Why? Because a 1% change in methylation is unlikely to be biologically meaningful. I've seen a number of papers publish such changes, but it's always interesting to note that they never bother to show functional relevance (probably because there is none). If you can show the biological relevance of such small changes then go ahead and follow up on them. Personally, I want to prioritize the results on the likelihood that it's causing some phenotype and using methylation changes for that (and only that) makes sense.

Edit: There's a corollary in other types of data. For example, we can often detect small changes in highly expressed genes when we do RNA-seq. Some of these are meaningful, but the really small changes aren't. I wouldn't toss these results, but I also wouldn't recommend anyone pursue them first when doing a follow-up.

I agree with you. I think the architecture already has enough technical noise _per se_ that it is hard to believe changes with minimal beta differences have any relevance at all. I was planning to combine two decisions: A) whether a change is significant, using the adjusted p-values from testing on M-values, and B) whether the change is meaningful, using the differences in beta values. Do you think an additional threshold on the M-value difference would add anything at all? I ask because I have seen that a 0.2 beta difference implies a minimum M difference of approximately 1.17, attained around the 0.5 beta value. I think that could be coherent, but I just wanted to mention it.
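
The two-step decision described above, and the 1.17 figure, can be sketched as follows. This is only an illustration in Python: the probe records, field names (`adj_p`, `delta_beta`) and default cutoffs are hypothetical, and the logit relationship M = log2(beta / (1 - beta)) is assumed.

```python
import math

def beta_to_m(beta):
    """Convert a beta-value (0 < beta < 1) to an M-value: log2(beta / (1 - beta))."""
    return math.log2(beta / (1.0 - beta))

def min_m_difference(delta_beta):
    """Smallest M difference compatible with a given beta difference.

    The logit curve is flattest at beta = 0.5, so the minimum is reached
    when the two betas sit symmetrically around 0.5.
    """
    low = 0.5 - delta_beta / 2.0
    high = 0.5 + delta_beta / 2.0
    return beta_to_m(high) - beta_to_m(low)

def select_probes(probes, alpha=0.05, min_delta_beta=0.2):
    """Keep probe ids passing both criteria:
    (A) adjusted p-value (from the test on M-values) below alpha, and
    (B) absolute difference in mean beta of at least min_delta_beta.
    """
    return [
        p["id"]
        for p in probes
        if p["adj_p"] < alpha and abs(p["delta_beta"]) >= min_delta_beta
    ]

# Hypothetical results table, for illustration only.
example_probes = [
    {"id": "probe_a", "adj_p": 0.001, "delta_beta": 0.01},   # significant, tiny effect
    {"id": "probe_b", "adj_p": 0.010, "delta_beta": -0.35},  # significant, large effect
    {"id": "probe_c", "adj_p": 0.400, "delta_beta": 0.30},   # large effect, not significant
]

print(round(min_m_difference(0.2), 2))
print(select_probes(example_probes))
```

With these made-up records only `probe_b` survives both filters, and `min_m_difference(0.2)` comes out at about 1.17, in line with the value quoted above.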

Your approach seems pretty reasonable to me :)

Regarding using the M-value differences as a threshold, my guess is that this ends up being related to filtering by p-value (except that the p-value incorporates the reliability of the M-value difference as well). I doubt that'll hurt anything, but I wouldn't expect to gain too much. Having said that, it's been forever since I've had a dataset like this, so if you have one that contradicts my guess then please ignore this :)

Thank you Devon Ryan.

12 months ago
WCIP | Glasgow | UK

I have been using the minfi package to process 450k arrays, including the differential methylation part. Following the package vignettes, I used the M-values to detect differential probes. (With two conditions, minfi applies an F-test to detect differential probes, so it is not too different from the limma package.) For reporting differential methylation, I averaged the beta values within each of the two conditions and reported the difference of the averages. That's how I've done it... I'm interested in comments and opinions myself...
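
The reporting step described above (difference of per-condition average betas) amounts to something like the following minimal Python sketch. The function name and the example values are hypothetical and are not part of minfi.

```python
from statistics import mean

def delta_mean_beta(betas_condition_1, betas_condition_2):
    """Difference of average beta-values for one probe: condition 2 minus condition 1."""
    return mean(betas_condition_2) - mean(betas_condition_1)

# Made-up beta-values for a single probe, three samples per condition.
controls = [0.10, 0.20, 0.30]
cases = [0.50, 0.60, 0.70]

print(round(delta_mean_beta(controls, cases), 3))
```

The sign convention (condition 2 minus condition 1) is an arbitrary choice here; what matters is that it matches the contrast used in the statistical test.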

I am also a big fan of the minfi package. I think it is great. And yes, I have done the same processing as you did. I think it is quite common. I opened this post to see if I could get some insights or advice, because sometimes it is easy to screw things up, although I know this is a fairly simple question.
