DMRcate results: How to find out which CpG sites constitute each DMR
6.7 years ago
c.ryder3 ▴ 40

Hello. The output of DMRcate is the genomic coordinates of regions identified as differentially methylated. DMRcate also tells how many CpG sites are in each DMR. The output looks something like this:

                          coord no.cpgs        minfdr Stouffer maxbetafc meanbetafc
39721    chr7:96641456-96657023      86 1.786098e-195        0 0.5638876  0.2507447
11267 chr12:115130855-115136308      71  0.000000e+00        0 0.5665583  0.3240000
29891    chr3:62353312-62365402      62 5.477056e-142        0 0.5326552  0.3088480
30739  chr3:147122315-147131860      62  0.000000e+00        0 0.5841839  0.3162800
6859  chr10:134594987-134602530      60  0.000000e+00        0 0.6188469  0.3357113
41367    chr8:25897201-25909599      57 3.400620e-184        0 0.5738376  0.3226581

The coord column gives the chromosomal coordinates of the differentially methylated region and the no.cpgs column gives the number of CpG sites that constitute this region. I would like to know the ID (e.g. cg08899471) of the CpG sites within these regions. How can I get this information?

Thank you

R Bioconductor DMRcate Methylation • 5.8k views

Can you elaborate on what you mean by identity? You can identify genes in the vicinity or overlap with a CpG island etc.


Hello. I have updated my question with some more details.


I have the same problem right now, did you ever find a solution?

Thanks, Alex

4.7 years ago

Hi all, I know this is a late reply, but I had the same issue, so I wrote some code to help anyone who needs it in the future. It might be a bit crude, but it works well!

First take your final DMRcate output (mine was named "results.ranges", the genomic ranges built from the 'DMRcoutput'), turn it into a data frame, and give each DMR an identifier:

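# Convert the ranges to a data frame (row names are kept as they are)
RR <- as.data.frame(results.ranges)
# Number each DMR in its original order, before sorting
RR$DMRNO <- seq_len(nrow(RR))
# Sort by minfdr (the column name used by the DMRcate version in this thread)
RR <- RR[order(RR$minfdr), , drop = FALSE]
View(RR)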

Your new results.ranges (i.e. RR) should now have a DMR number associated with each DMR

Now you need to pull the CpG info from the dmr output that was used to make the results ranges

cgID <- as.data.frame(dmrcoutput$input)

Look at your RR data frame, choose a DMR that looks good, and take note of its DMRNO. Then run:

DMRNUM <- readline(prompt = "What is your DMR Number: ")

Enter the number into the console and hit enter, then run these lines and it should spit out a table listing the probes as well as other useful info

# Row of RR for the chosen DMR
dmr <- RR[RR$DMRNO == DMRNUM, , drop = FALSE]
# Probes on the same chromosome, within the DMR (inclusive bounds).
# as.character() avoids errors when comparing factors with different levels.
# CHR and pos are the column names used in this answer; check head(cgID) for yours.
probes <- cgID[as.character(cgID$CHR) == as.character(dmr$seqnames) &
               cgID$pos >= dmr$start & cgID$pos <= dmr$end, , drop = FALSE]
assign(paste0("DMR_", DMRNUM), dmr)
assign(paste0("DMR_", DMRNUM, "_probelist"), probes)

Hopefully you should now have a dataframe labelled "DMR_XXX_probelist"!

Kind Regards, Pete
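
For anyone who wants the probes for every DMR in one table rather than one DMR at a time, here is a minimal base-R sketch of the same idea. The two data frames are toy stand-ins with invented values; the column names `seqnames`, `start`, `end`, `DMRNO`, `CHR` and `pos` follow the answer above, while the `ID` column is an assumption, so check `head(cgID)` on your own object first.

```r
# Toy stand-ins for RR and cgID; values are invented for illustration.
RR <- data.frame(
  seqnames = c("chr1", "chr2"),
  start    = c(100, 500),
  end      = c(200, 600),
  DMRNO    = c(1, 2),
  stringsAsFactors = FALSE
)
cgID <- data.frame(
  ID  = c("cg_a", "cg_b", "cg_c", "cg_d"),  # hypothetical probe IDs
  CHR = c("chr1", "chr1", "chr2", "chr2"),
  pos = c(150, 250, 500, 601),
  stringsAsFactors = FALSE
)

# For each DMR, keep the probes on the same chromosome within [start, end].
probe_lists <- lapply(seq_len(nrow(RR)), function(i) {
  in_dmr <- as.character(cgID$CHR) == as.character(RR$seqnames[i]) &
    cgID$pos >= RR$start[i] & cgID$pos <= RR$end[i]
  cbind(DMRNO = rep(RR$DMRNO[i], sum(in_dmr)), cgID[in_dmr, , drop = FALSE])
})

# Stack the per-DMR tables into one data frame.
all_probes <- do.call(rbind, probe_lists)
print(all_probes)
```

Each row of `all_probes` is one probe, tagged with the DMRNO of the region it falls in.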

6.7 years ago
halo22 ▴ 300

I am not sure if there is a tool that can help you retrieve the CpG info from the DMRcate or if there is a way to tweak DMRcate into giving you the info. But this is how I would do it:

Take the coordinates "chr7:96641456-96657023" and see how many CpGs actually lie in this region, perhaps through UCSC. Then overlap this list with the list of CpGs (with beta or M-values) that you used as input to DMRcate.
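
A minimal base-R sketch of that overlap, assuming you have a probe annotation table with columns `ID`, `CHR` and `pos`. Those column names, the helper functions and all values below are made up for illustration; substitute your array annotation or the table you fed to DMRcate. On real data, `findOverlaps()` on GRanges objects (GenomicRanges/IRanges packages) does the same job.

```r
# Toy probe annotation; IDs and positions are invented for illustration.
probes <- data.frame(
  ID  = c("cg_toy_1", "cg_toy_2", "cg_toy_3", "cg_toy_4"),
  CHR = c("chr7", "chr7", "chr7", "chr12"),
  pos = c(96641456, 96650000, 96700000, 115131000),
  stringsAsFactors = FALSE
)

# Split a DMRcate-style "chr:start-end" string into its three parts.
parse_coord <- function(coord) {
  parts <- strsplit(trimws(coord), "[:-]")[[1]]
  list(chr = parts[1], start = as.numeric(parts[2]), end = as.numeric(parts[3]))
}

# Return the probes whose position falls inside the region (inclusive bounds).
probes_in_region <- function(coord, probes) {
  r <- parse_coord(coord)
  probes[probes$CHR == r$chr & probes$pos >= r$start & probes$pos <= r$end, ]
}

hits <- probes_in_region("chr7:96641456-96657023", probes)
print(hits$ID)
```

The number of rows in `hits` can be compared with the no.cpgs column as a sanity check.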

5.8 years ago
574233829 • 0

Hello. Have you solved the problem? I have a similar question: I want to find the genes associated with the coordinates, but I don't know how to do it. Can you help me? I want to know which gene is associated with a coordinate (e.g. chr7:96641456-96657023).

5.8 years ago
574233829 • 0

Hello, I have used the DMRcate package to analyse 450k data, and I want to find the genes associated with the DMRs. With the old version, the output included gene_assoc, group, hg19coord, no.probes, minpval, meanpval and maxbetafc. After updating the package, the output became coord, no.cpgs, minfdr, Stouffer, maxbetafc and meanbetafc, with no "gene_assoc" column. I want to find the gene names associated with "coord". Can you please tell me how to do this with the newest DMRcate package?

The output of the newest DMRcate follows:

                        coord no.cpgs minfdr Stouffer  maxbetafc  meanbetafc
63999  chr6:33156164-33181870     265      0        0 -0.5008031 -0.02648790
63997  chr6:33128825-33149777     150      0        0  0.4176126  0.08611966
63917  chr6:32144195-32161004     128      0        0 -0.2574513 -0.03184096
63914  chr6:32114490-32123701     124      0        0 -0.4377015 -0.06195576
63889  chr6:31935801-31940855     101      0        0 -0.1555205 -0.02401999
12564 chr11:31817810-31841980     100      0        0 -0.4611059 -0.17113506

4.5 years ago
123456789 • 0

Hey Pete,

Thanks for your reply, but it seems the last assign function doesn't work. Can you please check?

2.5 years ago
almsu798 • 0

Dear Pete,

Could you give a screenshot of the header of the data frame dmrcoutput$input, and double-check the last assign code?

Much appreciated and kind regards, Suzan
