Annotation of Methylation data in GDC Portal
5.4 years ago

I was looking specifically at the rowData of the SummarizedExperiment object obtained after downloading data with the TCGAbiolinks package. In this data.frame there is a column called 'Feature type'. It contains the values S_Shore, N_Shore, CGI, N_Shelf, and S_Shelf. However, I also see a lot of "." (dots) in this column. Does this imply that those sites belong to the Open Sea, since all the other categories are present, or are they unknown?

Illumina 450K GDC TCGA DNA Methylation • 1.3k views
5.4 years ago

They appear to be sites that fall outside of the following classification:

The position of the CpG site in reference to the island:

 - Island
 - N_Shore or S_Shore (0-2 kbp upstream or downstream from CGI)
 - N_Shelf or S_Shelf (2-4 kbp upstream or downstream from CGI)

So, if the site is >4kbp from the island, it will be labeled with ".".

[source: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Methylation_LO_Pipeline/]
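The rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GDC pipeline's code: the function name `classify_cpg` and the signed `offset_bp` argument are hypothetical, N_ is assumed to mean upstream and S_ downstream of the island, and the handling of the exact 2 kbp and 4 kbp boundaries is a guess.

```python
def classify_cpg(offset_bp):
    """Label a CpG site by its position relative to the nearest CpG island.

    offset_bp is a hypothetical signed distance in bp to the island:
    0 means inside the island, negative means upstream, positive downstream.
    """
    if offset_bp == 0:
        return "Island"
    # Assumption: N_ denotes the upstream side, S_ the downstream side.
    side = "N" if offset_bp < 0 else "S"
    distance = abs(offset_bp)
    if distance <= 2000:   # 0-2 kbp from the island (boundary handling assumed)
        return f"{side}_Shore"
    if distance <= 4000:   # 2-4 kbp from the island (boundary handling assumed)
        return f"{side}_Shelf"
    return "."             # more than 4 kbp away: no category assigned


labels = [classify_cpg(d) for d in (0, -1500, 3000, 5000)]
```

Under these assumptions, a site 5 kbp from the nearest island falls through every category and gets ".".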


Thanks for the reply. I knew this, but I often come across "Open Sea" in papers, so that's why I asked.


I guess you could call all of those sites 'open sea'. I have not seen this term used widely, but I noted it in this publication: Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome.

Provided you also clearly define it in your methods, I would not necessarily see any major issue with it.
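If you do go that route, the relabelling itself is trivial. Here is a minimal sketch in Python (the same logic applies to the rowData column in R). The label `Open_Sea`, the function name `relabel_feature_types`, and the example values are all made up for illustration; none of them come from the GDC documentation.

```python
from collections import Counter

# Assumption: "Open_Sea" is our own label, to be defined in the methods section.
OPEN_SEA = "Open_Sea"


def relabel_feature_types(feature_types, open_sea_label=OPEN_SEA):
    """Return a copy of the labels with "." replaced by the open-sea label."""
    return [open_sea_label if ft == "." else ft for ft in feature_types]


# Made-up example values standing in for the 'Feature type' column.
example = ["Island", ".", "N_Shore", ".", "S_Shelf"]
relabelled = relabel_feature_types(example)

# Tally the categories to check that nothing was lost in the relabelling.
counts = Counter(relabelled)
```

Keeping the replacement label as a parameter makes it easy to match whatever wording you define in your methods.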
