I am working with a cohort profiled on both Illumina HumanMethylation 450K and EPIC arrays. I'm using the ChAMP pipeline to load the data because of its extensive probe filtering step. To merge the two array types I use the minfi package. As a first step, I convert each beta matrix from ChAMP to a GenomicRatioSet, and then combine them into a virtual array of a given type.
For casting a 450K virtual array:
    library(minfi)

    # Wrap the ChAMP beta matrices (probes x samples) as GenomicRatioSets
    ratioSet450K <- makeGenomicRatioSetFromMatrix(myLoad_450K$beta,
                                                  array = "IlluminaHumanMethylation450k",
                                                  annotation = "ilmn12.hg19",
                                                  what = "Beta")
    ratioSetEPIC <- makeGenomicRatioSetFromMatrix(myLoad_EPIC$beta,
                                                  array = "IlluminaHumanMethylationEPIC",
                                                  annotation = "ilm10b4.hg19",
                                                  what = "Beta")

    # conservative merging: output is a virtual 450K array
    ratioSetMerged450K <- combineArrays(ratioSet450K,
                                        ratioSetEPIC,
                                        outType = "IlluminaHumanMethylation450k",
                                        verbose = TRUE)
However, when I set outType = "IlluminaHumanMethylationEPIC", as described in the manual, I get the same virtual array as above. So the function appears to take the intersection of the cg probes from the two objects, whichever outType I choose. But the manual explicitly states:
This function combines data from the two different array types and outputs a data object of the user-specified type. Essentially, this new object will be like (for example) an EPIC array with many probes missing.
I assumed the probes missing from the 450K samples would be filled with NAs. Does anybody have a solution for this? For the moment I am using dplyr's left_join on the beta matrices to get around the problem, but it is not a very elegant workaround.
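
To make the desired behaviour concrete, here is a minimal base-R sketch of the kind of merge I am after: samples from both arrays side by side, on a chosen probe set, with NA wherever a sample's array lacks the probe. The helper name mergeBetas, the toy probe IDs and the beta values are all made up for illustration; this is not how combineArrays works internally, and it assumes sample names are unique across the two matrices.

```r
# Toy beta matrices (probes in rows, samples in columns); all values are made up.
beta450k <- matrix(c(0.10, 0.20, 0.30, 0.40), nrow = 2,
                   dimnames = list(c("cg01", "cg02"), c("s1", "s2")))
betaEPIC <- matrix(c(0.50, 0.60, 0.70, 0.80, 0.90, 0.95), nrow = 3,
                   dimnames = list(c("cg02", "cg03", "cg04"), c("s3", "s4")))

# Hypothetical helper: column-bind two beta matrices on a chosen probe set,
# leaving NA where a probe is absent from one of the inputs.
mergeBetas <- function(a, b, probes = union(rownames(a), rownames(b))) {
  out <- matrix(NA_real_,
                nrow = length(probes),
                ncol = ncol(a) + ncol(b),
                dimnames = list(probes, c(colnames(a), colnames(b))))
  ia <- intersect(probes, rownames(a))  # requested probes present in a
  ib <- intersect(probes, rownames(b))  # requested probes present in b
  out[ia, colnames(a)] <- a[ia, , drop = FALSE]
  out[ib, colnames(b)] <- b[ib, , drop = FALSE]
  out
}

# "EPIC-like" result: keep every EPIC probe, NA for the 450K samples where missing
merged <- mergeBetas(beta450k, betaEPIC, probes = rownames(betaEPIC))
print(merged)
```

In principle the resulting matrix could then be handed to makeGenomicRatioSetFromMatrix with array = "IlluminaHumanMethylationEPIC", but I have not verified how downstream minfi functions cope with the NAs.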