Question

Possible methodology or R package for simulating a microarray dataset with both gene and continuous clinical features

7.4 years ago

svlachavas ▴ 790

Dear Community,

Using R and the caret package, I have performed feature selection on a microarray gene expression dataset (60 samples in total: 30 cancer and 30 control) with respect to a binary categorical outcome (disease status). My final selected subset comprises both gene features and continuous clinical variables. The initial dataset was produced by merging and batch-effect-correcting 2 Affymetrix microarray datasets with a similar phenotype condition, and it is paired: each patient has both a cancer and a control sample.

Moreover, beyond a simple initial inspection of my combined composite feature set with cross-validation, I would also like to perform an initial validation on an independent dataset, that is, to test the classifier trained on my initial dataset with these features. The major problem with simply selecting a microarray dataset from GEO and/or other repositories is that these PET features have only been measured in the same patients for whom the microarrays were produced (an important novelty that I would like to test somehow). So I cannot obtain any external samples or datasets with these clinical features.

Thus, is there any package or methodology that I could implement in R to simulate my dataset above, restricted to these 41 features, and then use this "synthetic" dataset for external validation of the classifier constructed on my initial training dataset?
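
To make the idea concrete, here is a minimal sketch of one possible approach, not a recommendation: fit a multivariate normal distribution to each class and draw synthetic samples from it with `MASS::mvrnorm()`. Everything in it is an assumption for illustration: `X` and `y` are toy stand-ins for the real 60 x 41 feature matrix and the disease labels, `simulate_class()` is a hypothetical helper, the features are treated as roughly Gaussian, and the paired design is ignored. Since the synthetic samples are generated from the training data itself, they would not be truly independent of it.

```r
library(MASS)  # provides mvrnorm(); ships with standard R installations

set.seed(1)

# Toy stand-in for the real data: 60 samples x 41 continuous features
# (genes plus PET variables), 30 cancer and 30 control samples.
n_per_class <- 30
p <- 41
y <- factor(rep(c("cancer", "control"), each = n_per_class))
X <- matrix(rnorm(2 * n_per_class * p), ncol = p,
            dimnames = list(NULL, paste0("feat", seq_len(p))))
X[y == "cancer", 1:5] <- X[y == "cancer", 1:5] + 1  # artificial class effect

# Hypothetical helper: draw n synthetic samples for one class from a
# multivariate normal with that class's observed mean and covariance.
simulate_class <- function(X_class, n) {
  sim <- mvrnorm(n = n, mu = colMeans(X_class), Sigma = cov(X_class))
  colnames(sim) <- colnames(X_class)
  sim
}

# Simulate 30 samples per class and stack them into one synthetic dataset.
sim_list <- lapply(levels(y), function(cl) {
  simulate_class(X[y == cl, , drop = FALSE], n = n_per_class)
})
X_sim <- do.call(rbind, sim_list)
y_sim <- factor(rep(levels(y), each = n_per_class), levels = levels(y))
```

With a caret model trained on the real data (say a hypothetical object `fit`), the synthetic set could then be scored with `predict(fit, newdata = X_sim)` and compared against `y_sim`.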

R microarray simulation classification • 1.7k views

updated 7.3 years ago by Biostar 20 • written 7.4 years ago by svlachavas ▴ 790