Question

Dealing with missing values: can I choose the best imputation method for missing omics data by looking at density distribution curves?

4.8 years ago

WUSCHEL ▴ 760

Can I decide on the best imputation method for omics data by looking at density distribution curves?

# All possible imputation methods are printed in an error, if an invalid function name is given.
impute(data_norm, fun = "")
## Error in match.arg(fun): 'arg' should be one of "bpca", "knn", "QRILC", "MLE", "MinDet", "MinProb", "man", "min", "zero", "mixed", "nbavg"

# Impute missing data using random draws from a Gaussian distribution centered around a minimal value (for MNAR)
data_imp <- impute(data_norm, fun = "MinProb", q = 0.01)

# Impute missing data using random draws from a manually defined left-shifted Gaussian distribution (for MNAR)
data_imp_man <- impute(data_norm, fun = "man", shift = 1.8, scale = 0.3)

# Impute missing data using the k-nearest neighbour approach (for MAR)
data_imp_knn <- impute(data_norm, fun = "knn", rowmax = 0.9)

The effect of the imputation on the distributions can be visualized.

# Plot intensity distributions before and after imputation
plot_imputation(data_norm, data_imp)
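
To make the "left-shifted Gaussian" idea concrete, here is a minimal base R sketch on simulated data. It is not the code behind `impute()`: the helper `impute_left_shift()` is hypothetical, and it assumes that `shift` and `scale` are expressed in units of each sample's observed standard deviation.

```r
# Hypothetical helper (not from any package): fill the NAs of each column
# with draws from a Gaussian shifted to the left of that column's observed values.
impute_left_shift <- function(x, shift = 1.8, scale = 0.3) {
  for (j in seq_len(ncol(x))) {
    missing <- is.na(x[, j])
    if (!any(missing)) next
    mu <- mean(x[, j], na.rm = TRUE)
    sigma <- sd(x[, j], na.rm = TRUE)
    # Assumption: centre = mean - shift * sd, width = scale * sd
    x[missing, j] <- rnorm(sum(missing),
                           mean = mu - shift * sigma,
                           sd = scale * sigma)
  }
  x
}

set.seed(1)
# Made-up log2 intensities: 200 features x 4 samples
mat <- matrix(rnorm(800, mean = 25, sd = 2), nrow = 200, ncol = 4)
# Drop low values to mimic left-censored (MNAR) missingness
mat[mat < 23] <- NA

imputed <- impute_left_shift(mat)

# Density estimates before and after imputation
d_before <- density(as.vector(mat), na.rm = TRUE)
d_after <- density(as.vector(imputed))
```

Calling `plot(d_before); lines(d_after, lty = 2)` overlays the two curves. With this kind of imputation the "after" curve gains an extra bump at the low-intensity end, which is the feature to look for in a before/after density plot.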

Which parameters should I look at to decide on the best imputation method when working with several genotypes?

RNA-Seq gene proteomics R • 1.5k views

What's the data? There are established procedures for dealing with missing data for some data types. Read the literature related to your data to look for commonly used imputation methods.
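
As a rough first check that is independent of the data type, one can compare the intensity distribution of features that have missing values with that of fully observed features. The base R sketch below uses simulated data with intensity-dependent dropout; every name and number in it is made up for illustration, and it is a heuristic rather than an established procedure for any particular omics platform.

```r
set.seed(42)
n_feat <- 300
n_samp <- 6

# Made-up log2 intensities; each feature (row) has its own abundance level
base <- rnorm(n_feat, mean = 25, sd = 2)
mat <- matrix(rnorm(n_feat * n_samp, mean = base, sd = 0.5), nrow = n_feat)

# Simulated MNAR dropout: the lower the intensity, the likelier a value is missing
p_miss <- plogis(-(mat - 23) * 1.5)
mat[runif(length(mat)) < p_miss] <- NA

# Features with no observed value at all carry no intensity information
observed <- mat[rowSums(!is.na(mat)) > 0, , drop = FALSE]

row_mean <- rowMeans(observed, na.rm = TRUE)
has_na <- rowSums(is.na(observed)) > 0

mean_with_na <- mean(row_mean[has_na])    # features with at least one NA
mean_complete <- mean(row_mean[!has_na])  # fully observed features
```

If `plot(density(row_mean[!has_na])); lines(density(row_mean[has_na]), lty = 2)` shows the features with missing values sitting at clearly lower intensities, the missingness is probably not at random (MNAR), which points toward the left-censored methods such as "MinProb", "QRILC" or "man". If the two curves largely overlap, the values are more plausibly missing at random (MAR), which points toward methods such as "knn". With several genotypes, it is worth repeating the check within each genotype, since a feature may be missing in one genotype because it is truly absent there.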

4.8 years ago by Jean-Karim Heriche 27k