Dealing with missing values: can I choose the best imputation method for missing omics data by looking at density distribution curves?
4.8 years ago
WUSCHEL ▴ 760

Can I decide on the best imputation method for omics data by looking at density distribution curves?

# Assumption: impute() and plot_imputation() as used here come from the DEP package
library(DEP)

# All possible imputation methods are printed in an error if an invalid function name is given.
impute(data_norm, fun = "")
## Error in match.arg(fun): 'arg' should be one of "bpca", "knn", "QRILC", "MLE", "MinDet", "MinProb", "man", "min", "zero", "mixed", "nbavg"

# Impute missing data using random draws from a Gaussian distribution centered around a minimal value (for MNAR)
data_imp <- impute(data_norm, fun = "MinProb", q = 0.01)

# Impute missing data using random draws from a manually defined left-shifted Gaussian distribution (for MNAR)
data_imp_man <- impute(data_norm, fun = "man", shift = 1.8, scale = 0.3)

# Impute missing data using the k-nearest neighbour approach (for MAR)
data_imp_knn <- impute(data_norm, fun = "knn", rowmax = 0.9)

The effect of the imputation on the distributions can be visualized.

# Plot intensity distributions before and after imputation
plot_imputation(data_norm, data_imp)
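
To show what a left-shifted Gaussian imputation does to a density curve, here is a minimal base-R sketch on simulated data. The data, the dropout model and the helper `impute_left_shift` are hypothetical. Applying `shift` and `scale` to the median and standard deviation of the observed values is an assumption for illustration, not a copy of the package's code.

```r
# Simulated log2 intensities for one sample (hypothetical data, not from the post)
set.seed(1)
x <- rnorm(1000, mean = 25, sd = 2)

# Assumed dropout model: lower intensities are more likely to go missing (MNAR-like)
p_miss <- plogis(-(x - 22))
x_obs <- x
x_obs[runif(length(x)) < p_miss] <- NA

# Hypothetical helper: fill NAs with draws from a Gaussian that is shifted left
# by `shift` standard deviations and narrowed to `scale` standard deviations
impute_left_shift <- function(v, shift = 1.8, scale = 0.3) {
  obs <- v[!is.na(v)]
  n_na <- sum(is.na(v))
  v[is.na(v)] <- rnorm(n_na,
                       mean = median(obs) - shift * sd(obs),
                       sd = scale * sd(obs))
  v
}
x_imp <- impute_left_shift(x_obs)

# Overlay the density of observed values and the density after imputation
d_before <- density(x_obs, na.rm = TRUE)
d_after <- density(x_imp)
plot(d_before, main = "Before vs after imputation", xlab = "log2 intensity",
     ylim = range(0, d_before$y, d_after$y))
lines(d_after, lty = 2)
legend("topright", legend = c("observed only", "after imputation"), lty = c(1, 2))
```

If the imputed values form a separate bump far from the observed distribution, or noticeably distort its shape, that hints that the chosen parameters (or the MNAR assumption itself) may not suit the data. The curve alone does not prove which method is best.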


Which parameters should I look at to decide on the best imputation method when working with several genotypes?
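
One quantity that may help here is whether missingness depends on intensity, and whether it differs between genotypes. Below is a rough base-R sketch on simulated data. The genotype labels, the matrix layout (proteins in rows, samples in columns) and the dropout model are all assumptions for illustration.

```r
# Simulated log2 intensity matrix: 500 proteins x 6 samples (hypothetical data)
set.seed(42)
n_prot <- 500
genotype <- rep(c("WT", "mutA"), each = 3)  # hypothetical genotype labels
base <- rnorm(n_prot, mean = 25, sd = 2)
mat <- sapply(seq_along(genotype), function(j) base + rnorm(n_prot, 0, 0.3))
colnames(mat) <- paste0(genotype, "_", rep(1:3, times = 2))

# Assumed dropout model: low intensities are more likely to be missing
mat[matrix(runif(length(mat)), nrow = n_prot) < plogis(-(mat - 22))] <- NA

# Fraction of missing values per sample, averaged within each genotype
miss_per_sample <- colMeans(is.na(mat))
miss_per_genotype <- tapply(miss_per_sample, genotype, mean)
print(miss_per_genotype)

# Mean observed intensity per protein, split by whether the protein has any NA
prot_mean <- rowMeans(mat, na.rm = TRUE)
has_na <- rowSums(is.na(mat)) > 0
keep <- is.finite(prot_mean)  # drop proteins that are missing in every sample

med_complete <- median(prot_mean[keep & !has_na])
med_with_na <- median(prot_mean[keep & has_na])
print(c(complete = med_complete, with_missing = med_with_na))
```

If proteins with missing values sit clearly lower in intensity than complete ones, the missingness looks intensity-dependent (MNAR-like), which is the situation the left-censored methods above are described for. If the two groups overlap, a MAR-oriented method such as k-nearest neighbours may be more reasonable. Large differences in the missing fraction between genotypes suggest checking whether proteins are absent in whole conditions rather than dropping out at random.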

RNA-Seq gene proteomics R • 1.5k views

What's the data? There are established procedures for dealing with missing data for some data types. Read the literature related to your data to look for commonly used imputation methods.
