Handling a large number of missing values when measuring similarity between genes in R
7.8 years ago

I am using a yeast microarray dataset from SGD (a sample of the dataset is at this link). The dataset contains a large number of missing values, and I need to compute the similarity between genes. If I remove every gene row that has an NA in any of its sample columns, the number of genes drops to about half of the total.

How can I handle this many missing values when measuring the similarity between genes in R? What is the standard approach?
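A quick way to see what complete-case filtering costs is to count the missing values per gene. The sketch below is written in Python/NumPy purely for illustration (in R, `rowMeans(is.na(x))` gives the same per-gene fraction). The toy matrix is made up, and the 20% cutoff `max_missing_frac` is a hypothetical choice, not a recommendation from this thread.

```python
import numpy as np

def missing_summary(expr, max_missing_frac=0.2):
    """Compare complete-case filtering with a per-gene missingness cutoff.

    expr: 2-D float array, rows = genes, columns = samples, NaN = missing.
    max_missing_frac: hypothetical cutoff; genes with a larger fraction of
    missing samples would be dropped.
    """
    missing = np.isnan(expr)
    frac_missing = missing.mean(axis=1)      # fraction of NaN per gene
    complete_case = ~missing.any(axis=1)     # genes kept by dropping every NA row
    thresholded = frac_missing <= max_missing_frac
    return {
        "n_genes": expr.shape[0],
        "kept_complete_case": int(complete_case.sum()),
        "kept_threshold": int(thresholded.sum()),
    }

# Toy matrix: 4 genes x 5 samples (made-up values, not the SGD data).
toy = np.array([
    [1.0, 0.5, np.nan, 0.2, 0.1],
    [0.3, 0.4, 0.5, 0.6, 0.7],
    [np.nan, np.nan, 0.1, 0.2, 0.3],
    [0.9, np.nan, 0.7, 0.6, 0.5],
])
print(missing_summary(toy))
# → {'n_genes': 4, 'kept_complete_case': 1, 'kept_threshold': 3}
```

Genes kept under a cutoff still carry some NAs, so this only helps when combined with imputation or with a similarity measure that tolerates missing values.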

R gene yeast Similarity-Matrix Missing-Values
7.8 years ago

What kind of data are you dealing with?
There are two main approaches to dealing with missing values. One is imputation, i.e. replacing each missing value with an estimate of what it should be. The other is data integration, i.e. combining your dataset with other data; for example, you could compensate for missing links in a protein interaction graph by combining it with a genetic interaction graph.
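As an illustration of the imputation route, here is a simplified k-nearest-neighbour imputation, one common choice for expression matrices (in R, the Bioconductor `impute` package provides `impute.knn`). This is a sketch under stated assumptions, not that package's algorithm: `knn_impute` and the toy matrix are hypothetical, neighbours are drawn only from genes with no missing values, and each missing entry is filled with the unweighted mean of the k nearest genes.

```python
import numpy as np

def knn_impute(expr, k=2):
    """Simplified KNN imputation for a genes x samples matrix (NaN = missing).

    Each gene with missing values is compared only with genes that have no
    missing values, using root-mean-square distance over the samples the
    target gene does have. Its missing entries are replaced by the unweighted
    mean of its k closest complete genes in those samples.
    """
    expr = np.asarray(expr, dtype=float)
    filled = expr.copy()
    missing = np.isnan(expr)
    complete_rows = np.where(~missing.any(axis=1))[0]
    if complete_rows.size == 0:
        raise ValueError("need at least one gene with no missing values")
    k = min(k, complete_rows.size)
    ref = expr[complete_rows]                  # pool of complete genes
    for g in np.where(missing.any(axis=1))[0]:
        obs = ~missing[g]                      # samples observed for gene g
        if not obs.any():
            continue                           # nothing to compare on; leave NaN
        dist = np.sqrt(((ref[:, obs] - expr[g, obs]) ** 2).mean(axis=1))
        nearest = ref[np.argsort(dist)[:k]]    # k closest complete genes
        filled[g, ~obs] = nearest[:, ~obs].mean(axis=0)
    return filled

# Toy matrix: gene 0 is missing its third sample (made-up values).
toy = np.array([
    [1.0, 2.0, np.nan],
    [1.1, 2.1, 3.0],
    [0.8, 1.9, 5.0],
    [10.0, 20.0, 30.0],
])
print(knn_impute(toy, k=2)[0])  # gene 0's NaN becomes 4.0, the mean of genes 1 and 2
```

With real data, k and the neighbour pool matter; one way to check them is to mask some known values, impute them, and compare.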
Which way you go depends on the type of data you have and on the question you're trying to address.

I used a yeast microarray dataset compiled from many expression experiments, which provide expression profiles for yeast carrying out various cellular programs and responding to various applied stimuli. A sample of the dataset is at this link.

If you don't want to use complementary data, then you need to either impute the missing values or ignore them. You may find this review useful. Another approach is to use a downstream analysis method that can handle missing values.
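Ignoring missing values can be done at the similarity step itself: for each pair of genes, compute the correlation using only the samples observed in both. In R this corresponds to `cor(t(x), use = "pairwise.complete.obs")` (transposed because `cor` correlates columns). The Python sketch below does the same, plus a hypothetical `min_overlap` rule that leaves a pair as NaN when the two genes share too few samples; the function name and toy data are illustrative only.

```python
import numpy as np

def pairwise_complete_corr(expr, min_overlap=3):
    """Pearson correlation between genes (rows), ignoring missing values.

    For each pair of genes only the samples observed in both are used.
    Pairs sharing fewer than min_overlap samples (a hypothetical rule),
    or with zero variance on the shared samples, are left as NaN.
    """
    expr = np.asarray(expr, dtype=float)
    n = expr.shape[0]
    obs = ~np.isnan(expr)
    sim = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(i, n):
            both = obs[i] & obs[j]             # samples observed in both genes
            if both.sum() < min_overlap:
                continue
            x = expr[i, both] - expr[i, both].mean()
            y = expr[j, both] - expr[j, both].mean()
            denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
            if denom > 0:
                sim[i, j] = sim[j, i] = (x * y).sum() / denom
    return sim

# Toy matrix: 3 genes x 5 samples with scattered NaNs (made-up values).
toy = np.array([
    [1.0, 2.0, 3.0, np.nan, 5.0],
    [2.0, 4.0, 6.0, 8.0, np.nan],
    [5.0, np.nan, 3.0, 2.0, 1.0],
])
print(np.round(pairwise_complete_corr(toy), 3))
```

Correlations based on only a few shared samples are noisy, and a matrix built this way is not guaranteed to be positive semi-definite, which matters for some downstream methods.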
