I have microarray gene expression data from GEO and I would like to find how many samples have high expression of Gene A.
This is the method I am currently following:
1) Download the raw CEL files
2) Normalize with RMA (e.g. rma() in the affy package)
3) Cluster the samples using a heatmap based on Gene A
4) From the heatmap clusters, pick the samples with high, intermediate, and low expression of Gene A
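To make step 4 concrete, here is a minimal Python sketch of the kind of grouping I mean, using explicit cutoffs on Gene A rather than reading groups off a heatmap. The sample names, expression values, and both cutoffs are made up for illustration, and `classify_samples` and `percent_with_label` are hypothetical helper names, not functions from any package.

```python
# Hypothetical RMA-normalized (log2) expression of Gene A, one value per sample.
gene_a = {
    "S1": 5.1, "S2": 5.4, "S3": 7.9, "S4": 8.3,
    "S5": 10.6, "S6": 11.2, "S7": 6.8, "S8": 10.9,
}


def classify_samples(expr_by_sample, low_cut, high_cut):
    """Label each sample by where its value falls relative to two cutoffs."""
    labels = {}
    for sample, value in expr_by_sample.items():
        if value >= high_cut:
            labels[sample] = "high"
        elif value <= low_cut:
            labels[sample] = "low"
        else:
            labels[sample] = "intermediate"
    return labels


def percent_with_label(labels, label):
    """Percentage of samples carrying the given label."""
    return 100.0 * sum(1 for v in labels.values() if v == label) / len(labels)


# The cutoffs are assumptions chosen only for this toy data.
labels = classify_samples(gene_a, low_cut=6.0, high_cut=10.0)
pct_high = percent_with_label(labels, "high")
print(f"{pct_high:.1f}% of samples have high Gene A expression")
```

The resulting percentage depends entirely on how the cutoffs are chosen, which is really what I am asking about.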
1) Is there any other method to identify the % of samples with high expression of Gene A in a given dataset?
2) How can I answer the same question using datasets from multiple studies?
Example: if I have 10 studies from GEO, all of them microarray data,
a) Should I perform the clustering on each dataset individually, find the % of samples from each study, and take an average, or
b) Should I merge the datasets using ComBat or limma and then perform clustering to find the % of samples in which Gene A is highly expressed?
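For option (a), the bookkeeping I have in mind could be sketched roughly as follows in Python. The study names, values, and the single shared cutoff are invented for illustration, and `percent_high` is a hypothetical helper. Using one cutoff across studies assumes their expression scales are comparable, which may not hold without batch correction.

```python
# Hypothetical Gene A values (log2 scale) for two studies of different size.
studies = {
    "study_1": [5.2, 10.4, 11.0, 6.1],
    "study_2": [10.2, 5.9, 6.3, 7.0, 6.6, 5.5, 8.1, 7.7],
}
high_cut = 10.0  # assumed cutoff, applied to every study


def percent_high(values, cut):
    """Percentage of samples whose value is at or above the cutoff."""
    return 100.0 * sum(v >= cut for v in values) / len(values)


# Percentage of high-expressing samples within each study.
per_study = {name: percent_high(vals, high_cut) for name, vals in studies.items()}

# Unweighted average: every study counts equally regardless of size.
simple_mean = sum(per_study.values()) / len(per_study)

# Pooled percentage: every sample counts equally, so larger studies weigh more.
total_high = sum(sum(v >= high_cut for v in vals) for vals in studies.values())
total_n = sum(len(vals) for vals in studies.values())
pooled_pct = 100.0 * total_high / total_n

print(per_study)
print(f"unweighted mean: {simple_mean:.2f}%, pooled: {pooled_pct:.2f}%")
```

The unweighted mean and the pooled percentage differ whenever the studies have different sample sizes, so "take an average" already hides a choice.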