I have ~10 biological samples with microarray expression data for ~20000 probes. However, I am only interested in biological functions related to ~20 metabolic pathways (as defined in KEGG), so I selected only the genes (~1500) that are annotated to these ~20 pathways. I ran a PCA on these ~1500 genes and it showed very good clustering of the samples along PC1, PC2 and PC3. I then did a feature extraction and selected the ~500 genes that contributed most to PC1 to PC3. When I plot a heatmap of the expression of these ~500 genes, the genes fall into 4 neat clusters that match the biological categories of my samples.
Now I would like to see whether each of these gene clusters is enriched in any of the ~20 KEGG pathways I had already selected. When I run a GO or pathway enrichment with tools such as g:Profiler, DAVID or GOrilla, using the genes in a cluster as the input list and the ~500 genes used for clustering as the background list, the results are too broad (for example "metabolism") to be useful.
Can I test each cluster for enrichment in each pathway, one by one? For example, if cluster 1 has 100 genes and 20 of them belong to Pathway A, and 100 of the 500 genes I originally used for clustering belong to Pathway A, then the proportion is 20% in both the cluster and the background, so I would conclude that cluster 1 is not enriched for Pathway A. What would be the appropriate statistical test and multiple-testing correction? Can I use a simple Fisher's exact test or hypergeometric test to look for enrichment and then correct for multiple testing? Could you please point me to resources for doing this in R?
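The example above can be written out as a one-sided hypergeometric test, which gives the same p-value as a one-sided Fisher's exact test on the corresponding 2x2 table. The following is only a minimal sketch of the arithmetic in plain Python (standard library only), not a recommended pipeline. The function names `hypergeom_upper_tail` and `bh_adjust` are made up for illustration, and the counts for "Pathway B" and "Pathway C" are hypothetical; only the Pathway A numbers come from the example. In base R the same steps correspond to `phyper(k - 1, K, N - K, n, lower.tail = FALSE)` or `fisher.test(..., alternative = "greater")`, followed by `p.adjust(p, method = "BH")`.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k): chance of seeing at least k pathway genes in a cluster of
    n genes drawn without replacement from N background genes, K of which
    belong to the pathway."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


N_BACKGROUND = 500   # genes used for clustering
CLUSTER_SIZE = 100   # genes in cluster 1

# pathway -> (genes in cluster, genes in background)
# Pathway A is the example from the question; B and C are hypothetical.
counts = {
    "Pathway A": (20, 100),
    "Pathway B": (15, 30),
    "Pathway C": (2, 40),
}

names = list(counts)
pvals = [
    hypergeom_upper_tail(k, N_BACKGROUND, K, CLUSTER_SIZE)
    for k, K in (counts[name] for name in names)
]
qvals = bh_adjust(pvals)

for name, p, q in zip(names, pvals, qvals):
    print(f"{name}: p = {p:.3g}, BH-adjusted = {q:.3g}")
```

With 4 clusters and ~20 pathways, the adjustment would presumably be applied once across all ~80 cluster-by-pathway p-values rather than separately per cluster.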
You should change the type of the post. People will assume you have created a tool when they read your post title. Please change the type from "Tool" to "Question".
Thanks for noticing this, I've adapted the post type.