Clustering Genes Based On Function
10.1 years ago

Hello,

We would like to use either hierarchical or k-means clustering to group the genes in our dataset by function. We have the GO ID for each gene and would now like to cluster the genes into functional groups, preferably hierarchically: from the bottom level, where each function is unique, up to higher levels, where functions are merged into more general groups.

Thanks in advance for your help!

gene function • 3.3k views

You might want to check the R/BioC packages GOSemSim and csbl.go. They use semantic similarity measures to do GO-term-based clustering.
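
To illustrate what a semantic similarity measure looks like, here is a minimal Python sketch of Resnik similarity, one of the commonly used measures. It is not the code of either package: the hierarchy, term names, and gene annotations below are all made-up placeholders. The similarity of two terms is taken as the information content of their most informative common ancestor.

```python
import math

# Toy is_a hierarchy (child -> list of parents). Hypothetical names, not real GO terms.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "transport": ["root"],
    "dna_repair": ["metabolism"],
    "dna_replication": ["metabolism"],
    "ion_transport": ["transport"],
}

# Hypothetical direct annotations: gene -> set of terms.
ANNOTATIONS = {
    "geneA": {"dna_repair"},
    "geneB": {"dna_replication"},
    "geneC": {"ion_transport"},
    "geneD": {"dna_repair", "ion_transport"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), with p(t) the fraction of genes annotated to t or a descendant."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for t in terms:
            covered |= ancestors(t)
        for t in covered:
            counts[t] += 1
    n = len(ANNOTATIONS)
    # Terms with no annotated genes have undefined IC and are left out.
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(t1, t2):
    """Information content of the most informative common ancestor of t1 and t2."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC[c] for c in common if c in IC)


print(resnik("dna_repair", "dna_replication"))  # share "metabolism"
print(resnik("dna_repair", "ion_transport"))    # share only "root"
```

Pairwise term similarities like these can be combined into gene-to-gene similarities and then passed to a standard hierarchical clustering routine.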


I think you need to read about and understand the theory behind clustering approaches such as k-means. The "means" part indicates that you need some numbers (quantitative measurements).
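
One possible way to get such numbers from GO annotations, as a rough Python sketch: encode each gene as a binary vector of term membership, or compute a set-based distance such as Jaccard between genes. The gene names and term IDs below are placeholders, not real annotations.

```python
# Hypothetical gene -> GO term annotations (placeholder IDs, not real GO terms).
annotations = {
    "gene1": {"GO:A", "GO:B"},
    "gene2": {"GO:A", "GO:B", "GO:C"},
    "gene3": {"GO:D"},
}

# Fixed term order so that every gene gets a vector of the same length.
terms = sorted(set().union(*annotations.values()))


def to_vector(gene):
    """Binary membership vector: 1 if the gene is annotated with the term, else 0."""
    return [1 if t in annotations[gene] else 0 for t in terms]


def jaccard_distance(a, b):
    """1 - |intersection| / |union| of the two genes' term sets."""
    sa, sb = annotations[a], annotations[b]
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


for gene in annotations:
    print(gene, to_vector(gene))
print(jaccard_distance("gene1", "gene2"))
print(jaccard_distance("gene1", "gene3"))
```

A distance matrix like this suits hierarchical clustering directly. Means of binary vectors are harder to interpret, which is part of why k-means is an awkward fit here.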


I agree. They could cluster all of the genes linked to a given GO term (using some homology measure), but that would only be clustering within a GO term.

10.1 years ago
pld 5.1k

A quick note: most genes will have multiple GO terms mapped to them. Also, GO already exists as a hierarchical structure; that is how it was designed. So the thing to do would be to visualize the structure of the terms enriched in your data and then build your gene clustering off of that tree.

However, unless you prune the annotations for a given gene down to a single GO term, the clustering will look odd, since a given gene will end up in multiple clusters. I am not sure I see an advantage in pruning annotations, though: you would end up with really biased data.
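
As a rough illustration of both points, the Python sketch below groups genes by walking each annotated term some number of levels up a toy hierarchy. The term names and annotations are invented, and each term has a single parent for simplicity, whereas real GO terms can have several. Note how the multiply annotated gene lands in more than one group.

```python
from collections import defaultdict

# Toy hierarchy (child -> parent). Hypothetical names; real GO terms can have several parents.
PARENT = {
    "dna_repair": "metabolism",
    "dna_replication": "metabolism",
    "ion_transport": "transport",
    "metabolism": "root",
    "transport": "root",
}

# Hypothetical annotations; geneC carries two terms.
GENE_TERMS = {
    "geneA": ["dna_repair"],
    "geneB": ["dna_replication"],
    "geneC": ["dna_repair", "ion_transport"],
}


def path_to_root(term):
    """List of terms from the given term up to the root, inclusive."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def group_at_level(levels_up):
    """Group genes under the ancestor found `levels_up` steps above each annotated term."""
    groups = defaultdict(set)
    for gene, terms in GENE_TERMS.items():
        for term in terms:
            path = path_to_root(term)
            # Clamp at the root if the path is shorter than levels_up.
            ancestor = path[min(levels_up, len(path) - 1)]
            groups[ancestor].add(gene)
    return dict(groups)


for level in range(3):
    print(level, group_at_level(level))
```

At level 0 every specific term is its own group. Moving up merges groups into more general ones, which is the bottom-up view asked for in the question.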

http://cbl-gorilla.cs.technion.ac.il/


Thanks for the information!
