Most commonly used biClustering algorithm?
7.4 years ago
ammy22222222 ▴ 30

A group of colleagues and I are planning a project to optimize or upgrade the most widely used biclustering algorithms and to see whether we could add any features that seem important or needed. My questions are: What are the most common and/or best biclustering algorithms available? Are there any features that you feel are needed or that would produce more accurate results? How often do you find that you need to cluster a biological dataset? We are mostly new to bioinformatics, so help from anyone with experience in this area would be very much appreciated.

gene next-gen biclustering • 2.3k views
7.4 years ago
Diwan ▴ 650

The iterative signature algorithm (ISA) is one of the earliest biclustering algorithms and is widely used. It was available in Bioconductor (the ‘eisa’ package) early on. There are now many other biclustering algorithms available, for example Bimax, Plaid, and SAMBA. You can check this link: https://www.bioconductor.org/packages/release/BiocViews.html#___StatisticalMethod

and look up “bicluster” to see the other biclustering algorithms available in Bioconductor. Other tools, such as ‘biclust’, are available as R packages. A nice feature of biclust is that we can run five different biclustering algorithms and compare the results.

I think that, with so many biclustering algorithms available, comparing them and choosing a relevant one is a problem. Visualizing the resulting biclusters and extracting meaningful information from them seems to be an active area of research in this field.

One advantage of biclustering is that the same genes/conditions can be present in multiple clusters. The downside is that we get many overlapping clusters, and then have to decide whether it is meaningful to treat them separately or to merge them. You can check this paper, which compares the available biclustering tools:

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0090801
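One way to approach the overlap question raised above is to score how much two biclusters share before deciding whether to merge them. The following is a minimal sketch, not taken from any of the packages mentioned: it assumes each bicluster is represented as a pair of sets (genes, conditions), and the function names `cell_jaccard` and `overlapping_pairs` and the 0.4 threshold are hypothetical choices for illustration.

```python
from itertools import combinations

# Assumption: a bicluster is a (genes, conditions) pair of sets.
# The names and the default threshold below are illustrative only.

def cell_jaccard(a, b):
    """Jaccard index over the (gene, condition) cells covered by two biclusters."""
    genes_a, conds_a = a
    genes_b, conds_b = b
    # Cells in both biclusters: shared genes crossed with shared conditions.
    shared = len(genes_a & genes_b) * len(conds_a & conds_b)
    union = len(genes_a) * len(conds_a) + len(genes_b) * len(conds_b) - shared
    return shared / union if union else 0.0

def overlapping_pairs(biclusters, threshold=0.4):
    """Return (i, j, score) for every pair whose overlap reaches the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(biclusters), 2):
        score = cell_jaccard(a, b)
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs

biclusters = [
    ({"g1", "g2", "g3"}, {"c1", "c2"}),
    ({"g2", "g3", "g4"}, {"c1", "c2"}),
    ({"g7", "g8"}, {"c5"}),
]
candidates = overlapping_pairs(biclusters)
print(candidates)  # pairs that might be worth merging or inspecting together
```

Pairs flagged this way are only candidates; whether merging them is biologically meaningful still has to be judged case by case.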

This paper shows a nice application of biclustering, i.e., finding new disease-relevant genes by mining large gene expression datasets.

https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-3143-y

Making a new tool available in Bioconductor and publishing a clear biological application will lead to rapid uptake by the community. The ISA authors did that with their early papers: http://bioinformatics.oxfordjournals.org/content/20/13/1993.long

HTH

thanks! Didn't expect an answer that clear and informative!

7.4 years ago

I would guess that the most common is probably complete-linkage hierarchical clustering on a Euclidean distance matrix, purely because those are the defaults of the dist and hclust functions in R that are used by things like heatmap.2.
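For readers unfamiliar with what that default does, here is a rough sketch of complete-linkage agglomerative clustering with Euclidean distance, written in plain Python. It is a naive illustration of the technique, not R's implementation; the function name `complete_linkage` and the toy points are made up for the example.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def complete_linkage(points, n_clusters):
    """Merge clusters bottom-up until n_clusters remain.

    The distance between two clusters is the largest pairwise distance
    between their members (complete linkage).
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(points))]  # start with singletons
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy data: two tight pairs and one isolated point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
groups = complete_linkage(points, 3)
print(groups)
```

Note that this clusters one dimension only (rows, here). A heatmap applies it to rows and columns independently, which is different from biclustering, where a subset of rows and a subset of columns are found together.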
