How to cluster microarray samples based on Euclidean distance and a complete linkage metric?
8.8 years ago
JacobS ▴ 980

I am trying to replicate some computational experiments I found in a paper. In this paper, the authors have ~30 GeneChip Human Genome U133 Plus 2.0 arrays, one for each experimental sample and no references. They process the .CEL files into log2 RMA-normalized signal intensity files. They then create a dendrogram that demonstrates that there are 2 main phenotypes within these 30 samples, based on gene expression.

I am trying to replicate their work, but I'm not sure how they went from the log2 RMA-normalized signal intensity files to a clustered dendrogram. Their explanation is, "Hierarchical clustering was performed using Euclidean distance and a complete linkage metric."

I've reached out to the authors, but this paper is nearly 5 years old, so I may not get a response. Does anybody know how this can be done?

microarray clustering DGE • 4.8k views
8.8 years ago
5utr ▴ 370

A simple example in R:

First, calculate the Euclidean distance with the function dist(). It compares the rows of its input, so each row of the matrix is treated as one sample:

eucl_dist <- dist(matrix(c(rnorm(100), rnorm(100)), nrow = 2, ncol = 100), method = "euclidean")

Then perform hierarchical clustering with the complete linkage method:

hie_clust <- hclust(eucl_dist, method = "complete")
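
The two steps above can be put together for a genes-by-samples matrix, which is the usual layout of an RMA table. This is only a minimal sketch: the matrix is simulated, and the names expr, probe_* and sample_* are made up for illustration, not taken from the paper or from your data.

```r
# Simulated stand-in for a log2 RMA matrix:
# 100 probesets (rows) x 6 samples (columns). All names are hypothetical.
set.seed(1)
expr <- matrix(rnorm(600, mean = 8, sd = 2), nrow = 100, ncol = 6,
               dimnames = list(paste0("probe_", 1:100),
                               paste0("sample_", 1:6)))

# dist() measures distances between rows, so transpose the matrix
# to get one sample per row before computing Euclidean distances
sample_dist <- dist(t(expr), method = "euclidean")

# Complete-linkage hierarchical clustering of the samples
sample_clust <- hclust(sample_dist, method = "complete")

# Dendrogram with one leaf per sample
plot(sample_clust)
```

Because the distances are computed between samples only, the distance object stays small no matter how many probesets are in the matrix.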

Will try, thanks!


So I tried using these commands with my matrix. The full matrix tracks 56,000 genes, and R fails with "Error: cannot allocate vector of size 544.4 Gb". I tried using a subset of just 100 genes, and the command executed, so I have a hie_clust object. However, when I plot this, I get a dendrogram that clusters the individual genes rather than the samples. How can I fix this? Also, is there a way to get a text list of the clustering rather than a plot? Thanks for your help, I'm not very good with R!


dist() computes the distances between the rows of your matrix, so if your samples are in columns you can just transpose the matrix with t(your_matrix) before calling dist().

hie_clust is an object that holds the clustering information. If you type hie_clust$ you can access its components, such as the ordering (hie_clust$order), the heights (hie_clust$height), etc.

You can perform different operations on the hclust object, like cutting the tree into k clusters. Example: cutree(hie_clust, k = 10)
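
To get the clustering as text rather than a plot, something along these lines should work. It is a sketch on simulated data: expr, hc, the sample_* names, and the choice of k = 2 are assumptions for illustration only.

```r
# Simulated genes-by-samples matrix; samples 5 to 8 are shifted upward
# so that two groups exist. All names here are hypothetical.
set.seed(2)
expr <- matrix(rnorm(400), nrow = 50, ncol = 8,
               dimnames = list(NULL, paste0("sample_", 1:8)))
expr[, 5:8] <- expr[, 5:8] + 3

# Cluster the samples (transpose so that samples are rows)
hc <- hclust(dist(t(expr), method = "euclidean"), method = "complete")

# Sample names in the left-to-right order of the dendrogram leaves
leaf_order <- hc$labels[hc$order]
print(leaf_order)

# Cut the tree into k = 2 groups: a named integer vector, sample -> cluster
groups <- cutree(hc, k = 2)
print(groups)

# The same information as a table, which could be saved with write.table()
membership <- data.frame(sample = names(groups), cluster = groups,
                         row.names = NULL)
print(membership)
```

hc$order only gives the leaf positions, so it is indexed into hc$labels to recover the sample names.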


Thanks for the reply, Gian. I've tried transposing my matrix, but for some reason the terminal dendrogram branches still do not represent samples (there are far more of them than input samples).


Solved my problem: I was accidentally calling as.matrix on a matrix. It works great now, thanks!

8.8 years ago
Irsan ★ 7.8k

Source this file: https://github.com/Irsan88/SeqTools/blob/master/RNA/Expression/countMatrixTools.R

Then do:

plot(dendrogramOnSamples(yourData, clustComplete,distEucledian))

Thanks for your reply! I see this tool is expecting a counts matrix. Can I just provide the RMA normalized signal intensity scores in place of traditional RNA-Seq counts? And should they be log2 transformed?

It expects a matrix, so it will work. It does the same as Gian's answer. Yes, they should be log2 transformed and (RMA) normalized.

Perfect, thanks! I'll give it a try and report back.


Ran into a snag... any thoughts?

> source("countMatrixTools.R", local=TRUE)
> myMatrix <- as.matrix(read.table("small_RMA_table.txt", header=TRUE, sep = "\t", row.names = 1, as.is=TRUE))
> plot(dendrogramSamples(myMatrix, clustComplete,distEucledian))
Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor exceed 65536") :
  missing value where TRUE/FALSE needed
Run summary statistics on your matrix (e.g. with summary()); I think there are strange values in there.
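
A few checks like these might help track down such values. Without knowing what is in small_RMA_table.txt this is only a guess at the kind of problem, so the sketch uses a toy matrix with a deliberately missing entry; bad_expr and its row and column names are made up.

```r
# Toy matrix with one deliberately missing value (hypothetical names)
bad_expr <- matrix(c(7.1, 8.3, NA, 6.5, 9.0, 7.7), nrow = 3,
                   dimnames = list(paste0("probe_", 1:3),
                                   c("sample_1", "sample_2")))

# Should be TRUE; FALSE suggests text was read in from the file
print(is.numeric(bad_expr))

# Rows should be probesets and columns samples
print(dim(bad_expr))

# Overall range of the values, plus a count of NA entries
print(summary(as.vector(bad_expr)))

# Number of NA, NaN or infinite entries in the whole matrix
n_bad <- sum(!is.finite(bad_expr))
print(n_bad)

# Names of the rows that contain at least one non-finite value
bad_rows <- rownames(bad_expr)[rowSums(!is.finite(bad_expr)) > 0]
print(bad_rows)
```

If the first check is FALSE, or the dimensions are not what you expect, the problem is probably in how the table was read rather than in the clustering step.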