Biostar Beta. Not for public use.
Question: How to do unsupervised clustering using copy number variation data?
2

Hi all, I want to do unsupervised clustering using segmented copy number variation data (such as data derived from SNP arrays) and then visualize the result. The output should look like the following figure (Figure 1A), in which samples are clustered based on their CNV profiles.

Clustering of copy number (Figure 1A)

I know how to draw a clustered heatmap in R from data in matrix form. However, segmented copy number data has a quite different structure. The only tool I know of that can visualize this kind of data is IGV, but IGV doesn't provide any clustering options. Can anybody give me some pointers on how to do this? Any help will be greatly appreciated.

3.7 years ago dr.chenway • 20 • updated 3.1 years ago manali.rupji • 0
0

Isn't that described in the methods section of the paper? (If you gave the link to the paper, we could read it.) The key is to get a vector representation of the samples that captures the relevant information. From the figure, each sample appears to be represented by a vector in which each element corresponds to a section of a chromosome and the value is the copy gain/loss of that section.

3.7 years ago
Jean-Karim Heriche
19k
0

Thanks for your answer. This is the original paper: "Comprehensive Molecular Characterization of Papillary Renal-Cell Carcinoma". The authors did mention how they performed the analysis in the supplementary data (page 11 of the supplementary material). However, the description was very brief and did not clearly explain how to do the clustering using copy number data. Thanks again.

3.7 years ago
dr.chenway
• 20
1

As I read it, they represented each tumor as a vector over the regions that the GISTIC2.0 software identified as having copy number variations; each value in the vector is the log2 of the copy number of the corresponding region. They then clustered the tumors with:

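# data: numeric matrix with one row per tumor and one column per region
d <- dist(data, method = "euclidean")   # pairwise distances between rows (tumors)
tree <- hclust(d, method = "ward.D2")   # Ward hierarchical clustering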
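
For anyone who wants to try those two calls end to end, here is a minimal sketch on a toy matrix, assuming rows are samples and columns are regions holding log2 copy-number values. The matrix `cn`, its values, the region names and the choice of two clusters are all made up for illustration and are not taken from the paper.

```r
# Toy log2 copy-number matrix: rows = samples, columns = genomic regions.
# All values and region names are invented for illustration.
cn <- rbind(
  S1 = c( 0.8,  0.7, 0.0, -0.1),
  S2 = c( 0.9,  0.6, 0.1,  0.0),
  S3 = c(-0.7, -0.8, 0.0,  0.1),
  S4 = c(-0.6, -0.9, 0.1,  0.0)
)
colnames(cn) <- c("chr7p", "chr7q", "chr9p", "chr9q")

# Pairwise Euclidean distances between samples (rows).
d <- dist(cn, method = "euclidean")

# Ward hierarchical clustering on those distances.
tree <- hclust(d, method = "ward.D2")

# Cut the tree into two groups (k = 2 is an arbitrary choice here).
groups <- cutree(tree, k = 2)

# Heatmap with rows ordered by the same tree; columns are left in the
# given (genomic) order and the values are not rescaled.
heatmap(cn, Rowv = as.dendrogram(tree), Colv = NA, scale = "none",
        col = colorRampPalette(c("blue", "white", "red"))(51))
```

Passing the tree to `heatmap()` keeps the plot consistent with the clustering you computed yourself, rather than letting `heatmap()` recluster with its default settings.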
3.7 years ago
Jean-Karim Heriche
19k
0

Could you please elaborate a little on "The key is to get a vector representation of the samples" and "they represented each tumor with a vector"? Thanks.

2.7 years ago
apuhegde
• 20
0
  • Vector representation of the samples: each sample is represented by a series of numbers, each of which is considered to describe or capture some feature/property of the samples. This set of numbers is called a feature vector in machine learning and related fields. Note that for data mining purposes, all samples have to be described using the same set of features/properties.
  • They represented each tumor with a vector: In the case discussed here, each sample is represented by the number of copies it has of specific genomic regions.
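
Since segment breakpoints differ between samples, the segments first have to be mapped onto a common set of regions before every sample has the same features. The sketch below does this with fixed-width bins and a length-weighted mean, which is a simplification of my own and not the GISTIC2.0 regions used in the paper. The column names (`sample`, `chrom`, `start`, `end`, `seg_mean`), the bin size and all values are assumptions for illustration; real .seg files may name their columns differently.

```r
# Toy segment table in the spirit of a .seg file; all values are invented.
seg <- data.frame(
  sample   = c("S1", "S1", "S2", "S2", "S2"),
  chrom    = c("1",  "1",  "1",  "1",  "1"),
  start    = c(1,    601,  1,    301,  801),
  end      = c(600,  1000, 300,  800,  1000),
  seg_mean = c(0.5,  0.0,  0.0,  -1.0, 0.4)
)

# Common features: fixed-width bins on a toy chromosome 1 of length 1000.
# Real data would repeat this for every chromosome.
bin_size  <- 250
bin_start <- seq(1, 1000, by = bin_size)
bin_end   <- bin_start + bin_size - 1

# Length-weighted mean of seg_mean over the segments overlapping one bin.
bin_value <- function(s, b_start, b_end) {
  overlap <- pmin(s$end, b_end) - pmax(s$start, b_start) + 1
  keep <- overlap > 0
  if (!any(keep)) return(NA_real_)  # bin not covered by any segment
  sum(s$seg_mean[keep] * overlap[keep]) / sum(overlap[keep])
}

# One row (feature vector) per sample, one column per bin.
samples <- unique(seg$sample)
cn <- t(sapply(samples, function(id) {
  s <- seg[seg$sample == id & seg$chrom == "1", ]
  mapply(function(a, b) bin_value(s, a, b), bin_start, bin_end)
}))
colnames(cn) <- paste0("chr1:", bin_start, "-", bin_end)
```

The resulting `cn` matrix has the shape that `dist()` and `hclust()` expect. Bins that no segment covers come out as `NA` and would need to be dropped or imputed before computing distances.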
2.7 years ago
Jean-Karim Heriche
19k
0

I wish to perform a clustering analysis on CNV data from the long-insert whole genome sequencing assay in the Multiple Myeloma database. Only the .seg file is available as part of their download, and I believe the GISTIC2.0 software requires a markers file.

1) Is GISTIC2.0 appropriate for CNV analysis of whole genome sequencing data? If not, what tools could I use?
2) How should I account for samples that do not have a copy gain or copy loss, or that are copy neutral?

3.1 years ago
manali.rupji
• 0
0

Please post this as a new question. Then come back and delete this post.

3.1 years ago
genomax
68k
