Performing Pathway Analysis on CNV Data
10.4 years ago
Robert Sicko ▴ 630

I have groups of samples with copy-number variation (CNV) calls made from microarray data. I am trying to determine whether specific pathways are enriched for CNVs in particular phenotypes. I've looked at "How To Test Whether Copy Number Aberrations Are Enriched In A Gene List" and other posts that describe pathway analysis from expression data. My data are currently formatted for import into PathVisio: a tab-delimited file with genes as rows and samples as columns, where each value is the log-transformed fold change for that gene in that sample. If a gene was not overlapped by a CNV in a subject, I assumed normal expression.

I have a few normal controls run with each batch, and each batch is a different phenotype. I'm trying to figure out the best way to determine whether a pathway is enriched. Should I (1) compare pathway X to pathway Y within sample 1; (2) compare pathway X in phenotype 1 (with all samples for that pathway averaged, or summed?) to pathway X in phenotype 2; or (3) do something similar to the post mentioned above, generating random groups of genes of the same size as pathway X and comparing pathway X in sample 1 to those randomly generated gene groups in sample 1?
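The random-gene-set option can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not a method prescribed anywhere in this thread. The function names, the toy gene names, and the choice of "number of CNV-overlapped genes" as the pathway statistic are all assumptions made for the example. It also draws genes uniformly at random, which ignores gene length and the fact that a single large CNV can hit several neighbouring genes at once.

```python
import random

def pathway_burden(altered_genes, gene_set):
    """Count how many genes in `gene_set` are overlapped by a CNV."""
    return sum(1 for g in gene_set if g in altered_genes)

def permutation_p_value(altered_genes, pathway, background, n_perm=10000, seed=0):
    """Empirical one-sided p-value for CNV enrichment in `pathway`.

    Random gene sets of the same size as the pathway are drawn from
    `background` (e.g. all genes covered by the array), and the pathway's
    burden is compared with the burden of those random sets.
    """
    rng = random.Random(seed)
    altered = set(altered_genes)
    pathway = set(pathway)
    background = sorted(set(background))  # sorted for reproducible sampling
    observed = pathway_burden(altered, pathway)
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(background, len(pathway))
        if pathway_burden(altered, random_set) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy data (hypothetical gene names): 100 background genes, 10 of them altered.
background = [f"G{i}" for i in range(100)]
altered = {f"G{i}" for i in range(10)}
pathway_x = ["G0", "G1", "G2", "G3", "G50"]  # 4 of the 5 genes are altered

observed, p = permutation_p_value(altered, pathway_x, background, n_perm=2000)
print(observed, p)
```

The same idea extends to a phenotype group by pooling or counting altered samples per gene before computing the burden, but how to pool is exactly the design decision being asked about here.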

Statistics is not one of my strengths, so any input is greatly appreciated.

cnv copynumber enrichment • 5.0k views
10.4 years ago
B. Arman Aksoy ★ 1.2k

If I understand your question correctly, the first thing to do is decide what a pathway alteration means, and what you will do when two genes in the same pathway have conflicting events (a homozygous deletion in one and an amplification in another). I say this because people define a pathway alteration in different ways. For expression data, I have seen people define a "pathway activity score" by simply averaging the expression values of all genes in the pathway for each sample. You can take a similar approach with CNV data, but be aware that copy number is not the same as gene expression, so the score will be really noisy. Alternatively, people convert the data into a binary matrix, using thresholds to call each gene altered or non-altered by a CNA event, and then compare the frequency of altered samples across their sample groups.
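As a rough illustration of the binary-matrix idea, here is a small Python sketch using only the standard library. The threshold of 0.3 on the absolute log ratio, the function names, and the toy samples are assumptions for the example, not values from this thread. Note that taking the absolute value treats deletions and amplifications alike as "altered", which is one possible answer to the conflicting-events question.

```python
def binarize(log_ratios, threshold=0.3):
    """Call a gene altered (1) when |log ratio| exceeds `threshold`, else 0."""
    return {
        sample: {gene: int(abs(value) > threshold) for gene, value in genes.items()}
        for sample, genes in log_ratios.items()
    }

def altered_frequency(binary, groups, gene_set):
    """Fraction of samples per group with at least one altered gene in `gene_set`."""
    freq = {}
    for group, samples in groups.items():
        n_altered = sum(
            1 for s in samples if any(binary[s].get(g, 0) for g in gene_set)
        )
        freq[group] = n_altered / len(samples)
    return freq

# Toy data: sample -> gene -> log ratio (0.0 means no CNV overlaps the gene).
log_ratios = {
    "s1": {"A": 0.8, "B": 0.0, "C": 0.0},
    "s2": {"A": 0.0, "B": -1.0, "C": 0.0},
    "s3": {"A": 0.0, "B": 0.0, "C": 0.1},
    "s4": {"A": 0.0, "B": 0.0, "C": 0.0},
}
groups = {"phenotype1": ["s1", "s2"], "phenotype2": ["s3", "s4"]}

binary = binarize(log_ratios)
freq = altered_frequency(binary, groups, gene_set={"A", "B"})
print(freq)
```

The per-group frequencies could then be compared with a standard test on a 2x2 table of altered versus non-altered samples, though with only a few samples per group the power will be limited.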

Instead, I think you could try an unbiased hierarchical clustering of your gene-level data (after removing the non-altered genes to reduce visual complexity) and see whether the clusters tend to capture your phenotype categories. If you want to apply this at the pathway level, you can also collapse your data to pathways (group genes into pathways) and cluster those. I would start with this exploratory look at the data and then decide how to choose the features (genes or pathways) that explain each of your phenotypes.
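To make the "collapse to pathways, then cluster" step concrete, here is a small standard-library Python sketch. The pathway score (fraction of a pathway's genes that are altered), the Euclidean distance, and the naive average-linkage routine are illustrative assumptions. In practice you would more likely use an existing implementation such as R's heatmap functions or `scipy.cluster.hierarchy`.

```python
def collapse_to_pathways(binary, pathways):
    """Per sample, the fraction of each pathway's genes that are altered."""
    names = sorted(pathways)
    return {
        sample: [
            sum(genes.get(g, 0) for g in pathways[p]) / len(pathways[p])
            for p in names
        ]
        for sample, genes in binary.items()
    }

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def average_linkage(profiles, n_clusters):
    """Naive agglomerative clustering: merge the closest pair of clusters
    (by mean pairwise distance) until `n_clusters` remain."""
    clusters = [[s] for s in sorted(profiles)]

    def dist(c1, c2):
        total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(pairs, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Toy binary matrix: sample -> gene -> 1 if altered, 0 otherwise.
binary = {
    "s1": {"A": 1, "B": 1, "C": 0, "D": 0},
    "s2": {"A": 1, "B": 0, "C": 0, "D": 0},
    "s3": {"A": 0, "B": 0, "C": 1, "D": 1},
    "s4": {"A": 0, "B": 0, "C": 0, "D": 1},
}
pathways = {"pathway_X": ["A", "B"], "pathway_Y": ["C", "D"]}  # hypothetical

profiles = collapse_to_pathways(binary, pathways)
clusters = average_linkage(profiles, n_clusters=2)
print(clusters)
```

If the resulting clusters line up with the phenotype labels, that suggests the pathway-level features carry signal worth testing formally.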

Thanks for your reply. I think you are right: I should probably convert my data to a binary matrix instead of trying to force it to look like expression data. Do you use an R package for unbiased hierarchical clustering?

I prefer R and use heatmap (base R), aheatmap (NMF package), or heatmap.2 (gplots package) to plot the data and label the rows/columns accordingly. If you don't feel comfortable with R, you can try GENE-E, which is GUI-based and helps with these types of operations: http://www.broadinstitute.org/cancer/software/GENE-E/download.html

Hope it helps.
