TCGA/Broad Institute CNV Files Segment Mean
9.6 years ago
dirigible2012 ▴ 320

Hello everybody,

I am trying to analyse CNV data from TCGA to get a measure of overall CNV per patient.

When I download the Level 3 files taken from the SNP6 array, there is a column in the file called Segment_Mean. (Example at bottom.)

What do the numbers in this column represent?

I think they might be log ratios, but the link below makes me wonder if they are direct estimates of copy number. (In which case it is puzzling that they aren't whole numbers.)

http://www.broadinstitute.org/cancer/software/genepattern/modules/snp6copynumberpipeline

Thanks for any help,

Stephanie

Sample    Chromosome    Start    End    Num_Probes    Segment_Mean
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936    1    151040529    153927851    1558    0.2031
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936    1    153928595    153929981    2    -2.0772
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936    1    153933585    164456865    7473    0.1883
tcga cnv • 15k views
9.6 years ago

Those are log2 ratios of the tumor intensity to the normal intensity. To convert to an absolute copy number, use: (2^seg_mean)*2
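
As a minimal sketch of that conversion in Python (the function name and the `normal_ploidy` parameter are illustrative, not part of any TCGA tool), assuming the reference is diploid:

```python
def segment_mean_to_copy_number(seg_mean, normal_ploidy=2.0):
    """Convert a log2(tumor/normal) segment mean to an estimated copy number.

    Assumes the normal reference has `normal_ploidy` copies (2 for a diploid genome).
    """
    return normal_ploidy * 2.0 ** seg_mean


# Segment_Mean values taken from the example rows in the question
for seg_mean in (0.2031, -2.0772, 0.1883):
    print(seg_mean, round(segment_mean_to_copy_number(seg_mean), 3))
```

The results are generally not whole numbers, since the measured ratio is an average over all cells in the sample (tumor and any admixed normal cells) and carries array noise.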


Thanks for the information.

May I ask why we also need the "*2" (multiply by 2) in "(2^seg_mean)*2", instead of just "2^seg_mean"? Does this "2" represent the normal intensity?


Right - the assumption is that the normal genome is diploid.


If they are truly log2 ratios of the tumor CN to the normal CN, how can it be that I see the following in the TCGA ACC cohort?

                         sample chromosome start      end num_probes segment_mean
1: TCGA-2H-A9GF-01A-11D-A37B-01          1 61735 15024591       7600       0.0713
2: TCGA-2H-A9GF-11A-11D-A37E-01          1 61735 17217907       8841       0.0124

The second line is supposed to represent the matched healthy normal (11A denotes healthy normal tissue) from the same donor as the first line. Per your definition, shouldn't this line show 0? What is this sample compared against to compute its segment_mean?


You'll have to consult the metadata or description to see exactly how the files you're consulting were generated. There are ways of doing CN calling against a reference pool of samples as well. There may also be other files in that dump of data that contain the matched T/N data.


I had the same question....


Could you please tell me how to transform such segment files into gene-level copy number variation files?
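
One possible approach, sketched here rather than taken from any TCGA pipeline, is to give each gene the overlap-weighted average of the segment means covering it. The function name and the gene interval below are hypothetical; the sketch assumes 1-based inclusive coordinates, a single sample, and segments and gene on the same chromosome.

```python
def gene_level_segment_mean(segments, gene_start, gene_end):
    """Overlap-weighted mean of segment means across [gene_start, gene_end].

    `segments` is an iterable of (start, end, segment_mean) tuples for one
    sample and one chromosome. Returns None if no segment overlaps the gene.
    """
    total_bp = 0
    weighted_sum = 0.0
    for seg_start, seg_end, seg_mean in segments:
        # Number of bases shared by the segment and the gene (inclusive ends)
        overlap = min(seg_end, gene_end) - max(seg_start, gene_start) + 1
        if overlap > 0:
            total_bp += overlap
            weighted_sum += overlap * seg_mean
    return weighted_sum / total_bp if total_bp else None


# Chromosome 1 segments from the example in the question: (start, end, segment_mean)
segments = [
    (151040529, 153927851, 0.2031),
    (153928595, 153929981, -2.0772),
    (153933585, 164456865, 0.1883),
]

# Hypothetical gene interval, chosen for illustration only
print(gene_level_segment_mean(segments, 152000000, 152050000))
```

Real gene coordinates would need to come from an annotation matching the genome build of the segment file.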


Chris Miller, I have segment means from a methylation array (obtained when performing copy number analysis) and want to use them with the GISTIC tool, which requires a "Seg.CN (log2() -1 of copy number)" column in its main input. I am wondering whether my segment mean column can be used directly, since converting to copy number and back would give me the original values. I am not sure whether the copy number mentioned in the GISTIC documentation is absolute CN.


I am in a similar pinch through working with the conumee package.

But one should, at least in theory, be able to first convert the segment mean into absolute CN using the method described by Chris, and then simply follow the GISTIC documentation (i.e. convert the absolute CN through (log2(absCN) - 1)). As you have noticed, this is mathematically the same as the original value, so the column should work as input in its "native" state.
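
The round trip can be written out explicitly. Here s is the segment mean and CN is the copy number obtained from it under the diploid-normal assumption used earlier in the thread:

```latex
\[
\begin{aligned}
\mathrm{CN} &= 2 \cdot 2^{s} \\
\log_2(\mathrm{CN}) - 1 &= \log_2 2 + \log_2\!\left(2^{s}\right) - 1 = 1 + s - 1 = s
\end{aligned}
\]
```

This only holds if the segment means really are log2 ratios against a diploid reference, which should be checked for the tool that produced them.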

7.0 years ago
Zayni1234 • 0

I have a question, if you could please reply:

To convert BUBBY_p_TCGA_b89_105_SNP_N_GenomeWideSNP_6_D10_777410 to TCGA-2H-A9GF-01A-11D-A37B-01, do we have to do it manually before running GISTIC?

Thanks

6.9 years ago
kingsire • 0

I am also wondering how you converted a sample ID such as FLOUT_p_TCGAb60_SNP_N_GenomeWideSNP_6_C05_681024 to a TCGA barcode such as TCGA-2H-A9GF-01A-11D-A37B-01, which is essential for the next analysis. Could you please share how you solved this? Thanks a lot.
