Question

Why does CNV calling with VarScan need two steps of fragment merging?

6.4 years ago

CY ▴ 750

I have been using VarScan to call CNVs for a while but have not had a chance to look into it carefully.

The workflow is basically:
1) Run copynumber to compare read depth between the normal and tumor samples; this produces small fragments based on the tumor/normal (T/N) depth ratio.
2) Run circular binary segmentation (CBS), which somehow merges the small fragments into larger fragments, again based on the T/N ratio.
3) Run mergeSegments.pl to merge the fragments further; this gives the final CNV result.

I think I understand the purpose of step 1: it generates intervals and assigns each one its mean depth. These intervals act as individual data points for the later analysis.
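
To make step 1 concrete, here is a minimal sketch of turning per-interval depths into log2 tumor/normal ratios. It assumes interval depths are already computed and that a simple library-size scaling is enough; the function name and the scaling rule are illustrative assumptions, not VarScan's actual copynumber logic.

```python
import math


def log2_ratios(normal_depths, tumor_depths, normal_total, tumor_total):
    """Return log2(tumor/normal) per interval after library-size scaling.

    Hypothetical sketch: VarScan's copynumber command has its own rules for
    building intervals and adjusting depth; this only illustrates the idea.
    """
    # Scale tumor depth so that two libraries of different size compare fairly
    data_ratio = normal_total / tumor_total
    ratios = []
    for n, t in zip(normal_depths, tumor_depths):
        if n == 0 or t == 0:
            # The ratio is undefined when either sample has no reads here
            ratios.append(float("nan"))
        else:
            ratios.append(math.log2((t * data_ratio) / n))
    return ratios
```

For example, `log2_ratios([100, 100, 100], [100, 200, 50], 10**6, 10**6)` gives `[0.0, 1.0, -1.0]`: one neutral interval, one gain and one loss. Each value is a single noisy data point, which is why later steps are needed to find where the ratio actually shifts.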

What I don't understand are steps 2 and 3. Why do we need two steps of merging? What is the difference between them? Why can't we merge once and achieve the same purpose? Why can't we set more appropriate criteria or parameters in the very first step and skip the merging altogether?

cnv VarScan copy number variant • 1.5k views

updated 6.4 years ago by arta ▴ 670 • written 6.4 years ago by CY ▴ 750

score 1 · Answer 1 · 2017-12-08

6.4 years ago

arta ▴ 670

Circular binary segmentation (CBS) is an external tool that segments the data at statistically significant change points, assuming Gaussian noise. It is implemented in R (the DNAcopy package), and VarScan uses it as an intermediate step rather than reimplementing it in its own code. The aim of step 3, mergeSegments.pl, is to merge similar adjacent copy-number segments and classify them as large-scale or focal.
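
The change-point idea behind step 2 can be illustrated with a simplified sketch. This is plain recursive binary segmentation, NOT CBS: there is no circular (two-breakpoint) search and no permutation test. The noise estimate, `threshold`, and `min_size` are hypothetical choices for illustration only.

```python
from statistics import mean, median


def noise_sd(values):
    # Robust noise estimate from successive differences: a single mean shift
    # changes only one difference, so the median barely moves.
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return max(median(diffs) / 0.9539, 1e-6)  # floor avoids division by zero


def best_split(values, lo, hi, sigma, min_size):
    # Find the split k in values[lo:hi] with the largest t-like mean difference
    best_k, best_stat = None, 0.0
    for k in range(lo + min_size, hi - min_size + 1):
        left, right = values[lo:k], values[k:hi]
        scale = sigma * (1 / len(left) + 1 / len(right)) ** 0.5
        stat = abs(mean(left) - mean(right)) / scale
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


def binary_segment(values, threshold=4.0, min_size=3):
    """Return (start, end) index pairs (end exclusive) of constant-mean segments."""
    sigma = noise_sd(values)
    segments = []

    def recurse(lo, hi):
        k, stat = best_split(values, lo, hi, sigma, min_size)
        if k is None or stat < threshold:
            segments.append((lo, hi))  # no significant shift: keep as one segment
        else:
            recurse(lo, k)
            recurse(k, hi)

    recurse(0, len(values))
    return segments
```

On log ratios such as `[0.0, 0.1, -0.1, 0.05, -0.05, 0.0, 1.0, 1.1, 0.9, 1.05, 0.95, 1.0]`, this returns `[(0, 6), (6, 12)]`: the noisy per-interval points from step 1 collapse into two segments. Note that segmentation alone still says nothing about whether a segment is a gain, a loss or neutral.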

From the paper:

Adjacent segments of similar copy number from the CBS algorithm were merged by an internally developed Perl script (MergeSegments), and classified by size. Events encompassing >25% of a chromosome arm were classified as large-scale; all others were considered focal events.
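
The size rule in the quoted passage can be written as a small function. This is a sketch: the function name is hypothetical, and the chromosome-arm coordinates must be supplied by the caller (no real centromere positions are assumed here).

```python
def classify_event_size(seg_start, seg_end, arm_start, arm_end, cutoff=0.25):
    """Label an event 'large-scale' if it covers more than `cutoff` of its arm.

    Follows the quoted rule (>25% of a chromosome arm); everything else is
    'focal'. Coordinates are half-open [start, end) positions.
    """
    arm_length = arm_end - arm_start
    # Only the part of the event that falls on this arm counts
    overlap = max(0, min(seg_end, arm_end) - max(seg_start, arm_start))
    return "large-scale" if overlap / arm_length > cutoff else "focal"
```

For a 100 Mb arm, a 30 Mb event is large-scale, while a 5 Mb event (or one covering exactly 25%) is focal.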

Thanks for explaining. But why can't we just use the result of the first step? The first step already identifies a number of breakpoints.

6.4 years ago by CY ▴ 750

CBS does not classify the segments as amplification, deletion or neutral. MergeSegments classifies each segment as an amplification (log ratio > 0.25), a deletion (log ratio < -0.25) or neutral (between -0.25 and 0.25), and merges adjacent segments of the same class. It also categorizes amplifications and deletions as large-scale or focal, which is informative for interpretation, e.g. whole-chromosome loss or chromosome-arm loss or gain.
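
A minimal sketch of the classify-then-merge idea described above, using the ±0.25 log-ratio thresholds from this reply. The function names are hypothetical, and combining merged log ratios by a length-weighted mean is an assumption; mergeSegments.pl may combine them differently.

```python
def call_state(log_ratio, amp_cutoff=0.25, del_cutoff=-0.25):
    """Label one segment from its log ratio (thresholds as quoted above)."""
    if log_ratio > amp_cutoff:
        return "amplification"
    if log_ratio < del_cutoff:
        return "deletion"
    return "neutral"


def merge_adjacent(segments):
    """Merge consecutive segments that receive the same call.

    `segments` is a list of (start, end, log_ratio) tuples for ONE chromosome,
    sorted by position. The merged log ratio is a length-weighted mean
    (an assumption, not necessarily what mergeSegments.pl does).
    """
    merged = []
    for start, end, log_ratio in segments:
        state = call_state(log_ratio)
        if merged and merged[-1]["state"] == state:
            prev = merged[-1]
            prev_len = prev["end"] - prev["start"]
            cur_len = end - start
            prev["log_ratio"] = (
                prev["log_ratio"] * prev_len + log_ratio * cur_len
            ) / (prev_len + cur_len)
            prev["end"] = end
        else:
            merged.append(
                {"start": start, "end": end, "log_ratio": log_ratio, "state": state}
            )
    return merged
```

Given CBS-style segments `[(0, 100, 0.5), (100, 200, 0.4), (200, 300, 0.0), (300, 400, -0.6)]`, the two gains become one amplification spanning 0 to 200 with a log ratio of about 0.45, followed by one neutral and one deletion segment. This is why the second merge differs from CBS: CBS splits wherever the mean changes significantly, while this step joins segments whose means differ but fall into the same copy-number class.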

Hope it is clear now. :)

6.4 years ago by arta ▴ 670

Yes, it is really helpful. Thanks :)

6.4 years ago by CY ▴ 750