Hi,
I am currently working with whole-exome sequencing data. After running VarScan2 and DNAcopy, I found that some samples are quite noisy:
My plan is to exclude these samples, but I would like a statistical justification for doing so. Could anyone advise on how to approach this?
Thank you in advance.
A description of the axes and colors would help in interpreting the figure.
You are absolutely right.
The green/black colors mark different chromosomes. The red lines are the segments obtained by applying the DNAcopy package to the VarScan2 output.
The Y axis is the log ratio between normal and tumor.
The X axis is just an index that places all the segments in order. It does not reflect the real length of each fragment on the chromosome; it is closer to a count of the outputs produced by the package.
Thanks for clarifying. I'm not very familiar with VarScan and DNAcopy, but perhaps using longer segments could help stabilize the log ratio.
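To illustrate the intuition, here is a rough Python sketch on simulated data only. It assumes the noise in neighbouring log ratios is independent, in which case averaging k adjacent values shrinks the standard deviation by about sqrt(k). The helper `bin_means` and the simulated values are hypothetical and are not part of VarScan2 or DNAcopy.

```python
import random
import statistics

def bin_means(values, k):
    """Average consecutive, non-overlapping groups of k values (remainder is dropped)."""
    n_bins = len(values) // k
    return [statistics.fmean(values[i * k:(i + 1) * k]) for i in range(n_bins)]

random.seed(0)
# Simulated noisy log ratios for a copy-neutral region (true value 0, SD 0.5).
# These numbers are made up for illustration, not taken from the samples above.
log_ratios = [random.gauss(0.0, 0.5) for _ in range(10000)]

sd_raw = statistics.stdev(log_ratios)
sd_binned = statistics.stdev(bin_means(log_ratios, 25))

# With independent noise, sd_binned should be roughly sd_raw / sqrt(25).
print(f"raw SD: {sd_raw:.3f}, SD after averaging 25 points: {sd_binned:.3f}")
```

The trade-off is resolution: longer bins or segments give steadier log ratios but can hide short copy-number changes, and real data may have correlated noise, so the gain can be smaller than sqrt(k).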
Thank you for your advice. Other patients' samples give very clear segmentation, so I know this data is indeed noisy. I am still wondering whether forcing longer segments in the noisy samples is worth it.
Thank you again
Just a suggestion: you could try CNV-seq, which does more or less the same kind of analysis but automatically calculates the 'optimal' segment size based on the read coverage. That makes sense because higher read coverage gives more power to detect differences over smaller segments.
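As a back-of-the-envelope illustration of the coverage argument (this is not CNV-seq's actual calculation), one can assume read counts in a window are Poisson distributed. Under that assumption, the delta method gives an approximate standard deviation for log2(tumor/normal) of sqrt(1/T + 1/N) / ln(2), where T and N are the read counts in the window. The function name `log2_ratio_sd` and the example read counts are hypothetical.

```python
import math

def log2_ratio_sd(tumor_reads, normal_reads):
    """Approximate SD of log2(tumor/normal), assuming Poisson read counts (delta method)."""
    return math.sqrt(1.0 / tumor_reads + 1.0 / normal_reads) / math.log(2)

# More reads per window (deeper coverage or a wider window) means a smaller
# expected spread of the log ratio, so smaller shifts become detectable.
for reads in (50, 200, 1000, 5000):
    print(f"{reads} reads per window: approx. SD {log2_ratio_sd(reads, reads):.3f}")
```

Under this simple model, a sample with low coverage needs wider windows to reach the same noise level as a well-covered one, which is why a coverage-based window size seems reasonable.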