Question

CNV calling using ExomeDepth

5.9 years ago

NB ▴ 960

Hello,

I'm working to establish a pipeline for germline CNV calling. Our target consists of 150 genes with approximately 1500 exons. Our data is HiSeq data (96 samples per run).

To establish a good reference set, I'm following this paper to calculate the inter-sample variation in coverage, using the rpkmCV (the coefficient of variation of RPKM) for surveyed exons across the reference samples selected by ExomeDepth. I know the basic formula for this is SD/Mean, but I'm not sure how to implement it on my data.

Does anyone know how to do this? Or have any suggestions on QC for the same?

Thank you

CNV ExomeDepth reference • 3.7k views

Updated 5.9 years ago by andrew.j.skelton73 6.5k • written 5.9 years ago by NB ▴ 960

score 0 · Answer 1 · 2018-05-16

ExomeDepth works on coverage, and there's functionality within ExomeDepth to determine an optimal reference, or set of references, to use. See section 5 of the vignette. ExomeDepth typically picks reference samples whose coverage is highly correlated with the test sample, so it's essential that your samples come from the same batch of sequencing. One thing to consider: if you have trios or related individuals in a pedigree, or a mix of probands and healthy samples, you need to decide which samples should and should not be included in the reference for a given test.
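To illustrate the "highly correlated samples" idea, here is a minimal sketch that ranks candidate reference samples by the Pearson correlation of their per-exon counts with the test sample. This is only the ranking idea, not ExomeDepth's actual selection procedure, and it is written in plain Python rather than R. The function names (`pearson`, `rank_references`) and the toy counts are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists of per-exon counts.

    Assumes both lists have non-zero variance (otherwise this divides by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def rank_references(test_counts, candidates):
    """Return candidate sample names, most correlated with the test sample first.

    candidates: dict mapping sample name -> list of per-exon counts,
    in the same exon order as test_counts.
    """
    scores = {name: pearson(test_counts, counts)
              for name, counts in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (hypothetical): four exons, three candidate reference samples.
test_sample = [10, 20, 30, 40]
candidate_samples = {
    "a": [20, 40, 60, 80],   # same profile at twice the depth
    "b": [40, 30, 20, 10],   # reversed profile
    "c": [10, 25, 28, 45],   # similar but noisier profile
}
ranking = rank_references(test_sample, candidate_samples)
```

Samples from a different sequencing batch tend to sit low in a ranking like this, which is one practical way to see why mixing batches hurts the reference set.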

If you want to use the inter-/intra-sample variation, then you'd apply this to the imported, normalised counts. You could do something like an rlog or VST transformation beforehand, but this will take some experimentation, as that paper is quite ambiguous in its methods.
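For the SD/Mean calculation itself, a minimal sketch could look like the following. It assumes you have exported the per-exon read counts for each reference sample plus the exon lengths, and that RPKM is computed against the total on-target count per sample; whether the paper normalises this way is an assumption, as is the choice of the sample standard deviation. The function names and the toy numbers are made up for illustration, and it is plain Python rather than R.

```python
from statistics import mean, stdev

def rpkm(counts, lengths_bp):
    """RPKM per exon for one sample.

    counts: reads per exon; lengths_bp: exon lengths in base pairs.
    Uses the sample's total on-target reads as the library size (an assumption).
    """
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]

def rpkm_cv(count_matrix, lengths_bp):
    """Per-exon coefficient of variation (SD / mean) of RPKM across samples.

    count_matrix: dict mapping sample name -> list of per-exon counts.
    Needs at least two samples; exons with zero mean RPKM get NaN.
    """
    per_sample = [rpkm(counts, lengths_bp) for counts in count_matrix.values()]
    cvs = []
    for exon_values in zip(*per_sample):  # one tuple of RPKMs per exon
        m = mean(exon_values)
        cvs.append(stdev(exon_values) / m if m > 0 else float("nan"))
    return cvs

# Toy data (hypothetical): three exons, three reference samples.
exon_lengths = [100, 200, 400]
reference_counts = {
    "s1": [100, 200, 700],
    "s2": [200, 400, 1400],  # same profile as s1 at twice the depth
    "s3": [150, 150, 700],
}
cvs = rpkm_cv(reference_counts, exon_lengths)
```

Exons with a high CV across the reference set are the ones where calls are least trustworthy, so the per-exon values can double as a QC flag. Any cut-off you apply is something to tune on your own data.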