CNVkit - Proper Way to Generate Accurate Sex Chromosome Predictions
2.7 years ago
fongchunchan • 10

I was wondering what the best protocol is for getting accurate sex chromosome copy number predictions.

Currently I have a pool of normals of mixed gender, which I am passing to the reference command. I don't set the -y option to create a male reference. There appears to be no option to specify the gender of each input coverage file. Yet for the -y option, the manual says:

Create a male reference: shift female samples' chrX log-coverage by -1, so the reference chrX average is -1. Otherwise, shift male samples' chrX by +1, so the reference chrX average is 0.
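The shift described in that manual excerpt can be illustrated with a few numbers. This is only a minimal sketch of the arithmetic as the manual states it, not CNVkit's actual implementation; the function name shift_chrx and the sample values are hypothetical.

```python
def shift_chrx(log2_values, sample_is_female, male_reference):
    """Shift one sample's chrX log2 coverage to match the reference's sex.

    Hypothetical helper that mirrors the manual's wording only:
    - male reference (-y): female samples are shifted by -1
    - female reference (default): male samples are shifted by +1
    """
    if male_reference and sample_is_female:
        offset = -1.0
    elif not male_reference and not sample_is_female:
        offset = 1.0
    else:
        offset = 0.0  # sample already matches the reference's sex
    return [value + offset for value in log2_values]


# Made-up chrX log2 values: a female sample sits near 0, a male near -1.
female_x = [0.0, 0.25, -0.25]
male_x = [-1.0, -0.75, -1.25]

as_male_ref = shift_chrx(female_x, sample_is_female=True, male_reference=True)
as_female_ref = shift_chrx(male_x, sample_is_female=False, male_reference=False)
print(as_male_ref)    # female sample pulled down to the male level
print(as_female_ref)  # male sample pushed up to the female level
```

Either way, every sample in the pool ends up on the same chrX scale before averaging, which is why the detected gender of each sample matters.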

I assume that it automatically detects each sample's gender and adjusts the sex chromosomes accordingly? Or is there a way to pass in the known gender of each input normal sample?

I then pass this pooled reference to fix. When it comes to the call command, there is a -g option to specify the gender of the input sample, unlike in the reference command.

Are there any critical steps that I am missing for getting accurate sex chromosome copy number predictions?

cnvkit • 1.3k views
14 months ago
Eric T. ♦ 2.4k
San Francisco, CA

The reference command will detect the input samples' chromosomal genders and adjust automatically whether or not -y was given -- without -y they are all converted to XX. The detection is pretty reliable; the reference samples are supposed to be generally copy-number-neutral, so if a sample has Turner syndrome or large-scale CNVs on chromosome X, it probably shouldn't be in the reference pool. The reference command will print the gender detected for each sample when it runs, so I recommend just checking the log messages to ensure that all samples' genders were detected correctly, then proceeding with the rest of the pipeline.

(But if the gender calls are incorrect for multiple samples for no clear reason, please let me know.)
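For a large pool, checking the log by eye gets tedious, so the check could be scripted. The sketch below assumes the log lines look like the ones quoted in the comments on this answer ("Relative log2 coverage of X=..., Y=... --> assuming female"); that format may differ between CNVkit versions, and the function names here are hypothetical.

```python
import re

# Assumed log format, copied from the lines quoted in this thread.
LOG_PATTERN = re.compile(
    r"Relative log2 coverage of X=(?P<x>-?\d+(?:\.\d+)?), "
    r"Y=(?P<y>-?\d+(?:\.\d+)?) .*--> assuming (?P<sex>male|female)"
)


def parse_gender_calls(log_text):
    """Return (chrX log2, chrY log2, called sex) for each matching log line."""
    calls = []
    for line in log_text.splitlines():
        match = LOG_PATTERN.search(line)
        if match:
            calls.append((float(match["x"]), float(match["y"]), match["sex"]))
    return calls


def find_mismatches(calls, expected_sexes):
    """Return indices of calls whose sex differs from the known sex."""
    return [
        index
        for index, (call, expected) in enumerate(zip(calls, expected_sexes))
        if call[2] != expected
    ]


log_text = """\
Relative log2 coverage of X=-1.32, Y=-13.4 (maleness=0.501 x 1.45 = 0.728) --> assuming female
Relative log2 coverage of X=-1, Y=-1.26 (maleness=0.632 x 2.85 = 1.8) --> assuming male
"""
calls = parse_gender_calls(log_text)
print(find_mismatches(calls, ["male", "male"]))
```

The expected sexes have to be listed in the same order in which the samples appear in the log.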


@Etal: there is indeed a problem with automatic gender detection when using the reference command.

My sample is from a male. The command I use is:

reference \
   *coverage.cnn \
   --fasta ref.fasta \
   -y \
   -o normal_ref.cnn

For targetcoverage.cnn, the detected gender is wrong:

Relative log2 coverage of X=-1.32, Y=-13.4 (maleness=0.501 x 1.45 = 0.728) --> assuming female

For antitargetcoverage.cnn, the detected gender is correct:

Relative log2 coverage of X=-1, Y=-1.26 (maleness=0.632 x 2.85 = 1.8) --> assuming male

Since I have only 1 normal sample, I cannot discard it from the reference pool. Is there a way to edit the script to bypass automatic gender detection?


Thanks, I'll see about adding a --gender option to the reference command in the next release.

Workaround: It looks like the coverage of the Y-chromosome targets in your sample was poor. Look at the 'log2' column in normal_ref.cnn to identify the targets on Y that were poorly captured in your normal sample, then delete those targets from your target BED file or the source targetcoverage.cnn files (make sure they all match) and rebuild the reference. If only the well-captured targets on Y remain, gender detection should work better. (If no Y targets remain, the pipeline will still work.)
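The workaround above could be scripted roughly as follows. This is a sketch under several assumptions: the .cnn file is tab-separated with a header containing "chromosome", "start", "end" and "log2" columns, the BED file has no track or browser header lines, the coordinates in both files match exactly, and the cutoff of -5.0 is an arbitrary illustration rather than a recommended value. The function names are hypothetical.

```python
import csv
import io


def poorly_covered_y_targets(cnn_text, log2_threshold=-5.0, y_names=("chrY", "Y")):
    """Collect (chromosome, start, end) of Y targets whose log2 is below the cutoff."""
    reader = csv.DictReader(io.StringIO(cnn_text), delimiter="\t")
    return {
        (row["chromosome"], int(row["start"]), int(row["end"]))
        for row in reader
        if row["chromosome"] in y_names and float(row["log2"]) < log2_threshold
    }


def drop_targets(bed_text, bad_targets):
    """Return the BED text without the lines whose coordinates are in bad_targets."""
    kept = []
    for line in bed_text.splitlines():
        fields = line.split("\t")
        key = (fields[0], int(fields[1]), int(fields[2]))
        if key not in bad_targets:
            kept.append(line)
    return "".join(line + "\n" for line in kept)


# Made-up example data standing in for normal_ref.cnn and the target BED file.
cnn_text = (
    "chromosome\tstart\tend\tgene\tlog2\n"
    "chrX\t100\t200\tG1\t-1.0\n"
    "chrY\t100\t200\tG2\t-13.4\n"
    "chrY\t300\t400\tG3\t-1.2\n"
)
bed_text = "chrX\t100\t200\tG1\nchrY\t100\t200\tG2\nchrY\t300\t400\tG3\n"

bad = poorly_covered_y_targets(cnn_text)
filtered_bed = drop_targets(bed_text, bad)
print(filtered_bed)
```

With real files, the text would be read from and written back to disk, and the same filtering would have to be applied consistently to every targetcoverage.cnn file before rebuilding the reference.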

I changed the statistical test in the development version of CNVkit on GitHub, so if you're able to try that it might deliver a better result. But given that the majority of targets on Y had poor coverage, it might still be misled into thinking there is no Y chromosome in your sample.

To hard-code your sample's gender in the script, you can edit cnvlib/ line 99 or so, where it says:

is_sample_female = cnarr.guess_xx()

Replace the method call with False to treat the sample as male.


OK, I'll try that. Thanks!

