cnvkit example data
7.9 years ago
laxvid ▴ 10

Where can I get the BAM files used in the CNVkit examples at https://github.com/etal/cnvkit-examples?

I'd like to compare another CNV caller's performance against CNVkit. -Lax

cnv bam cnvkit • 3.5k views
7.9 years ago
Eric T. ★ 2.8k

The test samples are from Shain et al. 2015, Nature Genetics. Since these sequences are protected patient information, the BAM files were submitted to dbGaP; there is a delay of a few months before they appear online. However, those weren't ideal samples for testing copy number calling anyway: desmoplastic melanoma genomes are dominated by somatic SNVs, not large-scale copy number alterations.

A better dataset for testing variant callers, both SNV and CNV, has become available recently: "An open access pilot freely sharing cancer genomic data from participants in Texas". I recommend running your benchmarks with these samples instead so that you can freely share your complete analysis. CNVkit has changed significantly since the version I benchmarked in the paper, so in any case you'll need to re-run the latest version of each caller (including CNVkit) to get representative results.
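Once each caller has been re-run on the same samples, one common way to compare the call sets is reciprocal overlap. The following is only a minimal sketch under stated assumptions: the `Call` record, the function names, the 50% threshold, and the example coordinates are all hypothetical and are not part of CNVkit or any other caller.

```python
from collections import namedtuple

# Hypothetical minimal record for one CNV call; not a CNVkit data structure.
Call = namedtuple("Call", ["chrom", "start", "end", "kind"])  # kind: "gain" or "loss"


def reciprocal_overlap(a, b):
    """Return the smaller of the two overlap fractions (shared length / call length)."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def concordant_calls(calls_a, calls_b, min_overlap=0.5):
    """Return the calls in calls_a that are matched by a same-kind call in calls_b."""
    return [
        a for a in calls_a
        if any(a.kind == b.kind and reciprocal_overlap(a, b) >= min_overlap
               for b in calls_b)
    ]


# Made-up example calls, for illustration only.
caller_1 = [Call("chr1", 1_000_000, 5_000_000, "gain"),
            Call("chr9", 20_000_000, 23_000_000, "loss")]
caller_2 = [Call("chr1", 1_500_000, 5_200_000, "gain"),
            Call("chr9", 40_000_000, 41_000_000, "loss")]

matched = concordant_calls(caller_1, caller_2)
print(f"{len(matched)} of {len(caller_1)} calls are concordant")
```

The threshold and the matching rule are design choices; a stricter or looser definition of concordance may suit a particular benchmark better.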

Can CNVkit be used to detect germline copy number variants?

Yes, but the resolution is fairly coarse, especially on targeted panels, so detection of smaller CNVs (e.g. below 1 Mb) is less accurate. This is less of a concern in cancer cases, where somatic copy number alterations tend to affect entire genes or larger chromosomal regions.

For germline cases, if you only have targeted/exome sequencing data then CNVkit is worth running to get some copy number information, but a clinic with access to the original sample or extracted DNA should consider running another assay (e.g. SNP array, FISH, qPCR) in parallel if possible.
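As a rough illustration of the size caveat above, small segments can be set aside for confirmation by another assay. This is a sketch only: the tuple layout, the function name, and the example segments are hypothetical, and the 1 Mb cutoff is just the ballpark figure mentioned above, not a CNVkit setting.

```python
# Rough 1 Mb cutoff taken from the discussion above; an assumption, not a CNVkit parameter.
MIN_CONFIDENT_SIZE = 1_000_000


def split_by_size(segments, min_size=MIN_CONFIDENT_SIZE):
    """Split (chrom, start, end, log2) tuples into (large, small) lists by segment length."""
    large, small = [], []
    for seg in segments:
        _chrom, start, end, _log2 = seg
        (large if end - start >= min_size else small).append(seg)
    return large, small


# Made-up segments, for illustration only.
segments = [
    ("chr2", 10_000_000, 14_500_000, 0.58),
    ("chr7", 55_000_000, 55_200_000, -0.9),
]
large, small = split_by_size(segments)
print(f"{len(small)} segment(s) below the cutoff would need orthogonal confirmation")
```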

7.7 years ago
Akliao • 0

The Texas site is perfect. However, it is WES. Does anyone know of a good Illumina amplicon CNV dataset?

It would be best to ask this as a separate question.

I don't know of any that are publicly available, but try SRA or dbGaP. Targeted amplicon sequencing seems to be concentrated in smaller clinics, where making the sequencing data widely available (which requires, e.g., IRB approval) is not the primary concern; bigger studies that are conducted with this intent are usually WES, WGS, or at least hybrid capture with a broader panel. But if you find a good public TAS dataset, please let me know!
