Tool: Human NGS Cancer Data for tool development, algorithm benchmarking, teaching, pipeline evaluation, etc.
8.7 years ago

We recently published a paper and made available a comprehensive human NGS cancer dataset for tool development, algorithm benchmarking, teaching, pipeline evaluation, etc.

This data is available for download directly from our FTP site.

Briefly, we sequenced a breast cancer cell line (HCC1395) and a matched normal lymphoblastoid cell line (HCC1395/BL) derived from the same individual. WGS, exome and RNA-seq data were produced for both samples. All data are 2x100 bp Illumina reads from the HiSeq 2000 platform.

A total of 10 lanes of HiSeq 2000 (v3 chemistry) sequence data consisting of ~1.8 billion 2x100 bp reads were produced for HCC1395 and HCC1395/BL. Whole genome sequencing, exome sequencing and RNA-seq were performed as described previously. HCC1395 and HCC1395/BL were sequenced to average coverage levels of 56x (WGS)/155x (exome) and 31x (WGS)/124x (exome), respectively. RNA sequencing achieved 20x coverage of >50% of known junctions for 8,640 genes for HCC1395 and 9,437 genes for HCC1395/BL. (source)
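To relate the reported depths to read counts, mean coverage is roughly (read pairs × 2 × read length) / target size. The sketch below applies that idealized formula to the WGS depths above. The genome size of ~3.1 Gb is an assumption (the post does not state it), the helper names `mean_coverage` and `read_pairs_needed` are hypothetical, and duplicates, unmapped reads and uneven coverage are ignored, so real read counts would be higher.

```python
# Idealized sequencing arithmetic: mean depth = bases sequenced / target size.
# ASSUMPTION: ~3.1e9 bp human genome; the post does not say which genome size was used.
# Ignores duplicates, unmapped reads and uneven coverage, so real read counts are higher.

GENOME_SIZE_BP = 3.1e9  # assumed, approximate haploid human genome size
READ_LENGTH_BP = 100    # 2x100 bp paired-end reads, as stated in the post


def mean_coverage(read_pairs: float, target_bp: float = GENOME_SIZE_BP,
                  read_length: int = READ_LENGTH_BP) -> float:
    """Mean depth produced by a number of read pairs (two reads per pair)."""
    return read_pairs * 2 * read_length / target_bp


def read_pairs_needed(coverage: float, target_bp: float = GENOME_SIZE_BP,
                      read_length: int = READ_LENGTH_BP) -> float:
    """Read pairs required for a given mean depth (inverse of mean_coverage)."""
    return coverage * target_bp / (2 * read_length)


# WGS depths reported in the post for the tumor and matched normal lines.
for sample, depth in [("HCC1395 WGS", 56), ("HCC1395/BL WGS", 31)]:
    pairs = read_pairs_needed(depth)
    print(f"{sample}: ~{pairs / 1e6:.0f} million read pairs for {depth}x")
```

Under these assumptions, 56x and 31x correspond to roughly 0.87 and 0.48 billion read pairs. For the exome depths, `target_bp` would have to be set to the capture target size, which the post does not give.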

We provide this data in several versions: one contains all of the data, and we also provide versions downsampled to 1/100th and 1/1000th, as well as an exome-only version.
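The post does not say how the downsampled versions were made. One common approach for paired-end data is to make the keep/drop decision from a hash of the read name, so both mates of a pair are kept or dropped together and the result is reproducible. Dedicated tools do this in practice (for example `seqtk sample` run with the same seed on both mate files). The minimal sketch below only illustrates the idea. It assumes plain 4-line FASTQ with mates in the same order in both files, and its function names are hypothetical.

```python
import hashlib
from typing import Iterator, List, Tuple

Record = Tuple[str, ...]  # (header, sequence, '+' line, qualities)


def read_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records (multi-line FASTQ is not supported)."""
    for i in range(0, len(lines) - 3, 4):
        yield tuple(line.rstrip("\n") for line in lines[i:i + 4])


def pair_key(header: str) -> str:
    """Name shared by both mates: drop '@', any comment, and a /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def keep(name: str, fraction: float, seed: str = "0") -> bool:
    """Deterministically keep ~fraction of names; both mates get the same decision."""
    digest = hashlib.sha1((seed + name).encode()).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64  # pseudo-uniform in [0, 1)
    return value < fraction


def downsample_pairs(r1: List[str], r2: List[str], fraction: float):
    """Return the kept R1 and R2 records, checking that mates stay in sync."""
    out1, out2 = [], []
    for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
        name = pair_key(rec1[0])
        if name != pair_key(rec2[0]):
            raise ValueError(f"mates out of sync: {rec1[0]} vs {rec2[0]}")
        if keep(name, fraction):
            out1.append(rec1)
            out2.append(rec2)
    return out1, out2


# Toy data: 10,000 synthetic read pairs with hypothetical names.
r1, r2 = [], []
for i in range(10_000):
    r1 += [f"@read{i}/1\n", "ACGT\n", "+\n", "IIII\n"]
    r2 += [f"@read{i}/2\n", "TGCA\n", "+\n", "IIII\n"]
kept1, kept2 = downsample_pairs(r1, r2, fraction=0.01)  # ~1/100th
print(len(kept1), "pairs kept")
```

Because the decision is hashed rather than counted, a 0.01 fraction yields approximately, not exactly, 1/100th of the pairs; changing `seed` gives a different but equally reproducible subset.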

A detailed description of all data files is provided here.

We describe a basic analysis of this data in the publication listed below. While this data represents only a single tumor/normal pair, we hope that this data will be useful to people who are: (a) developing alignment or variant calling algorithms/tools, (b) running educational workshops, and (c) benchmarking pipelines.

If you find this data useful, please cite:

PLoS Comput Biol. 2015 Jul 9;11(7) (full open access article).


I noticed that you showed some examples of further integration: "for example, identify which variants at the DNA level are expressed at the RNA level and which events affect known cancer driver genes or druggable targets". Are there other methods or ideas for deeply integrating the WGS, exome and RNA-seq data? Thank you.
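The first integration idea quoted above can be sketched as annotating each DNA-level variant with (a) whether RNA-seq reads support the alternate allele and (b) whether its gene is on a driver list. This is a minimal illustration, not the method used in the paper. The `Variant` type, the RNA count dictionary, the genes `GENE_A`/`GENE_B`, and the thresholds (3 alt reads, 5% VAF) are all hypothetical. In a real analysis the counts would be obtained from the RNA-seq alignments at each variant position.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int   # 1-based genomic position
    ref: str
    alt: str
    gene: str  # gene annotation, assumed to come from an upstream annotation step


# RNA-seq read counts at each DNA variant site: (chrom, pos) -> (ref_reads, alt_reads).
RnaCounts = Dict[Tuple[str, int], Tuple[int, int]]


def is_expressed(v: Variant, rna: RnaCounts,
                 min_alt_reads: int = 3, min_vaf: float = 0.05) -> bool:
    """Call a DNA variant 'expressed' if enough RNA reads support the alt allele.
    The thresholds are hypothetical defaults, not values from the paper."""
    ref_reads, alt_reads = rna.get((v.chrom, v.pos), (0, 0))
    depth = ref_reads + alt_reads
    return depth > 0 and alt_reads >= min_alt_reads and alt_reads / depth >= min_vaf


def integrate(variants: List[Variant], rna: RnaCounts,
              driver_genes: Set[str]) -> List[dict]:
    """Annotate each DNA variant with RNA expression status and driver-gene membership."""
    return [
        {
            "variant": f"{v.chrom}:{v.pos}{v.ref}>{v.alt}",
            "gene": v.gene,
            "expressed": is_expressed(v, rna),
            "driver_gene": v.gene in driver_genes,
        }
        for v in variants
    ]


# Toy example with hypothetical genes, positions and counts.
variants = [Variant("chr1", 100, "C", "T", "GENE_A"),
            Variant("chr2", 200, "G", "A", "GENE_B")]
rna = {("chr1", 100): (10, 8)}  # no RNA coverage at the chr2 site
for row in integrate(variants, rna, driver_genes={"GENE_A"}):
    print(row)
```

Note that a site with no RNA coverage is reported as not expressed, which conflates "not expressed" with "not assessable"; a fuller version would report those two cases separately.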
