Biostar Beta. Not for public use.
How to access 30x NA12878 sequencing runs specifically
20 months ago
Fungsten • 0

I see many papers referring to 30x WGS data from sample NA12878, as in the following supplementary material:

https://www.biorxiv.org/content/biorxiv/suppl/2018/01/09/092890.DC5/092890-1.pdf

What I cannot find are instructions on how to access or generate the same FASTQ files. These datasets seem essential for benchmarking, but I am not sure of the best way to obtain them.

Many thanks

10 months ago
Belgium, Brussels

You are welcome: http://www.internationalgenome.org/data-portal/sample/NA12878


Already tried those. Try to get the high coverage and will see the downloaded file doesn't make any sense for high coverage...


Try to get the high coverage and will see the downloaded file doesn't make any sense for high coverage...

What does this mean?

17 months ago
husensofteng • 80
Sweden

SRA Explorer returns several SRA projects that have WGS raw data for NA12878. Type NA12878 in the search box and add the results you want to your collection; the save datasets button at the top of the page then gives you direct links to the FASTQ files.

However, to identify which one provides the 30x dataset, you may have to check the number of reads in the last column or go to the project home page on NCBI (just click on the accession in the second column of the results page).
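As a rough way to turn a read count into a depth estimate, coverage is approximately (number of reads × read length) / genome size. The sketch below is only illustrative: the function names are hypothetical, the genome size of about 3.1 Gb and the 150 bp read length are assumptions, and you should check whether the count you are looking at refers to individual reads or to read pairs.

```python
import math

# Assumed approximate size of the human genome in bases (not from the thread).
HUMAN_GENOME_SIZE = 3.1e9


def estimate_coverage(read_count, read_length, genome_size=HUMAN_GENOME_SIZE,
                      paired=False):
    """Approximate mean depth: total sequenced bases / genome size.

    Set paired=True when read_count is a number of read pairs (spots),
    so that each counted entry contributes two reads.
    """
    reads_per_entry = 2 if paired else 1
    total_bases = read_count * read_length * reads_per_entry
    return total_bases / genome_size


def reads_needed(target_coverage, read_length, genome_size=HUMAN_GENOME_SIZE,
                 paired=False):
    """Number of reads (or pairs, if paired=True) needed for a target depth."""
    reads_per_entry = 2 if paired else 1
    return math.ceil(target_coverage * genome_size
                     / (read_length * reads_per_entry))


if __name__ == "__main__":
    # Hypothetical run: 310 million pairs of 150 bp reads.
    print(f"{estimate_coverage(310_000_000, 150, paired=True):.1f}x")
    print(reads_needed(30, 150, paired=True), "pairs for 30x")
```

This ignores duplicates, unmapped reads and trimming, so the depth actually obtained after alignment will usually be somewhat lower than the estimate.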

13 months ago
benformatics • 870
ETH Zurich

The paper you shared links directly to this FTP directory for sample NA12878: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/

I'm not sure about the naming/ordering of the folders in that directory. All the folders I checked contained at least alignments (.bam files), which you could always convert back to FASTQ if you want to re-align or something. Some of the folders, like Garvan_NA12878_HG001_HiSeq_Exome, do contain actual fastq/fasta files. Also, the paper you linked specifically mentions analyzing the BAM files (which makes sense), not FASTQs.
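For re-alignment, reads are usually pulled out of a BAM as FASTQ so that base qualities are kept. One common route is `samtools collate` piped into `samtools fastq`. The sketch below wraps that pipeline in Python; it assumes samtools is installed and on your PATH, that the BAM holds paired-end reads, and that the file names (which are placeholders) suit your setup. The helper names are hypothetical.

```python
import subprocess


def bam_to_fastq_commands(bam_path, fastq_r1, fastq_r2):
    """Build the two samtools commands for a collate -> fastq pipeline."""
    # Group reads by name so mates are adjacent; -u = uncompressed, -O = stdout.
    collate_cmd = ["samtools", "collate", "-u", "-O", bam_path]
    # Write mates to two files; discard unflagged (-0) and singleton (-s) reads.
    fastq_cmd = ["samtools", "fastq",
                 "-1", fastq_r1, "-2", fastq_r2,
                 "-0", "/dev/null", "-s", "/dev/null",
                 "-n", "-"]  # "-" reads the BAM stream from stdin
    return collate_cmd, fastq_cmd


def run_bam_to_fastq(bam_path, fastq_r1, fastq_r2):
    """Run the pipeline; raises RuntimeError if either step fails."""
    collate_cmd, fastq_cmd = bam_to_fastq_commands(bam_path, fastq_r1, fastq_r2)
    collate = subprocess.Popen(collate_cmd, stdout=subprocess.PIPE)
    fastq = subprocess.Popen(fastq_cmd, stdin=collate.stdout)
    collate.stdout.close()  # let collate see a closed pipe if fastq exits early
    fastq_rc = fastq.wait()
    collate_rc = collate.wait()
    if collate_rc != 0 or fastq_rc != 0:
        raise RuntimeError(
            f"samtools failed (collate={collate_rc}, fastq={fastq_rc})")


if __name__ == "__main__":
    # Placeholder file names; nothing is run here, the commands are only shown.
    for cmd in bam_to_fastq_commands("NA12878.bam", "R1.fastq.gz", "R2.fastq.gz"):
        print(" ".join(cmd))
```

Calling `run_bam_to_fastq("NA12878.bam", "R1.fastq.gz", "R2.fastq.gz")` would actually execute the pipeline. For a whole-genome BAM, expect this to take a while and to need substantial disk space.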

14 months ago
France/Nantes/Institut du Thorax - INSE…

https://www.ebi.ac.uk/ena/data/view/PRJEB3246

We are providing deep whole genome sequence data for the CEPH 1463 family in order to create a "platinum" standard comprehensive set of variant calls. These genomes include a trio (NA12877 NA12878 and NA12882) sequenced to greater than 200x depth of coverage, as well as a technical replicate (separate library and sequencing, but same DNA sample) of NA12882 also sequenced to greater than 200x. Additional information and analyses will be provided at www.platinumgenomes.org.
