Question

download complete human genome in SRA

0

5.0 years ago

oliveirajontec • 0

I need to download the entire human genome in SRA format using the SRA Toolkit, but when I search, I get a DRR accession instead of an SRR accession, and when I pass it to the toolkit I get the following message:

"2019-04-18T13:42:17 prefetch.2.8.2: KClientHttpOpen - connected to www.ncbi.nlm.nih.gov (130.14.29.110) 
2019-04-18T13:42:18 prefetch.2.8.2: KClientHttpOpen - verifying CA cert 
2019-04-18T13:42:18 prefetch.2.8.2 err: libs/vfs/remote-services.c:1650:EVPathInit: name not found while resolving query within virtual file system module - failed to resolve accession 'DRR142777' - no data ( 404 )
2019-04-18T13:42:18 prefetch.2.8.2: KClientHttpOpen - connected to www.ncbi.nlm.nih.gov (130.14.29.110) 
2019-04-18T13:42:18 prefetch.2.8.2: KClientHttpOpen - verifying CA cert 
2019-04-18T13:42:18 prefetch.2.8.2: KClientHttpOpen - connected to www.ncbi.nlm.nih.gov (130.14.29.110) 
2019-04-18T13:42:19 prefetch.2.8.2: KClientHttpOpen - verifying CA cert 
2019-04-18T13:42:19 prefetch.2.8.2 err: libs/vfs/remote-services.c:1650:EVPathInit: name not found while resolving query within virtual file system module - failed to resolve accession 'DRR142777' - no data ( 404 )
2019-04-18T13:42:19 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'DRR142777' cannot be found."

How do I get the complete genome in SRA?

The SRA I'm trying to get is the "Illumina HiSeq 2500 paired end sequencing of SAMD00131364 Accession: DRX135491"

Thanks in advance.

genome sequence • 2.0k views

ADD COMMENT • link updated 5.0 years ago by Arup Ghosh 3.2k • written 5.0 years ago by oliveirajontec • 0

1

I need to download the entire human genome in the SRA format

That does not make sense.

It looks like this data was released this week. I wonder if that is causing the issues you are having.

The record at https://www.ncbi.nlm.nih.gov/sra/DRX135491 seems odd in general: it is not showing any real read data either.

ADD REPLY • link 5.0 years ago by GenoMax 141k

0

Exactly! I really want the raw data, but I cannot find it.

The command I used was: "prefetch -v DRR142777"

ADD REPLY • link 5.0 years ago by oliveirajontec • 0

0

This record was made public earlier this week. It either has not been fully populated with data or there may be something wrong with the record. If you don't want to wait then I suggest that you email SRA support to let them know about this dataset.

ADD REPLY • link 5.0 years ago by GenoMax 141k

0

How do I find the support email?

ADD REPLY • link 5.0 years ago by oliveirajontec • 0

0

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to original question.

Email: sra at ncbi.nlm.nih.gov

ADD REPLY • link 5.0 years ago by GenoMax 141k

0

This is the link to the data I'm trying to download: Illumina HiSeq 2500 paired end sequencing of SAMD00131364

ADD REPLY • link 5.0 years ago by oliveirajontec • 0

0

None of those records appear to have data.

Edit: See @arup's answer. This data is available from DDBJ. You can download it from there.

ADD REPLY • link 5.0 years ago by GenoMax 141k

score 3 · Answer 1 · 2019-04-18

3

5.0 years ago

Arup Ghosh 3.2k

The accession you mentioned is a DDBJ submission and it seems the data has not been mirrored to NCBI yet. But you can access the raw data (fastq, sra) from the following link.

https://ddbj.nig.ac.jp/DRASearch/run?acc=DRR142777

DDBJ Search: https://ddbj.nig.ac.jp/DRASearch/
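
As a rough illustration of why a DRR accession points to DDBJ: INSDC run accessions start with SRR (NCBI SRA), ERR (EBI ENA), or DRR (DDBJ DRA), according to the archive that received the submission. The sketch below only inspects that prefix and rebuilds the DRASearch link in the same form as the one above. The function names are hypothetical, the URL pattern is simply copied from the link in this answer and may change, and nothing here checks whether the data is actually online.

```python
import re

# The first letter of an INSDC run accession names the archive that
# accepted the original submission.
ARCHIVE_BY_PREFIX = {
    "SRR": "NCBI SRA",
    "ERR": "EBI ENA",
    "DRR": "DDBJ DRA",
}

# Run accessions only (SRR/ERR/DRR followed by digits); experiment
# accessions such as DRX135491 are deliberately rejected.
RUN_ACCESSION = re.compile(r"([SED]RR)\d+")


def originating_archive(accession: str) -> str:
    """Return the archive implied by the accession prefix (hypothetical helper)."""
    match = RUN_ACCESSION.fullmatch(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a run accession: {accession!r}")
    return ARCHIVE_BY_PREFIX[match.group(1)]


def drasearch_run_url(accession: str) -> str:
    """Build a DRASearch run URL, assuming the pattern of the link in this answer."""
    if originating_archive(accession) != "DDBJ DRA":
        raise ValueError(f"not a DDBJ run accession: {accession!r}")
    return "https://ddbj.nig.ac.jp/DRASearch/run?acc=" + accession.strip().upper()


if __name__ == "__main__":
    print(originating_archive("DRR142777"))
    print(drasearch_run_url("DRR142777"))
```

If prefetch cannot resolve an accession whose prefix is DRR or ERR, checking the originating archive directly, as done here, is a reasonable first step.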

ADD COMMENT • link 5.0 years ago by Arup Ghosh 3.2k

score 0 · Answer 2 · 2019-04-18

0

5.0 years ago

h.mon 35k

You are confusing things a bit: the SRA doesn't store the complete human genome; it stores experiments that sequenced the human genome (among other organisms). This means the SRA stores raw sequencing data, not assembled genomes.

Sequence Read Archive (SRA) makes biological sequence data available to the research community to enhance reproducibility and allow for new discoveries by comparing data sets. The SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms

If you are actually trying to download the sequencing data, please show the command you used.

ADD COMMENT • link 5.0 years ago by h.mon 35k

0

Exactly! I really want the raw data, but I cannot find it.

The command I used was: "prefetch -v DRR142777"

ADD REPLY • link 5.0 years ago by oliveirajontec • 0

0

If you want help from everyone here it would be best if you use a language which most of us understand. If you feel more comfortable in Portuguese that's okay, but maybe add an English translation as well.

ADD REPLY • link 5.0 years ago by WouterDeCoster 47k

0

Dude, the study was published just a week ago: https://www.nature.com/articles/s41467-019-09575-2

Data availability

The whole-genome sequencing data generated in this study have been deposited under BioProject PRJDB7193. All other data are contained within the article and its supplementary information or upon reasonable request from the corresponding author.

It is possible that the raw data have not yet been uploaded to the SRA site. I recommend getting in touch with the author: https://www.nature.com/articles/s41467-019-09575-2/email/correspondent/c1/new

ADD REPLY • link 5.0 years ago by Kevin Blighe 87k

3

@arup above gives the link for this dataset from DDBJ (Japan's data bank) in his answer. Data appears to have been originally submitted there. It has not yet propagated to ENA/NCBI. OP can download it from DDBJ.

ADD REPLY • link 5.0 years ago by GenoMax 141k