Question

Entrez-efetch error. What could be wrong with my efetch usage?

5.0 years ago

jaqx008 ▴ 110

Hey everyone, I am trying to download an SRA file from the Unix command line, and my efetch command is below. Many people have encountered this error and found solutions in the recommendations, but I haven't. I think something is wrong with my efetch usage.

efetch -db sra -format fastq -id SAMD00028077 > Blastula.fastq

I get the following error, and all of the error output for that sample is written into the fastq file:

Error

400 Bad Request
No do_post output returned from 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=sra&id=SAMD00028077&rettype=fastq&retmode=text&edirect_os=linux&edirect=11.2&tool=edirect&email=edthompson@magnolia01'
Result of do_post http request is
$VAR1 = bless( {
                 '_rc' => 400,
                 '_protocol' => 'HTTP/1.1',
                 '_content' => 'ID list is empty! In it there are neither IDs nor accessions. ',
                 '_msg' => 'Bad Request',
                 '_request' => bless( {
                                        '_headers' => bless( {
                                                               '::std_case' => {
                                                                                 'if-ssl-cert-subject' => 'If-SSL-Cert-Subject'
                                                                               },
                                                               'user-agent' => 'libwww-perl/6.36',
                                                               'content-type' => 'application/x-www-form-urlencoded'
                                                             }, 'HTTP::Headers' ),
                                        '_method' => 'POST',
                                        '_uri_canonical' => bless( do{\(my $o = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi')}, 'URI::https' ),
                                        '_content' => 'db=sra&id=SAMD00028077&rettype=fastq&retmode=text&edirect_os=linux&edirect=11.2&tool=edirect&email=edthompson@magnolia01',
                                        '_uri' => $VAR1->{'_request'}{'_uri_canonical'}
                                      }, 'HTTP::Request' ),
                 '_headers' => bless( {
                                        'x-xss-protection' => '1; mode=block',
                                        'client-ssl-cert-subject' => '/C=US/ST=Maryland/L=Bethesda/O=National Library of Medicine/OU=National Center for Biotechnology Information/CN=*.ncbi.nlm.nih.gov',
                                        'server' => 'Finatra',
                                        'strict-transport-security' => 'max-age=31536000; includeSubDomains; preload',
                                        'ncbi-phid' => '322CB55B42D4D2E500002805E9D8639D.1.1.m_2',
                                        'content-type' => 'text/plain; charset=UTF-8',
                                        'client-date' => 'Wed, 24 Apr 2019 15:57:41 GMT',
                                        '::std_case' => {
                                                          'client-response-num' => 'Client-Response-Num',
                                                          'x-ua-compatible' => 'X-UA-Compatible',
                                                          'access-control-allow-origin' => 'Access-Control-Allow-Origin',
                                                          'ncbi-sid' => 'NCBI-SID',
                                                          'client-ssl-cipher' => 'Client-SSL-Cipher',
                                                          'x-ratelimit-remaining' => 'X-RateLimit-Remaining',
                                                          'client-ssl-socket-class' => 'Client-SSL-Socket-Class',
                                                          'content-security-policy' => 'Content-Security-Policy',
                                                          'l5d-success-class' => 'L5d-Success-Class',
                                                          'x-ratelimit-limit' => 'X-RateLimit-Limit',
                                                          'client-peer' => 'Client-Peer',
                                                          'client-transfer-encoding' => 'Client-Transfer-Encoding',
                                                          'set-cookie' => 'Set-Cookie',
                                                          'client-ssl-cert-issuer' => 'Client-SSL-Cert-Issuer',
                                                          'ncbi-phid' => 'NCBI-PHID',
                                                          'strict-transport-security' => 'Strict-Transport-Security',
                                                          'client-date' => 'Client-Date',
                                                          'x-xss-protection' => 'X-XSS-Protection',
                                                          'client-ssl-cert-subject' => 'Client-SSL-Cert-Subject'
                                                        },
                                        'client-ssl-cert-issuer' => '/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert SHA2 High Assurance Server CA',
                                        'vary' => 'Accept-Encoding',
                                        'set-cookie' => 'ncbi_sid=0BF58CE0F7076DE0_99BASID; domain=.nih.gov; path=/; expires=Fri, 24 Apr 2020 15:57:40 GMT',
                                        'client-transfer-encoding' => [
                                                                        'chunked'
                                                                      ],
                                        'client-peer' => '130.14.29.110:443',
                                        'cache-control' => 'private',
                                        'date' => 'Wed, 24 Apr 2019 15:57:40 GMT',
                                        'x-ratelimit-limit' => '3',
                                        'l5d-success-class' => '1.0',
                                        'content-security-policy' => 'upgrade-insecure-requests',
                                        'client-ssl-socket-class' => 'IO::Socket::SSL',
                                        'x-ratelimit-remaining' => '2',
                                        'client-ssl-cipher' => 'ECDHE-RSA-AES256-GCM-SHA384',
                                        'connection' => 'close',
                                        'ncbi-sid' => '0BF58CE0F7076DE0_99BASID',
                                        'access-control-allow-origin' => '*',
                                        'client-response-num' => 1,
                                        'x-ua-compatible' => 'IE=Edge'
                                      }, 'HTTP::Headers' )
               }, 'HTTP::Response' );

efetch RNA-Seq Entrez-direct

score 1 · Answer 1 · 2019-04-24

I don't believe you can use Entrez Direct to fetch sequences from SRA. If you look at the help for efetch, you will see that the only formats supported for SRA are the following:

 sra
                 native             xml      EXPERIMENT_PACKAGE_SET XML
                 runinfo            xml      SraRunInfo XML

You can, however, retrieve the run metadata for the sample like this:

$ esearch -db sra -query SAMD00028077 | efetch -format runinfo -mode xml

<SraRunInfo>
<Row>
<Run>DRR032680</Run>
<ReleaseDate>2017-09-20 06:08:37</ReleaseDate>
<LoadDate>2017-09-20 07:05:56</LoadDate>
<spots>57063175</spots>
<bases>5763380675</bases>
<spots_with_mates>0</spots_with_mates>
<avgLength>101</avgLength>
<size_MB>3437</size_MB>
<download_path>https://sra-download.ncbi.nlm.nih.gov/traces/dra4/DRR/000031/DRR032680</download_path>
<Experiment>DRX029486</Experiment>
<LibraryName>Bf_blastula_2</LibraryName>
<LibraryStrategy>RNA-Seq</LibraryStrategy>
<LibrarySelection>other</LibrarySelection>
<LibrarySource>TRANSCRIPTOMIC</LibrarySource>
<LibraryLayout>SINGLE</LibraryLayout>
<InsertSize>0</InsertSize>
<InsertDev>0</InsertDev>
<Platform>ILLUMINA</Platform>
<Model>Illumina HiSeq 2000</Model>
<SRAStudy>DRP003810</SRAStudy>
<BioProject>PRJDB3785</BioProject>
<ProjectID>407062</ProjectID>
<Sample>DRS049885</Sample>
<BioSample>SAMD00028077</BioSample>
<SampleType>simple</SampleType>
<TaxID>7739</TaxID>
<ScientificName>Branchiostoma floridae</ScientificName>
<SampleName>SAMD00028077</SampleName>
<Sex>male, female, and mixed</Sex>
<Tumor>no</Tumor>
<CenterName>UT-BS</CenterName>
<Submission>DRA003460</Submission>
<Consent>public</Consent>
<RunHash>00028A5454C28D3B103C3772C0B485A7</RunHash>
<ReadHash>9A87D0D38C636BC9F5D2E527B7FCB677</ReadHash>
</Row>

</SraRunInfo>
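
If xtract is not at hand, single fields can also be pulled out of a saved runinfo response with standard tools. The following is only a sketch: `runinfo_field` is a hypothetical helper name (not part of EDirect), and it assumes each element sits on its own line, as in the output above.

```shell
#!/bin/sh
# runinfo_field TAG
# Reads runinfo XML on stdin and prints the text of every <TAG>...</TAG>
# element, one per line. Hypothetical helper; assumes one element per line.
runinfo_field() {
    tag="$1"
    # ':' is used as the sed delimiter so the '/' in the closing tag is literal.
    sed -n "s:.*<${tag}>\(.*\)</${tag}>.*:\1:p"
}

# Demo on a two-field excerpt of the record shown above.
runinfo_field Run <<'EOF'
<Row>
<Run>DRR032680</Run>
<LibraryLayout>SINGLE</LibraryLayout>
</Row>
EOF
# prints: DRR032680
```

In practice the input would come from the esearch/efetch pipe above rather than a here-document.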

I suggest you use Phil Ewels' SRA-Explorer to get the command lines you need to download the data from ENA/SRA.

Edit: You could do the following:

$ esearch -db sra -query SAMD00028077 | efetch -format runinfo -mode xml | xtract -pattern SraRunInfo -element download_path
https://sra-download.ncbi.nlm.nih.gov/traces/dra4/DRR/000031/DRR032680

Then download that URL with wget or curl and save it as an .sra file. You can then run fastq-dump on the .sra file to extract the sequence data as FASTQ.
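
Put together, the steps above might look roughly like the sketch below. It assumes esearch, efetch and xtract (EDirect), curl, and fastq-dump (SRA Toolkit) are installed, and that the URL reported in download_path is still being served. The function names `sra_filename` and `download_runs` are hypothetical, and `-pattern Row` is used instead of `-pattern SraRunInfo` so that a sample with several runs yields one URL per line.

```shell
#!/bin/sh
# Derive a local file name from a download URL:
# last path component plus ".sra" (hypothetical helper).
sra_filename() {
    printf '%s.sra\n' "${1##*/}"
}

# Look up every run for a sample accession, download each .sra file,
# and convert it to FASTQ (hypothetical helper; needs network access).
download_runs() {
    sample="$1"
    esearch -db sra -query "$sample" |
        efetch -format runinfo -mode xml |
        xtract -pattern Row -element download_path |
        while read -r url; do
            [ -n "$url" ] || continue
            out=$(sra_filename "$url")
            # -f: fail on HTTP errors, -L: follow redirects, -o: output file.
            curl -fL -o "$out" "$url" || continue
            # The ./ prefix makes fastq-dump treat the argument as a local file.
            fastq-dump "./$out"
        done
}

# Example call (commented out because it contacts NCBI):
# download_runs SAMD00028077
```

Keeping the file-name logic in its own small function makes it easy to check offline, without touching the network.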