Question

SRA: fastq-dump gives different number of sequences

5.5 years ago

jeetsahu ▴ 10

I downloaded read sequences for a paired-end run using fastq-dump with the --split-files option and the SRR ID. However, the split files contain different numbers of reads. As I understand it, since these are paired-end reads, both files should contain the same number of sequences.

$ fastq-dump -I --split-files SRR390728

$ grep -c '>' SRR7716545_1.fastq

694067

$ grep -c '>' SRR7716545_2.fastq

1026976

Please correct me if I am wrong.

sra sequence • 1.6k views

ADD COMMENT • link updated 5.1 years ago by Biostar 20 • written 5.5 years ago by jeetsahu ▴ 10

score 3 · Accepted Answer · 2018-11-02

5.5 years ago

ATpoint 82k

Both files contain the same number of reads. You have to grep for '^@', because @ is the FASTQ header prefix; > is the FASTA header prefix.

ls *.fastq | parallel "echo {} && grep -c '^@' {}"
SRR7716545_1.fastq
5644111
SRR7716545_2.fastq
5644111
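
One caveat: a quality line can also begin with @, so grep -c '^@' may overcount on some files. A minimal sketch of a line-count alternative follows, assuming standard FASTQ with exactly four lines per record (header, sequence, '+', quality). The function name count_reads is hypothetical and not part of the SRA Toolkit.

```shell
#!/usr/bin/env bash
# count_reads is a hypothetical helper name, not an SRA Toolkit command.
# Assumption: every record spans exactly 4 lines (header, sequence, '+', quality).
count_reads() {
    local lines
    lines=$(wc -l < "$1")      # total number of lines in the file
    echo $(( lines / 4 ))      # 4 lines per read
}

# Print the read count for every FASTQ file in the current directory.
for f in *.fastq; do
    [ -e "$f" ] || continue    # skip if the glob matched nothing
    printf '%s\t%s\n' "$f" "$(count_reads "$f")"
done
```

For a paired-end run split with --split-files, the two counts printed for the _1 and _2 files should be identical.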

ADD COMMENT • link 5.5 years ago by ATpoint 82k

Thanks, I grepped for the wrong symbol. One quick question: does fastq-dump give the latest dataset used for the assembly? If yes, how can I get the older datasets?

ADD REPLY • link 5.5 years ago by jeetsahu ▴ 10

fastq-dump returns the FASTQ for whatever SRR accession you give it. I have no detailed knowledge about your SRR.

ADD REPLY • link 5.5 years ago by ATpoint 82k

Hello jeetsahu,

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

ADD REPLY • link 5.5 years ago by finswimmer 16k