Three fastq files in ENA for paired sequencing - PRJEB3381?
1
0
Entering edit mode
18 months ago
Dhana ▴ 110

Hi,

I am trying to run a benchmarking study using ENA datasets. I downloaded the data for project PRJEB3381 (CEPH pedigree 1463) using the SRA Toolkit (prefetch ERR194146 && fasterq-dump ERR194146). This is supposed to be paired-end sequencing, so I was expecting two files, but there are three FASTQ files:

ERR194146_1.fastq head -n4 ERR194146_1.fastq

ERR194146_2.fastq head -n4 ERR194146_2.fastq

ERR194146.fastq head -n4 ERR194146.fastq

Someone asked a similar question earlier, but in my case the NCBI website does not describe the file ERR194146.fastq as a barcode file (the library names are given as ERR194146_2, ERR194146, unspecified). In ENA, under the sample accession (SAMEA1573614), it is also defined as unspecified.

I checked the number of reads in ERR194146_1.fastq and ERR194146_2.fastq and they are the same, so is it safe to ignore ERR194146.fastq and proceed with the other two?
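Equal read counts are a good sign, but the two mate files should also list the same read IDs in the same order. Here is a minimal sketch of that check, assuming plain (uncompressed) 4-line FASTQ records and headers whose first token is the read ID, optionally ending in /1 or /2. The function names fastq_ids and check_pairing are made up for this example and are not part of the SRA Toolkit.

```python
from itertools import zip_longest


def fastq_ids(path):
    """Yield one read ID per record, assuming plain 4-line FASTQ records."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            # Every 4th line (0, 4, 8, ...) is a header; skip blank trailing lines.
            if line_number % 4 == 0 and line.strip():
                # Header looks like "@ERR194146.1 ..."; keep the first token.
                read_id = line[1:].split()[0]
                # Drop an optional "/1" or "/2" mate suffix.
                if read_id.endswith(("/1", "/2")):
                    read_id = read_id[:-2]
                yield read_id


def check_pairing(path_1, path_2):
    """Return (is_paired, records_checked) for two mate files.

    Stops at the first mismatch, so records_checked is the number of
    records that matched before a problem (if any) was found.
    """
    checked = 0
    for id_1, id_2 in zip_longest(fastq_ids(path_1), fastq_ids(path_2)):
        # None means one file ran out of records before the other.
        if id_1 is None or id_2 is None or id_1 != id_2:
            return False, checked
        checked += 1
    return True, checked
```

Calling check_pairing("ERR194146_1.fastq", "ERR194146_2.fastq") should return True together with the shared read count if the files are properly paired. Both files are streamed, so memory use stays small even for a whole-genome run.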

Also, I compared the files downloaded via the SRA Toolkit with those from a direct wget download. The number of reads in ERR194146_1.fastq and ERR194146_2.fastq is the same in both, but ERR194146.fastq has far fewer reads in the SRA version. What could be the cause of this?

ENA WGS PRJEB3381 Illumina • 561 views
18 months ago
GenoMax 142k

"The number of reads in both ERR194146_1.fastq and ERR194146_2.fastq is same but the number of reads in ERR194146.fastq is much less in SRA. What could be cause of it?"

It is possible that the third file is for singleton reads left over after trimming the PE reads. You could ignore those.
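One way to sanity-check the singleton idea is to confirm that none of the read IDs in the unsuffixed file also appear in the paired files. As far as I know, fasterq-dump's default split-3 mode writes reads without a mate to the unsuffixed file, which would fit this picture, but treat that as an assumption to verify for this run. The sketch below assumes plain 4-line FASTQ and that the unpaired file is small enough for its IDs to fit in memory. The names header_ids and count_shared_ids are hypothetical helpers written for this example.

```python
def header_ids(path):
    """Yield the read ID from each header of a plain 4-line FASTQ file."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0 and line.strip():
                # First token after "@", minus an optional /1 or /2 suffix.
                read_id = line[1:].split()[0]
                if read_id.endswith(("/1", "/2")):
                    read_id = read_id[:-2]
                yield read_id


def count_shared_ids(unpaired_path, paired_path):
    """Return (distinct_unpaired_ids, shared_records).

    shared_records counts records in paired_path whose ID also occurs
    in unpaired_path. Only the unpaired IDs are held in memory; the
    paired file is streamed.
    """
    unpaired = set(header_ids(unpaired_path))
    shared = sum(1 for read_id in header_ids(paired_path) if read_id in unpaired)
    return len(unpaired), shared
```

If count_shared_ids("ERR194146.fastq", "ERR194146_1.fastq") reports zero shared records, that is consistent with the third file holding only unpaired reads. It does not show why those reads lost their mates.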
