Question

FASTQ Generation Difference between BaseSpace and BCL2FASTQ-v2


6.2 years ago

huseyinkoseoglu13 • 0

Hi there.

I ran two analyses. One started from FASTQ files that I downloaded from Illumina BaseSpace; the other started from FASTQ files that I generated from the BaseCalls directory with bcl2fastq2. Interestingly, the FASTQ files from the two sources differ in size.

Is there a reason for this? I currently work at a diagnostic center, so this is important to me. A quick answer would be much appreciated.

illumina basespace bcl2fastq2 fastq • 3.4k views

ADD COMMENT • link updated 6.2 years ago by swbarnes2 14k • written 6.2 years ago by huseyinkoseoglu13 • 0

score 0 · Answer 1 · 2018-01-26


6.2 years ago

GenoMax 141k

File size is never a good indicator of similarity. Depending on the storage architecture, the same file can occupy a different amount of space on different storage devices because of differences in sector size, etc.

Have you looked at the read counts and the total number of bases in the two datasets, assuming they are otherwise identical? Keep in mind that BaseSpace may be trimming your data automatically, whereas standalone bcl2fastq can be set up not to do that by default.
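One way to make that comparison independent of file size and compression is to count reads and bases directly. The following is a minimal sketch, not an official tool: it assumes standard four-line FASTQ records (which Illumina software writes), and the function name `fastq_stats` is made up for illustration.

```python
import gzip
import sys


def fastq_stats(path):
    """Return (read_count, base_count) for a FASTQ file.

    Assumes four-line records; handles gzipped (.gz) and plain files.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    reads = 0
    bases = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                reads += 1
                bases += len(line.rstrip("\r\n"))
    return reads, bases


if __name__ == "__main__":
    # Usage: python fastq_stats.py basespace_R1.fastq.gz local_R1.fastq.gz
    for fastq_path in sys.argv[1:]:
        n_reads, n_bases = fastq_stats(fastq_path)
        print(f"{fastq_path}\treads={n_reads}\tbases={n_bases}")
```

If the read counts match but the base counts differ, trimming is a likely suspect; if the read counts differ, the two runs did not emit the same set of reads.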

ADD COMMENT • link 6.2 years ago by GenoMax 141k

Yes, I've checked the number of lines and bases. There is still a difference. For some samples the BaseSpace data is larger; for others the bcl2fastq-v2 data is larger. It is not just the file size: the numbers of reads and bases differ as well.

ADD REPLY • link 6.2 years ago by huseyinkoseoglu13 • 0


Are these identical samples being processed locally via bcl2fastq and also BaseSpace? Start looking at the scan/trim settings for both methods. While it should not make a difference in theory, are you using the latest bcl2fastq (v.2.20) locally?

ADD REPLY • link 6.2 years ago by GenoMax 141k

Yes, the version is the latest.

ADD REPLY • link 6.2 years ago by huseyinkoseoglu13 • 0


What about settings? Are you using "fastq only" for bcl2fastq in your samplesheets? Same setting for BaseSpace? What is the run configuration (cycles x cycles, index)?

If yes, then you are going to have to start digging into the files to see where the differences are.
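A possible starting point for that digging is to compare the read identifiers in the two files and see which reads exist in only one of them. This is a rough sketch under a few assumptions: four-line FASTQ records, read names that are the first whitespace-delimited token of the header line, and files small enough for the ID sets to fit in memory. The helper names `read_ids` and `compare_ids` are hypothetical.

```python
import gzip
import sys


def read_ids(path):
    """Collect read IDs (header text up to the first whitespace, without '@')."""
    opener = gzip.open if str(path).endswith(".gz") else open
    ids = set()
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0 and line.strip():  # first line of each record
                ids.add(line[1:].split()[0])
    return ids


def compare_ids(path_a, path_b):
    """Summarise how many read IDs are shared or unique to each file."""
    ids_a = read_ids(path_a)
    ids_b = read_ids(path_b)
    return {
        "shared": len(ids_a & ids_b),
        "only_a": len(ids_a - ids_b),
        "only_b": len(ids_b - ids_a),
    }


if __name__ == "__main__":
    # Usage: python compare_ids.py basespace_R1.fastq.gz local_R1.fastq.gz
    if len(sys.argv) == 3:
        print(compare_ids(sys.argv[1], sys.argv[2]))
```

Reads that appear in only one file point to a filtering or demultiplexing difference (for example, barcode mismatch settings), whereas identical ID sets with different base counts point to trimming or masking.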

ADD REPLY • link 6.2 years ago by GenoMax 141k

score 0 · Answer 2 · 2018-01-26

For starters, you need to provide the command line used for bcl2fastq, and see if you can find the settings used when BaseSpace made the FASTQs. One obvious thing: while the default compression level in bcl2fastq is 4, it could be set to anything from 1 to 9. This could make the files appear bigger, even if they contain the same amount of information. I understand that this does not explain the whole discrepancy in your case. You might also check whether one or the other included reads that did not pass filter. bcl2fastq does not include these by default, but perhaps it was run with the option to include them.
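Both checks can be sketched in a few lines. The example below assumes the Illumina header layout used since CASAVA 1.8, in which the comment part of the header reads `<read>:<is filtered>:<control number>:<index>` and the filter field is `Y` for reads that failed the filter and `N` otherwise. The function names are made up for illustration, and `uncompressed_bytes` simply measures the decompressed size so that two gzip files written at different compression levels can be compared fairly.

```python
import gzip


def count_filter_flags(path):
    """Count reads by the 'is filtered' field of the Illumina header.

    'Y' means the read failed the chastity filter, 'N' means it passed.
    Headers that do not follow the expected layout are counted as 'unknown'.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = {"Y": 0, "N": 0, "unknown": 0}
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:  # only header lines are of interest
                continue
            parts = line.rstrip("\r\n").split(" ", 1)
            flag = "unknown"
            if len(parts) == 2:
                fields = parts[1].split(":")
                if len(fields) >= 2 and fields[1] in ("Y", "N"):
                    flag = fields[1]
            counts[flag] += 1
    return counts


def uncompressed_bytes(path, chunk_size=1 << 20):
    """Return the decompressed size of a gzip file, streaming in chunks."""
    total = 0
    with gzip.open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

A non-zero `Y` count in one file but not the other would suggest that failed reads were included in only one of the conversions. Equal `uncompressed_bytes` values with different on-disk sizes would suggest that only the compression level differs.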