Number of passing clusters vs. number of read pairs vs. total number of reads
17 months ago
AP • 90

Hi all,

I apologize for a rather basic question but I am confused about the terminology.

What is really the difference between:

  • Number of clusters passing filter
  • Number of clusters
  • Number of read pairs per lane
  • Total number of reads

For instance, a HiSeq 4000 should produce about 300M reads per lane. What does that mean exactly? If I sequence at PE150, does that mean the total number of expected reads should be 600M? This is quite important when budgeting a project. Would 75,000 fragments at 20X coverage require 1,500,000 reads, and so 0.0025 lanes of HiSeq 4000 (1,500,000/600M)?

Any help clarifying this would be highly appreciated!

3 months ago
genomax 68k
United States
  • Number of clusters: library fragments anchored to the flowcell that are capable of producing sequence. This number is fixed for patterned flowcells but variable for other flowcells. It depends on library quality.
  • Number of clusters passing the chastity filter: clusters that pass Illumina's initial data-processing filter, which checks e.g. for a pure signal and a certain quality.

Chastity is defined as the ratio of the brightest base intensity divided by the sum of the brightest and second brightest base intensities. Clusters “pass filter” if no more than 1 base call has a chastity value below 0.6 in the first 25 cycles.

  • Number of read pairs per lane = number of clusters passing filter in that lane (multiply by 2 if counting individual reads)

For paired-end runs Illumina generally counts both reads of a pair, so the quoted number of reads is usually twice the number of unique library fragments.
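To make the definitions above concrete, here is a minimal Python sketch of the chastity filter and of how passing clusters translate into read pairs and reads. The intensity values and the names `chastity` and `passes_filter` are made up for illustration; this is not Illumina's actual implementation, only the rule as described above.

```python
def chastity(intensities):
    """Chastity for one cycle: brightest / (brightest + second brightest)."""
    top, second = sorted(intensities, reverse=True)[:2]
    return top / (top + second)


def passes_filter(cycles, threshold=0.6, n_cycles=25, max_failures=1):
    """True if at most `max_failures` of the first `n_cycles` fall below `threshold`."""
    failures = sum(1 for c in cycles[:n_cycles] if chastity(c) < threshold)
    return failures <= max_failures


# Hypothetical clusters: each is a list of per-cycle (A, C, G, T) intensities.
clean = [(1000, 50, 30, 20)] * 25            # one clearly dominant base per cycle
mixed = [(500, 450, 30, 20)] * 25            # two bases nearly equal: impure signal
one_bad = [(500, 450, 30, 20)] + clean[1:]   # a single low-chastity cycle is tolerated

clusters = [clean, mixed, one_bad]
pf_clusters = sum(passes_filter(c) for c in clusters)

read_pairs = pf_clusters   # paired-end: one read pair per passing cluster
reads = 2 * read_pairs     # counting read 1 and read 2 separately
print(pf_clusters, read_pairs, reads)
```

With these toy values, two of the three clusters pass the filter, giving two read pairs or four reads, depending on how you count.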


OK, thank you very much for the clarification! So does that mean I should use 300M reads when calculating the number of lanes required for, e.g., 20X coverage (as in the example above)? Or should I double the number of reads?


Thanks, but I don't find it very helpful or clear. I like being able to calculate this by hand myself.


Using the published specifications for the HiSeq 3000/4000:

2,500,000,000 single-end reads per 8 lanes = 312,500,000 reads per lane OR
5,000,000,000 paired-end reads per 8 lanes = 625,000,000 reads per lane

625,000,000 x 150 = 9.375e10 total bases (about 93.75 Gb) per lane for paired-end reads.

What is the average length of your 75,000 fragments going to be? You would be sampling the sequence between the two ends.
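Once you have that number, the by-hand calculation could look like the rough Python sketch below. It works in bases, not reads, because coverage depends on fragment length. The 500 bp mean fragment length is only a placeholder assumption, and the estimate ignores duplicates, uneven coverage, and reads that fail filter, so treat the result as a lower bound.

```python
READ_LENGTH = 150               # PE150: bases per read
READS_PER_LANE = 625_000_000    # paired-end reads per lane, from the figures above
BASES_PER_LANE = READS_PER_LANE * READ_LENGTH   # 9.375e10 bases


def lanes_needed(n_fragments, mean_fragment_length, coverage):
    """Fraction of a lane needed to cover all target bases at the given depth."""
    target_bases = n_fragments * mean_fragment_length
    bases_required = target_bases * coverage
    return bases_required / BASES_PER_LANE


# 75,000 fragments at 20X; 500 bp is a hypothetical mean fragment length.
fraction = lanes_needed(75_000, 500, 20)
print(f"{fraction:.4f} lanes")
```

Replace 500 with your real mean fragment length; the required fraction of a lane scales linearly with it.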
