Question

My run over-clustered - what does this mean for the data?

6.4 years ago

CC ▴ 50

Hi, I recently sequenced some libraries on a MiSeq, and I must have got the concentration wrong because the run over-clustered. The cluster density was 1997 K/mm². The run still completed and I have data, but what can I expect from this data, and is it usable?

I had a massive number of reads: 46.33 million, with 35 million passing filter. I used a 600-cycle v3 kit, and I thought its upper limit was 25 million reads, so I'm a bit concerned.

Thank you for any advice.

miseq illumina RNA-Seq • 4.6k views

updated 6.4 years ago by GenoMax 141k • written 6.4 years ago by CC ▴ 50

The upper limit is 25M paired-end reads (read pairs). It seems you counted both R1 and R2, for which the upper limit would be 2 x 25M = 50M.
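
As a rough check of that arithmetic, here is a minimal sketch using only the figures quoted in the question. The 25M ceiling is taken from this thread as an assumption, and the function names are made up for illustration. Whether the reported 35M is a count of clusters (pairs) or of R1 plus R2 is not settled here, so both readings are shown.

```python
# Figures quoted in the question (millions of reads).
TOTAL_READS_M = 46.33
PF_READS_M = 35.0
# Assumption: nominal ceiling of 25M read pairs, as stated in this thread.
SPEC_PAIRS_M = 25.0


def pf_fraction(pf_m, total_m):
    """Fraction of reads that passed filter."""
    return pf_m / total_m


def compare_to_spec(pf_m, spec_pairs_m=SPEC_PAIRS_M):
    """Express the passing-filter count under both possible readings."""
    return {
        # If the count is clusters (read pairs): excess over the pair ceiling.
        "excess_if_pairs_m": pf_m - spec_pairs_m,
        # If the count is R1 + R2: equivalent number of pairs.
        "pairs_if_r1_plus_r2_m": pf_m / 2,
    }


print(f"PF fraction: {pf_fraction(PF_READS_M, TOTAL_READS_M):.1%}")
print(compare_to_spec(PF_READS_M))
```

If the count is R1 plus R2, the equivalent pair count falls under the ceiling; if it is clusters, the run is well over it.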

6.4 years ago by WouterDeCoster 47k

score 0 · Answer 1 · 2017-11-16

6.4 years ago

chen ★ 2.5k

The percentage of reads passing filter (about 76%, 35M of 46.33M) seems a bit lower than usual, but the data should still be usable.

Check the data quality with some QC/filtering tools, like FastQC or fastp.
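
For a quick look before running those tools, the sketch below computes the fraction of reads whose mean base quality is at least Q30. It is not a replacement for FastQC or fastp, and the arithmetic mean of Phred scores is only a crude summary. It assumes plain 4-line FASTQ records with Phred+33 encoding; the function names and the Q30 threshold are illustrative choices, not anything from the thread.

```python
import io


def mean_phred(quality_line, offset=33):
    """Mean Phred score of one quality string (assumes Phred+33 encoding)."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores)


def fraction_q30_reads(handle, threshold=30.0):
    """Fraction of reads whose mean quality is >= threshold.

    Assumes strict 4-line FASTQ records (header, sequence, '+', quality).
    """
    total = 0
    passed = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        handle.readline()  # sequence line, not needed here
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip("\n")
        if not quality:
            continue  # skip empty or truncated records
        total += 1
        if mean_phred(quality) >= threshold:
            passed += 1
    return passed / total if total else 0.0


# Tiny in-memory example: one high-quality read and one low-quality read.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n")
print(fraction_q30_reads(demo))
```

For a real file, pass an open text handle instead of the in-memory example, for instance one from `gzip.open(path, "rt")` for a compressed FASTQ.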

score 0 · Answer 2 · 2017-11-16

6.4 years ago

colindaven 6.4k

Try aligning it, then check the percentage of reads aligned and look at the alignments in a viewer. It's probably still pretty decent, but I guess there could be a lot of duplicates. You can exclude these bioinformatically, depending on the application.
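
To get a first impression of duplication before aligning, a minimal sketch is to count reads with identical sequences. This is a sequence-based approximation, not the alignment-based duplicate marking that dedicated tools perform, and `duplicate_fraction` is a hypothetical helper name. For RNA-Seq, highly expressed transcripts produce identical reads naturally, so a high value here does not by itself indicate a problem.

```python
from collections import Counter


def duplicate_fraction(sequences):
    """Fraction of reads that are exact copies of an earlier read.

    The first occurrence of each sequence counts as unique; every
    further identical sequence counts as a duplicate.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    duplicates = total - len(counts)
    return duplicates / total


# Toy example: "ACGT" appears three times, "TTTT" once.
reads = ["ACGT", "ACGT", "TTTT", "ACGT"]
print(duplicate_fraction(reads))
```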

score 0 · Answer 3 · 2017-11-16

This run is clustered well beyond the normal spec for a MiSeq v3 run. As others have said, if you are going to align to a good reference (and if this is plain genomic DNA), then the data may still be usable. If this is for any kind of de novo work, then you should re-run it with a significantly reduced loading concentration.