Sequenced reads have high average Q30 but all the sequences are the same. How is this possible?
16 months ago
United States

My HiSeq run yielded an average quality score of over Q35, with >90% of the bases being >Q30. Yet, when I analyze these reads, I can read the sequence right off the "nucleotide contributions" plot (this plot shows relative abundance of nucleotides at each base position in the read). Even if it is a faulty library/library prep, this doesn't seem to add up. I would expect the quality to be much poorer due to low diversity. Has anyone ever seen this before?

What kind of an experiment is this? Are these amplicons (single or mixture)?

When a sample is known to have "low nucleotide diversity", a "spike-in" (e.g. phiX) is generally added. This allows the run to proceed, and the Q-scores will look fine.

This is not an amplicon experiment; it's transcriptome sequencing from human samples. We would not expect to get similar inserts. I think what happened is that the RNA extractions yielded very little RNA (not under my control), so we couldn't determine nucleotide diversity. It was also very difficult to make the library. I know for certain that phiX was not added.

17 months ago
United States

This sounds like adapter dimers (i.e., without inserts) and, yes, they can produce very high quality data. It's easy to check by comparing the per-cycle sequence peaks against your adapter sequence.
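To make that check concrete, here is a minimal Python sketch. It builds the per-cycle consensus (the sequence you can "read off" the nucleotide contributions plot) and counts how many leading bases agree with an adapter. The function names, the toy reads, and the adapter string are all illustrative assumptions; substitute the adapter sequence from your own kit.

```python
from collections import Counter


def per_position_consensus(reads):
    """Return the most common base at each cycle and the fraction of reads carrying it."""
    length = min(len(r) for r in reads)  # only compare cycles present in every read
    consensus, fractions = [], []
    for i in range(length):
        counts = Counter(r[i] for r in reads)
        base, n = counts.most_common(1)[0]
        consensus.append(base)
        fractions.append(n / len(reads))
    return "".join(consensus), fractions


def shared_prefix_length(a, b):
    """Count how many leading bases of a and b are identical."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Toy data (assumption): three near-identical reads and a placeholder adapter.
reads = ["AGATCGGAAGAGCAC", "AGATCGGAAGAGCAC", "AGATCGGAAGAGCTT"]
adapter = "AGATCGGAAGAGC"  # placeholder; use the adapter from your library kit

consensus, fractions = per_position_consensus(reads)
matched = shared_prefix_length(consensus, adapter)
print(consensus, matched, min(fractions))
```

If the consensus matches the adapter from cycle 1 onward and the per-cycle fractions are close to 1, adapter dimers are the likely explanation. If it does not match, the dominant sequence is something else, or a different adapter set was used.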

Another thing to add is that we did check those sequences against the adapters, and they did not match (unless the wrong adapters were communicated to me, but I'm going to try trimming with other adapters right now). It's good to know, though, that identical reads can produce high-quality data. Thank you.

Use the adapters.fa file included with the BBMap suite (program bbduk.sh) to scan the data. It includes the most common commercial adapters found in kits.
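For intuition about what scanning reads against a multi-adapter FASTA involves, here is a rough Python sketch of exact k-mer matching. It is not BBDuk's implementation: it ignores reverse complements and mismatches. The record names, the sequences, and the choice of k are assumptions for illustration only.

```python
from collections import Counter


def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def build_kmer_index(adapters, k):
    """Map every k-mer found in any adapter to the adapter names containing it."""
    index = {}
    for name, seq in adapters.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def scan_reads(reads, index, k):
    """Return the fraction of reads with any adapter k-mer, plus per-adapter read counts."""
    hits, flagged = Counter(), 0
    for read in reads:
        found = set()
        for i in range(len(read) - k + 1):
            found |= index.get(read[i:i + k], set())
        if found:
            flagged += 1
            hits.update(found)
    return (flagged / len(reads) if reads else 0.0), hits


# Toy inputs (assumption): a two-record FASTA and three reads.
fasta_text = ">adapterA\nAGATCGGAAGAGC\n>adapterB\nCTGTCTCTTATACACATCT\n"
reads = ["TTTTAGATCGGAAGAGCTTTT", "ACGTACGTACGTACGTACGT", "GGGGCTGTCTCTTATAGGGG"]

k = 8  # illustrative k-mer length
index = build_kmer_index(parse_fasta(fasta_text), k)
fraction, hits = scan_reads(reads, index, k)
print(fraction, dict(hits))
```

A very high flagged fraction, dominated by one adapter name, would point to which adapter set was actually used in the prep.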

That said, having an almost pure library of adapter dimers would be an interesting find. It would mean that no library QC was done by your sequencing provider before sequencing, either.