Salvage barcode undetermined reads from Illumina HiSeq 2500 2 x 100 bp pair-ended runs after demultiplexing
8.2 years ago
Louis Kok ▴ 30

Hi all, my HiSeq run produced too many undetermined reads, which cannot be assigned to any sample because of barcode issues. Samples were multiplexed with dual barcodes (two 8 bp indexes, 16 bp in total). I demultiplexed with bcl2fastq 1.8.4, allowing at most one base mismatch.

After demultiplexing, I checked the barcodes of the undetermined reads and found that they have two or more base mismatches relative to the indexes that were used for multiplexing. Has anyone tried to salvage undetermined reads, perhaps by allowing more mismatches? If so, how many mismatches can be allowed while keeping the outcome accurate?

Kindly share with me your experience. Thanks a lot.

demultiplexing Illumina undetermined barcode • 6.9k views

FYI, whenever we've had this happen the samples/run had other issues and it ended up not being worthwhile salvaging the data.

8.2 years ago

bcl2fastq has a --barcode-mismatches option ("number of allowed mismatches per index"); just re-run it with --barcode-mismatches 2 or 3. In my experience the default of 1 works well in most cases. However, I would first make sure that such an error rate is not due to problems with the run or with the sample labelling.


That is likely not to work, and here's why. (At least with my version of the pipeline; I'd be happy to hear this has been fixed.)

When you set the mismatch allowance to 1, the software takes each barcode and generates all possible off-by-one barcodes, so that when it sees one of those it knows which barcode it is really supposed to be. For AAAAAAAA, it decides that AAAAAAAT is one of those one-offs. If you also have AAAAAATT in the same lane, one of its one-off barcodes is AAAAAAAT as well. Rather than smartly saying "if we see that exact barcode sequence, we'll just skip it, because we don't know what it's supposed to be", the software will refuse to process the lane. So with a mismatch allowance of 1, your barcodes have to differ from each other by at least 3 letters. If you raise the mismatch allowance, there are likely to be barcode clashes that you didn't have to worry about at mismatch 1.
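The clash described above can be checked before re-running the demultiplexer. The following is only a minimal sketch of the idea, not how bcl2fastq itself is implemented: the function names `neighbours` and `one_mismatch_clashes` are made up for illustration, and only substitutions over A/C/G/T are considered.

```python
def neighbours(barcode, alphabet="ACGT"):
    """Return every sequence that differs from `barcode` at exactly one position."""
    out = set()
    for i, base in enumerate(barcode):
        for sub in alphabet:
            if sub != base:
                out.add(barcode[:i] + sub + barcode[i + 1:])
    return out


def one_mismatch_clashes(barcodes):
    """Map each ambiguous sequence to the barcodes it could be assigned to.

    A sequence is ambiguous if it lies within one mismatch of two or more
    barcodes from the sample sheet (hypothetical helper, for illustration).
    """
    owners = {}
    for bc in barcodes:
        # The barcode itself plus all of its one-off variants
        for seq in neighbours(bc) | {bc}:
            owners.setdefault(seq, []).append(bc)
    return {seq: sorted(bcs) for seq, bcs in owners.items() if len(bcs) > 1}


# The two barcodes from the example differ at only 2 positions
clashes = one_mismatch_clashes(["AAAAAAAA", "AAAAAATT"])
for seq, bcs in sorted(clashes.items()):
    print(seq, "could be", " or ".join(bcs))
```

An empty result means no observed sequence can be claimed by two barcodes at a mismatch allowance of 1. For dual indexes, each index list would need to be checked separately if the allowance is applied per index.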


Can you really blame the software when you don't follow Illumina's recommendations for what are compatible barcodes?

8.2 years ago
trausch ★ 1.9k

We usually use the minimum pairwise Hamming distance between all barcodes as a guide for setting the number of allowed mismatches: if the minimum Hamming distance is 3 we allow at most 1 mismatch, if it is 5 we allow at most 2, and so on. In general:

#mismatches = floor( (min(hamming(i,j)) - 1) / 2) for all barcodes i and j (i != j)

But I agree in most cases 1 mismatch works fine.
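As a rough sketch, the rule above can be computed directly from the list of barcodes on the sample sheet. The function names `hamming` and `max_safe_mismatches` and the example barcodes are invented for illustration; the sketch assumes all barcodes have the same length and are distinct.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def max_safe_mismatches(barcodes):
    """floor((d_min - 1) / 2), where d_min is the minimum pairwise Hamming distance."""
    d_min = min(hamming(a, b) for a, b in combinations(barcodes, 2))
    if d_min == 0:
        raise ValueError("duplicate barcodes cannot be told apart")
    return (d_min - 1) // 2


# Made-up barcodes with minimum pairwise distance 3
print(max_safe_mismatches(["AAAAAAAA", "AAAAATTT", "TTTAAAAA"]))
```

With dual indexes, it is probably safest to run this on the i7 and i5 index lists separately, since the mismatch allowance is described as "per index".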
