Can Illumina bcl2fastq use only one index for demultiplexing dual index sequencing data?
14 months ago
chen ♦ 1.9k
OpenGene

Hi,

For Illumina sequencing data with dual indexes (151 read1 + 8 index1 + 8 index2 + 151 read2), the conventional demultiplexing method is to set both index1 and index2 for each sample.

However, for some data (e.g. with a UMI in index2), only index1 is fixed and index2 is random, so there is no way to set both index1 and index2 in the sample sheet.

In such a case, is it possible to demultiplex the data using only index1? bcl2fastq doesn't seem to support such a setting. Does anyone have any experience with this?

3 months ago
genomax 68k
United States

bcl2fastq handles UMIs that are part of Read 1/2. I am not sure how you are getting them in index 2.

A couple of possibilities come to mind.

  1. You could set a bases mask such as --use-bases-mask Y*,I8,n*,Y*. This would demux the data based on index 1 but still retain the sequence of index 2 in the read headers. You can then parse the index sequences in the headers and create a new SampleSheet.csv to re-demux the original data, or use something else to do a second round of demux on the data from round 1.

  2. You could leave the data non-demultiplexed creating separate files for index reads. Then demux the data afterwards using reads 2 and 3.

Will the same random index 2 sequences be shared by more than one index 1?

Yes, different samples with different index 1 sequences can have the same random index 2.

Currently I demultiplex all data to Undetermined and then split the FASTQ file by its index 1, but this is time-consuming.

I may try to alter the bcl2fastq source code to support index 1-based demultiplexing for dual index data.
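
For readers who want to try the "split Undetermined by index 1" approach described above, here is a minimal Python sketch. It assumes the usual Illumina FASTQ header layout, where the header ends in `:<index1>+<index2>`, and it only does exact matching on index 1 (no mismatch tolerance). The function names, the `samples` mapping and the output file naming are all hypothetical, not anything bcl2fastq provides.

```python
import gzip
from contextlib import ExitStack


def index1_of(header: str) -> str:
    """Return index 1 from a FASTQ header that ends in ':<index1>+<index2>'."""
    barcode = header.rstrip().rsplit(":", 1)[-1]
    return barcode.split("+", 1)[0]


def split_by_index1(fastq_gz: str, samples: dict, out_dir: str = ".") -> dict:
    """Write each record to <out_dir>/<sample>.fastq.gz on an exact index 1 match.

    `samples` maps an index 1 sequence to a sample name (hypothetical layout).
    Records whose index 1 is not in `samples` are skipped.
    """
    counts = {name: 0 for name in samples.values()}
    with ExitStack() as stack:
        reader = stack.enter_context(gzip.open(fastq_gz, "rt"))
        writers = {
            idx: stack.enter_context(gzip.open(f"{out_dir}/{name}.fastq.gz", "wt"))
            for idx, name in samples.items()
        }
        while True:
            # A FASTQ record is four lines: header, sequence, '+', qualities.
            record = [reader.readline() for _ in range(4)]
            if not record[0]:
                break
            idx = index1_of(record[0])
            if idx in writers:
                writers[idx].writelines(record)
                counts[samples[idx]] += 1
    return counts
```

This is single-threaded and will be slow on NovaSeq-sized files, which matches the "time-consuming" experience reported above; it is meant to show the logic rather than to be a production demultiplexer.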

How many random indexes are generally expected in index 2 (tens, hundreds or more)? Doing #1 in my comment above may be faster if the number of index 2 sequences is manageable.

Thousands or even more.

I think doing #1 is probably going to be the fastest option. One can easily collect index combinations from the resulting files from round 1 of demultiplexing. Since you work with NovaSeq, the data files must be huge.
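
Collecting the index combinations could be sketched in Python roughly as follows. This assumes that both index sequences actually end up in the FASTQ headers in the form `:<index1>+<index2>`, which is worth checking on a few reads first. The function name `count_index_pairs` is made up for illustration.

```python
import gzip
from collections import Counter


def count_index_pairs(fastq_gz: str) -> Counter:
    """Count (index1, index2) pairs from headers ending in ':<index1>+<index2>'."""
    pairs = Counter()
    with gzip.open(fastq_gz, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:  # the header is the first line of each record
                continue
            barcode = line.rstrip().rsplit(":", 1)[-1]
            index1, _, index2 = barcode.partition("+")
            pairs[(index1, index2)] += 1
    return pairs
```

Calling `count_index_pairs(path).most_common(20)` on a round 1 file gives a quick view of the most frequent combinations and of how many distinct index 2 sequences there are per sample.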

Biologically speaking, how are you even getting the UMI in index read 2?

Yes, with customized primers

Ah, that'll definitely break Illumina's software.

16 months ago
Gabriel R. ♦ 2.6k
Center for Geogenetik Københavns Univer…

You could simply use deML: https://grenaud.github.io/deML/

It is a maximum-likelihood demultiplexer algorithm that is designed to deal with incomplete or noisy data.

Hope this helps.

This is not noisy data but an unusual modification where the UMI is in the second index read.

It was a general statement rather than a comment about the nature of OP's data :-) Just do not demultiplex with the second index and simply use the first one. That will give you demultiplexing based only on the information provided by the first index.

14 months ago
h.mon 25k
Brazil

I am not sure this will work, but you can try bcl2fastq with the parameters --create-fastq-for-index-reads and --use-bases-mask Y151,I8,n8,Y151.

In the worst case, you will have to use --create-fastq-for-index-reads and --use-bases-mask Y151,I8,I8,Y151, then join all reads from the same index1 and use index2 as the UMI.
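
If you end up with a separate FASTQ file for the index 2 read, "using index2 as UMI" could be sketched in Python as below: the UMI sequence is appended to each read name so downstream tools can pick it up. This assumes the read file and the index 2 file list the same reads in the same order. The `_<UMI>` name suffix and the function names are assumptions for illustration, so adapt them to whatever your UMI-aware tool expects.

```python
import gzip
from itertools import islice


def read_records(handle):
    """Yield FASTQ records as lists of four lines."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        yield record


def tag_reads_with_umi(read_fastq_gz: str, index2_fastq_gz: str, out_fastq_gz: str) -> int:
    """Append the index 2 sequence of each read to its name as '_<UMI>'."""
    written = 0
    with gzip.open(read_fastq_gz, "rt") as reads, \
         gzip.open(index2_fastq_gz, "rt") as umis, \
         gzip.open(out_fastq_gz, "wt") as out:
        for read, umi in zip(read_records(reads), read_records(umis)):
            name, _, comment = read[0].rstrip().partition(" ")
            # Both files must describe the same read at the same position.
            if name != umi[0].split(" ", 1)[0].rstrip():
                raise ValueError(f"files out of sync at {name}")
            umi_seq = umi[1].strip()
            header = f"{name}_{umi_seq}" + (f" {comment}" if comment else "")
            out.write(header + "\n" + "".join(read[1:]))
            written += 1
    return written
```

The same function would be run once for Read 1 and once for Read 2 of each sample, each time against the matching index 2 file.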

2.4 years ago
United States

You can specify which reads should be used for demultiplexing in RunInfo.xml, which may be more convenient than --use-bases-mask. I had a run with i7 (first index, Read#2) and i5 (second index, Read#3) but I only wanted to use i7 for demultiplexing.

  1. Make a backup copy of RunInfo.xml, which is in the run folder with the SampleSheet etc.

  2. Open RunInfo.xml and change the following:

<Read Number="3" NumCycles="8" IsIndexedRead="Y"/>

to

<Read Number="3" NumCycles="8" IsIndexedRead="N"/>

  3. Update SampleSheet.csv so it has only one barcode column.

  4. Run bcl2fastq as you normally would.

  5. The output was demultiplexed by i7 (first index, Read#2) and contained fastq files for three reads:

…_R1_…fastq.gz for Read#1

…_R2_…fastq.gz for i5 (second index, Read#3) that I didn't want to use for indexing

…_R3_…fastq.gz for Read#4 (it was a paired-end run)
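
The RunInfo.xml edit in steps 1 and 2 can also be scripted. Below is a minimal Python sketch using only the standard library; it assumes the file contains `<Read Number="…" IsIndexedRead="…"/>` elements as shown above. The function name `unset_index_read` and the `.bak` backup suffix are made up for illustration.

```python
import shutil
import xml.etree.ElementTree as ET


def unset_index_read(run_info_path: str, read_number: int = 3) -> bool:
    """Back up RunInfo.xml, then set IsIndexedRead="N" on the given read.

    Returns True if the file was modified, False if nothing needed changing.
    """
    shutil.copy2(run_info_path, run_info_path + ".bak")  # step 1: backup copy
    tree = ET.parse(run_info_path)
    changed = False
    for read in tree.getroot().iter("Read"):
        if read.get("Number") == str(read_number) and read.get("IsIndexedRead") == "Y":
            read.set("IsIndexedRead", "N")  # step 2: treat this read as a normal read
            changed = True
    if changed:
        tree.write(run_info_path, encoding="utf-8", xml_declaration=True)
    return changed
```

Note that ElementTree rewrites the whole file, so whitespace and any XML comments may differ from the original; the backup copy keeps the untouched version.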
