Hi all,
I was recently given a data set (run through Illumina in 2014) to merge with a set run through Illumina this year. The set I just received was not demultiplexed and was dual barcoded. I know that Illumina's bcl2fastq can demultiplex dual-barcoded sets, but I don't have any of the BaseCall data it typically uses to do so. I only have .fastq.gz files and a mapping file to work with. Does anyone know if I can still use bcl2fastq without the BaseCall data, or if there is something that would work better?
tl;dr: I need to demultiplex dual-barcoded .fastq.gz files
Is the barcode included in fastq headers?
Or are they provided as separated files?
The barcode is included in the fastq headers:
Unless you have two separate files that contain the index sequences you may be out of luck. Those files will likely have I1/I2 in their names.
See the Wikipedia FASTQ entry for where you should have seen the Illumina index sequence in the fastq header. The index sequence should have been present in this part of the header, right after 1:N:0: at the end.
BTW: This is 16S data from a MiSeq run?
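If it helps, here is a rough Python sketch for checking whether a header actually carries an index sequence. It assumes the Casava 1.8+ header layout described in the Wikipedia entry (a space, then read:filtered:control:index); the function names and the example header in the comment are made up for illustration.

```python
def parse_illumina_header(header):
    """Split the comment part of a Casava 1.8+ style FASTQ header.

    Assumed layout (not verified against your files):
    @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
    """
    header = header.strip()
    if not header.startswith("@"):
        raise ValueError("not a FASTQ header line: %r" % header)
    parts = header[1:].split(None, 1)
    if len(parts) < 2:
        # No comment field at all, so there is nowhere for an index to be.
        return {"read": None, "filtered": None, "control": None, "index": ""}
    fields = parts[1].split(":")
    fields += [""] * (4 - len(fields))  # pad so short comments still unpack
    read, filtered, control, index = fields[:4]
    return {"read": read, "filtered": filtered, "control": control, "index": index}


def has_index_sequence(header):
    """True if the last header field looks like bases (dual indexes joined by '+')."""
    index = parse_illumina_header(header)["index"]
    return bool(index) and set(index.upper()) <= set("ACGTN+")


# Hypothetical example header, for illustration only:
# has_index_sequence("@M00001:1:000000000-A1B2C:1:1101:15000:1500 1:N:0:ACGTACGT+TGCATGCA")
```

A header that ends in a bare 1:N:0: (or in a sample number such as 1:N:0:1) would come back as False here, which would mean the barcodes have to come from separate index files.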
I have four files total for the same run: a read 1, read 2, read 3 and read 4, though I'm not sure that's what you mean.
Also yes, this was a 16S MiSeq.
So here is what you likely have: File 2 = Index 1 and File 3 = Index 2. File 1 = Read 1 and File 4 = Read 2. Look in all files to make sure the reads match the expected length of the read/index sequences.
Use the extract_barcodes.py script from the Qiime package to process these files, if you intend to use Qiime for other analysis. If you just need the data demultiplexed then try FastqMultx.
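For the length check, a quick Python sketch along these lines might be enough. It only tallies sequence lengths over the first reads of a gzipped FASTQ file, and it assumes plain four-line records; the function name and the max_reads default are my own choices, not part of any of the tools mentioned.

```python
import gzip
import sys
from collections import Counter
from itertools import islice


def read_length_summary(path, max_reads=1000):
    """Count sequence lengths over the first max_reads records of a .fastq.gz file."""
    lengths = Counter()
    with gzip.open(path, "rt") as handle:
        # Assumes four lines per record; the sequence is the second line of each.
        for i, line in enumerate(islice(handle, max_reads * 4)):
            if i % 4 == 1:
                lengths[len(line.rstrip("\n"))] += 1
    return lengths


if __name__ == "__main__":
    # Pass the four files on the command line to compare them side by side.
    for fastq_path in sys.argv[1:]:
        print(fastq_path, dict(read_length_summary(fastq_path)))
```

The index files should show up with reads much shorter than the two biological read files, which tells you which file is which before handing them to a demultiplexer.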
Awesome! I was intending to use QIIME for the rest of my analysis, so then I can try the FastqMultx and then extract the barcodes with the QIIME python command?