I recently received a sequencing run from a sequencing center. It is an Illumina paired-end MiSeq run with a single index. I usually get per-sample files after demultiplexing; however, this time I got the whole lane, and with it just four raw files, listed below:
map_file.txt
Undetermined_S0_L001_I1_001.fastq
Undetermined_S0_L001_R1_001.fastq
Undetermined_S0_L001_R2_001.fastq
I would usually just send this through split_libraries_fastq.py in QIIME; however, I am not sure how to do this with the indexing. I would like to demultiplex these reads with minimal quality filtering. The ideal output would be two directories with forward and reverse reads for each sample as .fastq files.
I have copied the output of head file.path below for each file, if this is helpful. I realize the first two reads in the sequence files are probably junk.
head map_file.txt
#SampleID BarcodeSequence LinkerPrimerSequence sample_type Description geneticSampleID
OSBS.087.39.M.32.18.20140227 TCCCTTGTCTCC CGGCTGCGTTCTTCATCGATGC soil Plate 1A1 OSBS_087-M-32-18-20140227-gen
OSBS.048.41.M.37.33.20140227 ACGAGACTGATT CGGCTGCGTTCTTCATCGATGC soil Plate 1A2 OSBS_048-M-37-33-20140227-gen
OSBS.048.23.M.15.31.20140227 GCTGTACGGATT CGGCTGCGTTCTTCATCGATGC soil Plate 1A3 OSBS_048-M-15-31-20140227-gen
OSBS.047.21.M.20.3.20140227 ATCACCAGGTGT CGGCTGCGTTCTTCATCGATGC soil Plate 1A4 OSBS_047-M-20-3-20140227-gen
OSBS.119.23.M.18.38.20140227 TGGTCAACGATA CGGCTGCGTTCTTCATCGATGC soil Plate 1A5 OSBS_119-M-18-38-20140227-gen
OSBS.047.41.M.22.36.20140227 ATCGCACAGTAA CGGCTGCGTTCTTCATCGATGC soil Plate 1A6 OSBS_047-M-22-36-20140227-gen
OSBS.087.41.M.40.21.20140227 GTCGTGTAGCCT CGGCTGCGTTCTTCATCGATGC soil Plate 1A7 OSBS_087-M-40-21-20140227-gen
OSBS.048.21.M.5.11.20140227 AGCGGAGGTTAG CGGCTGCGTTCTTCATCGATGC soil Plate 1A8 OSBS_048-M-5-11-20140227-gen
OSBS.119.39.M.27.5.20140227 ATCCTTTGGTTC CGGCTGCGTTCTTCATCGATGC soil Plate 1A9 OSBS_119-M-27-5-20140227-gen
head Undetermined_S0_L001_I1_001.fastq
@M02149:120:000000000-ACC8C:1:1101:15995:1333 1:N:0:0
NTTTTCTTTTTT
+
#11111133111
@M02149:120:000000000-ACC8C:1:1101:14849:1368 1:N:0:0
NCTTTTTTTTTT
+
#11111111000
@M02149:120:000000000-ACC8C:1:1101:15950:1377 1:N:0:0
NCTTTTTTTTCT
head Undetermined_S0_L001_R1_001.fastq
@M02149:120:000000000-ACC8C:1:1101:15995:1333 1:N:0:0
TTTTTTTCCCCTTCTTCTTTCTCCTTCTTTTTTTCTTTCCCTCTCTTTTTTTTTTCCCTCCCTTCTTTTCTTTTTCCTTTTTTTCTTTTTCTTTCCTCCTCCTCTTCCCTCTCTTTCTTTTTTTTTCTCTCCTTTTTTCCCCTTTTCTCTTTTTTTTTTTTTCTTCTTTTTTCTTTTTTTTTTTTTTTTCTTTCTTCTTCTTTTCTCTTTCTTCTTTCTTCTTTTTTTTTCTCTTCCTTCTTTCCTCTCTT
+
>A1>ADD11BFBBFFGB3BFABA1BF1111110A0111211A00012111/////01110//01011112111110112211//01111101112111000000111101000121221111///-01111011111-/00//.0000000000--------/////////-//////--9--------//9//////////////////////////////////----/////////////////////
@M02149:120:000000000-ACC8C:1:1101:14849:1368 1:N:0:0
TTTTTTTCCCCTTCTTCTTTCTCCTTCTTTTTTTTTTTTTTTTCTCTCTTTTTTTTCTTTTCCTCCTTTTTTTCTTTTTTTTCTTTTTTTCTTCCCCTCCCTTTTTTCTCTTTTTTTTTTCTTTTTTTTTTCCTCTTCTTTTTTTTTCCTTTTTCTTTTTTTTTTCTTTCTTTTCTCTTTCTCTTTTTCTTTTTTTTTCTTTTTTTTTCTTTTCTTTTCTCTTCTTTTTTTTCTCTTCTTTCTTTTCTTCT
+
>A1>ADD11DFBFGGGG3GGFBD1BG1111110//////////0122112111///011112111001211//011111///011111--01111///.../0000-/0000000-----/00000-----////////////----/////////////-----/////////////////////////////----//////---9/////////////////////---///////////////////
@M02149:120:000000000-ACC8C:1:1101:15950:1377 1:N:0:0
TTTTTTTCCCCTTCTTCTTTCTCATTCTTTTTTTTTCTTCTCCCCCTCCCCTTTTTCCTTCCCTTTCTTTTCTTCTTCTTTTCCTCCCCTTCTCCCTCTCCCCCTCTTCCCTCTCCCTCCTCCTCCTCCCCTCTCTTTTTTTTCTTTCTTTTCTTCTTTTTTCTCTCTCTTTTTTCTTTTTTCTTTCTCCTTCTTTTTTCTTTTTTCTTTCTTCTTTTCTTTTCTCCTCCTTCTTTTTTTTCTTTTTCTTT
head Undetermined_S0_L001_R2_001.fastq
@M02149:120:000000000-ACC8C:1:1101:15995:1333 2:N:0:0
NTTTTTTTCCCTTTTTTTTTTTTTTTTTTTTTTCTCTCTTCTCTTTTTTTCTCTTTTTTTTTTTTCTTTTTCTTTCTCTTTCCTTCTTTTTCCTTTCCCCCTCCCTTCCCCCCCTTTCTCCTTTCTTTTTTTTTTTCTTTTTTTTTTTTTTTTTCCTTCTCTTCTTTTTCTCTTTTTTTCTCTTTCCTCTTCTTCTCTTTCTCTTTCTTTTTTTTTTCTTTTTTTTCCTCTTCTTTTTTTTTTCTTTTTTT
+
#1>>>>>>11B1BFEF000A/AA/////////>012211212122111//01222111////-<-000000/0000000000000000000/0000000/.-.../0////.---.000///////////------//////------------///////////////////////--/////////////////////////////////-----//////---////////////-----//////--
@M02149:120:000000000-ACC8C:1:1101:14849:1368 2:N:0:0
NTTTTTTTCCCTTTTTTTTTTTTTTTTTTTTTTTTCTCTTCTTTTTCTTTTTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTCTCTTCTTTCTTCCCCCTTTTCCTTTCTCCTTCTCTCTCCCTTTCTTTTCTCTCTTTCTTTTTTTTCTTTTTTTTTTCCTTCTTTCTTTTTTTTTTTTTTTTCCCTCTCCTTTTTTTTCTTCTTTTTTTTTTTTTCTTTTTTTTTTTCTTCTTTTTTTCTTTTCTTTCTT
+
#1>>>>>>>B@DEFGF0A0A//A////>///////01222121111011111/011111///-------------------//////////////-----//////////////////////-/////////////////////---//////-----//////////////-----------////--///////---/////////--------//////------/////////--////////////
@M02149:120:000000000-ACC8C:1:1101:15950:1377 2:N:0:0
NTTTTTTTCCCTTTTTTTTTTTTTTTTTTTTTTTCTCTTTCCTTTCTTTTTTTTTTCTTTTTTTTTTTTTCTCCTTCCTTCCCCTCTTCCCTTCCCTCCCCCTCCTCCTCCCCTCCTCCCTCTCTTCCTTTTCTTTTTTCTCTCTTTTTTTTTTCTTCTTCCTTCTTTCCTTTTCCCTCCTCTCTTTTCCCTTCTTTTTCTTTTTTTCTTTTTTTTCTTTCTCTTTCTTTCCTTTCCTTTTTTTTTTCTTT
You should ask your sequencing provider to do the right thing and reprocess the data so you get it in the format you are used to. It is bad customer service to dump non-demultiplexed data in a customer's lap and expect them to demultiplex it themselves.
If you have no option but to do this yourself then look at deML to get this done. sabre is another option.
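To make clear what demultiplexing involves here, below is a minimal Python sketch. It is not how deML or sabre work internally, and the function names are hypothetical. It assumes the mapping file is tab-delimited with the barcode in the second column, that all three FASTQ files use plain four-line records in the same read order, and that the barcodes in the mapping file have the same orientation as the index reads (if almost nothing matches, they may need to be reverse-complemented). Only exact barcode matches are kept and no quality filtering is applied.

```python
import csv
import os


def read_barcodes(map_path):
    """Return {barcode: sample_id} from a tab-delimited, QIIME-style mapping file."""
    barcodes = {}
    with open(map_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and the header/comment lines
            barcodes[row[1]] = row[0]
    return barcodes


def fastq_records(path):
    """Yield (header, sequence, plus, quality) tuples, assuming 4-line records."""
    with open(path) as handle:
        while True:
            record = [handle.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:
                return  # end of file
            yield tuple(record)


def demultiplex(map_path, index_path, r1_path, r2_path, out_dir):
    """Write per-sample forward/reverse FASTQ files; return reads kept per sample."""
    barcodes = read_barcodes(map_path)
    fwd_dir = os.path.join(out_dir, "forward")
    rev_dir = os.path.join(out_dir, "reverse")
    os.makedirs(fwd_dir, exist_ok=True)
    os.makedirs(rev_dir, exist_ok=True)

    counts = {}
    triples = zip(fastq_records(index_path),
                  fastq_records(r1_path),
                  fastq_records(r2_path))
    for idx, r1, r2 in triples:
        # The read ID is the header text before the first space; all three must agree.
        read_ids = {rec[0].split()[0] for rec in (idx, r1, r2)}
        if len(read_ids) != 1:
            raise ValueError("files are out of sync at " + idx[0])

        sample = barcodes.get(idx[1])  # exact match only, no error correction
        if sample is None:
            continue  # unassigned read (e.g. barcodes containing N)

        # Appending per read is slow but never holds many files open at once.
        # Note: rerunning into the same out_dir would append duplicate reads.
        for directory, rec in ((fwd_dir, r1), (rev_dir, r2)):
            with open(os.path.join(directory, sample + ".fastq"), "a") as out:
                out.write("\n".join(rec) + "\n")
        counts[sample] = counts.get(sample, 0) + 1
    return counts


# Example call with the file names from the question ("demux_out" is a made-up name):
# demultiplex("map_file.txt", "Undetermined_S0_L001_I1_001.fastq",
#             "Undetermined_S0_L001_R1_001.fastq",
#             "Undetermined_S0_L001_R2_001.fastq", "demux_out")
```

This yields the two directories asked for in the question, with one .fastq file per sample in each. For a full MiSeq run a dedicated tool will be much faster and can tolerate barcode mismatches, which this sketch does not.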
I'm with you on the "this is not cool" front. Unfortunately I have no recourse. If I ran the world, raw data would be passed along together with an extremely simple shell script or similar to demultiplex to per-sample files, and a readme explaining how to change the demultiplexing settings. Thanks for the links!
For some odd reason, it's actually easier to demultiplex with QIIME than try to get QIIME to work with demultiplexed data. The QIIME users I know specifically ask for non-demultiplexed data. At least this was the case with QIIME 1.
I remember those days. However, in my experience this was before single or double index paired end sequencing.
I was using QIIME demultiplexing for paired-end single index, so 3 total reads like in the original post.
I would love to run this through QIIME. Can you copy/paste the QIIME command you were using? Was it split_libraries_fastq.py? I thought this required the reads to already be paired.
This was my old protocol for QIIME 1.8. Not sure if it is still valid, but maybe it will give you some ideas.
In my example, R2 corresponds to I1 and R3 to R2. That depends on how exactly you run bcl2fastq.
Working with deML I ran the command:
Any pointers? Should I create a new question?
Create a new question.
I had to increase the open-file limit by entering ulimit -n 4096. This was the max I could set on my cluster on the login node. I actually needed to go higher, and this was possible once I set up an interactive session or did this via a batch job.
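For anyone scripting this step, the same soft limit can be checked and raised from inside Python with the standard-library resource module (Unix only). This is only a sketch: the helper name raise_open_file_limit is made up, and like ulimit -n it cannot go above the hard limit the system allows for the session.

```python
import resource


def raise_open_file_limit(target=4096):
    """Raise the soft open-file limit towards `target`; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY

    # An unprivileged process cannot exceed the hard limit, so cap the request there.
    wanted = target if hard == unlimited else min(target, hard)

    # Never lower a limit that is already high enough.
    if soft == unlimited or soft >= wanted:
        return soft

    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft open-file limit is now", raise_open_file_limit(4096))
```

If the value returned is lower than what you asked for, the hard limit is the bottleneck and you would need a session or job with a higher allowance, as described above.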