Question

Forum: NextSeq for analysts

8

9.8 years ago

Madelaine Gogol 5.3k

We got a NextSeq, and as the local person responsible for running our HiSeq/MiSeq "primary analysis pipeline", I am trying to get everything set up with our current analysis pipeline and in-house LIMS.

I would like to collect, in one place, any tips anyone has for the transition to NextSeq. So far, I know that I need bcl2fastq version 2, which comes with a user guide. This appears to be a binary, unlike the previous standalone bcl2fastq, which was a Perl script that generated a shell script, and the options are slightly different.

I have heard that Picard breaks on NextSeq data.

Is anyone using BaseSpace? Illumina seems to be pushing it, especially for the NextSeq, but a few limitations are holding us back: we use a lot of genomes other than human, mouse, and rat, and individual user accounts with no notion of "labs" or "groups" of any kind seem kind of weird.

basespace casava bcl2fastq nextseq • 13k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 9.8 years ago by Madelaine Gogol 5.3k

2

So if anyone is trying to do dual indexing on the NextSeq, I have a tip for you: the second barcode needs to be reverse complemented when running bcl2fastq! I just felt the need to put this information out there somewhere, since it's not documented well in the bcl2fastq user guide.
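For anyone converting an existing sample sheet, here is a minimal Python sketch of the reverse-complement step. The function name and the example index are illustrative only and are not taken from any Illumina tool.

```python
# Complement table for DNA bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical second index as it might appear on a HiSeq/MiSeq-style sheet.
index2 = "TATAGCCT"
print(reverse_complement(index2))  # AGGCTATA
```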

ADD REPLY • link 9.0 years ago by Madelaine Gogol 5.3k

0

Thanks a lot! It was very useful for me!

ADD REPLY • link 8.8 years ago by Jorjial ▴ 300

0

Apparently nobody else cares. Since I posted this, I have also found that no, CASAVA doesn't work, though you can run standalone ELAND if you want to (I don't).

ADD REPLY • link 9.8 years ago by Madelaine Gogol 5.3k

1

Perhaps the platform is relatively new, so people don't have experience with it yet. We work with HiSeq here, but reading this gets me worried; it seems like a step back.

ADD REPLY • link updated 2.5 years ago by Ram 43k • written 9.8 years ago by Istvan Albert 100k

0

Yes. I am very interested to hear experiences with NextSeq (and BaseSpace). But we are still using HiSeq2000, HiSeq2500 and MiSeq only.

ADD REPLY • link 9.7 years ago by Obi Griffith 20k

0

If eland is simply for QC, why can't the data just be sampled and run through any given aligner for an alignment rate?
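A rough sketch of the sampling idea in Python, assuming gzipped FASTQ files with standard 4-line records. The function name and defaults are made up for illustration; the sampled reads could then be written out and passed to whichever aligner you prefer.

```python
import gzip
import random
from itertools import islice


def sample_fastq(path, n, seed=0):
    """Reservoir-sample up to n records (4 lines each) from a gzipped FASTQ file."""
    rng = random.Random(seed)
    reservoir = []
    with gzip.open(path, "rt") as handle:
        i = 0
        while True:
            record = list(islice(handle, 4))
            if len(record) < 4:
                break  # end of file (or a truncated final record)
            if i < n:
                reservoir.append(record)
            else:
                # Replace an existing entry with probability n / (i + 1).
                j = rng.randint(0, i)
                if j < n:
                    reservoir[j] = record
            i += 1
    return reservoir
```

For paired-end data, calling this with the same seed on the R1 and R2 files should pick the same record positions, provided both files list reads in the same order.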

ADD REPLY • link 9.8 years ago by seidel 11k

0

Yeah, sampling is an option. Right now we're just doing bowtie2 alignments. Unlike the ELAND output, those alignment results could actually be useful to analysts.

ADD REPLY • link 9.8 years ago by Madelaine Gogol 5.3k

Ram · Answer 1 · 2014-10-07

5

9.5 years ago

Joe Brown ▴ 70

We're actively using NextSeq and don't use BaseSpace. I wrote a little wrapper to handle the call to bcl2fastq which does some preprocessing of the SampleSheet.csv, calls bcl2fastq, then joins fastqs from the 4 lanes.

Now I maintain the converter here.

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by Joe Brown ▴ 70

0

Can anyone tell me how to use this script? What are the input and output files?

ADD REPLY • link updated 2.5 years ago by Ram 43k • written 9.2 years ago by jeccy.J ▴ 60

0

Hi, Joe,

quick question: have you noticed that the fastq files generated by bcl2fastq 2.17 differ from the ones generated by BaseSpace? I found that about 0.07% of the reads have NNNNNs in place of sequence.

BTW, bcl2fastq2 has an option, --no-lane-splitting, to merge fastq files across lanes. Does that work differently from your script?
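As an illustration, a small Python helper that assembles such a call might look like the sketch below. The paths are placeholders and the helper name is hypothetical; `--runfolder-dir`, `--output-dir`, and `--no-lane-splitting` are bcl2fastq2 options, but check the user guide for your version before relying on them.

```python
def bcl2fastq_command(run_dir, out_dir, merge_lanes=True):
    """Build the argument list for a bcl2fastq2 conversion."""
    cmd = ["bcl2fastq", "--runfolder-dir", run_dir, "--output-dir", out_dir]
    if merge_lanes:
        # Write one FASTQ per sample and read instead of one per lane.
        cmd.append("--no-lane-splitting")
    return cmd


cmd = bcl2fastq_command("/path/to/runfolder", "/path/to/fastq")
print(" ".join(cmd))
# bcl2fastq --runfolder-dir /path/to/runfolder --output-dir /path/to/fastq --no-lane-splitting

# To actually run it (requires bcl2fastq2 on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```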

Thanks
Lijing

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.6 years ago by xunshengbu • 0

0

We've always had a small number of reads containing N, which we deal with downstream using Trimmomatic. I've never used BaseSpace, so I don't have a comparison for you. I wonder if it's doing additional processing.
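To put a number on this for a given file, something like the following Python sketch could be used. It assumes a gzipped FASTQ with unwrapped 4-line records; the function name is illustrative.

```python
import gzip


def fraction_with_n(path):
    """Return the fraction of reads whose sequence contains at least one N."""
    total = with_n = 0
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # sequence line of each 4-line record
                total += 1
                if "N" in line:
                    with_n += 1
    return with_n / total if total else 0.0
```

Running it on the same sample from two different conversions would show whether the N rate really differs between them.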

Has bcl2fastq2 always had the --no-lane-splitting option? If so, then I feel a little dumb for not seeing it. My script's use case is that I wanted the lab personnel to be able to queue up a job before the sequencer finished, have the script do some preprocessing on the sample sheet, then clean up the output files. It doesn't do anything extra fancy; it just eases the process of conversion.

EDIT: looks like that option did exist as early as v.16. I missed it completely. Thanks for pointing that out.

ADD REPLY • link updated 2.5 years ago by Ram 43k • written 8.6 years ago by Joe Brown ▴ 70

0

We are also using NextSeq + BaseSpace onsite. Lately, we've reconfigured it to dump raw data to NFS. I'd like to process it on the cluster, but I don't see SampleSheet.csv in the main run directory or anywhere else. Any idea where it is?

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 8.2 years ago by Leszek 4.2k

Ram · Answer 2 · 2014-07-01

1

9.8 years ago

ktk ▴ 10

We are having similar issues. If we use an Illumina enrichment, we can have the NextSeq send the data to BaseSpace, and BaseSpace will then demultiplex for us. This process takes a while, but it is not hard to use at all: once the machine has completed the run, it sends the data to BaseSpace, and a couple of hours later you have fastq files to download. But if you try to use a non-preferred enrichment/kit, you have to use their Perl script to demultiplex. We are trying to find a better option for demultiplexing.

ADD COMMENT • link updated 2.5 years ago by Ram 43k • written 9.8 years ago by ktk ▴ 10

0

We're not really using BaseSpace at all at this point.

bcl2fastq (version 2) works fine for me, though the sample sheet format is different from the HiSeq sample sheet and more like a MiSeq sample sheet. So far, we are settling on doing that and then running a bowtie2 alignment for QC purposes.

ADD REPLY • link updated 2.5 years ago by Ram 43k • written 9.8 years ago by Madelaine Gogol 5.3k

0

Hopefully there will be options other than BaseSpace. Our experience with it is far from a good one:

Some runs do not load completely (mostly a MiSeq 300+300 issue), e.g. 100% of the R1.fastq reads but only 70% of the R2.fastq reads (in this case a re-upload helps).
I've spent lots of time and still have not figured out how to simply download a file directly to a headless UNIX server with e.g. wget, without doing some sophisticated cookie swapping.

ADD REPLY • link updated 2.5 years ago by Ram 43k • written 9.7 years ago by mikhail.shugay 3.5k

Ram · Answer 3 · 2014-07-24

1

9.7 years ago

wen.luo ▴ 10

Our lab also introduced NextSeq recently. I was searching online for information on NextSeq primary data analysis, and only this forum came up...

We used to use CASAVA for our HiSeq and MiSeq data, but as you said here, CASAVA doesn't work for NextSeq. We don't have BaseSpace, so we had to turn to bcl2fastq2.0. It is pretty straightforward to use, but unlike CASAVA, it doesn't provide any demultiplexing log files or demultiplexing stats. All I get from the program is a bunch of fastq files. I don't know whether it reads the SampleSheet.csv or not, or whether the indexes are trimmed or not. It's like a black box that generates output without any processing information. Anyway, it has done the job despite the lack of log information, and it's the only free option we have for now.

Another thing I find strange is the fastq files we get. Each sample is separated into 4 lanes. In other words, for every sample we put in the SampleSheet.csv, we get 8 fastq files out of bcl2fastq2.0:

SampleName_*_L001_R1.fastq.gz, SampleName_*_L001_R2.fastq.gz,
SampleName_*_L002_R1.fastq.gz, SampleName_*_L002_R2.fastq.gz,
SampleName_*_L003_R1.fastq.gz, SampleName_*_L003_R2.fastq.gz,
SampleName_*_L004_R1.fastq.gz, SampleName_*_L004_R2.fastq.gz

I have to merge the 4 fastq files for each sample and each read of the pair before I go on to the next step.
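One way to do that merge is sketched below in Python. It relies on the fact that gzip files can be concatenated byte for byte into a valid gzip file, and it assumes file names carry a lane token such as `_L001_` as in the list above; the function name and directory arguments are illustrative.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Lane token such as "_L001", followed by another underscore-separated field.
LANE = re.compile(r"_L00[1-4](?=_)")


def merge_lanes(fastq_dir, out_dir):
    """Concatenate per-lane .fastq.gz files into one file per sample and read."""
    groups = defaultdict(list)
    for path in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        if LANE.search(path.name):
            # Files that differ only in the lane token share one output name.
            groups[LANE.sub("", path.name, count=1)].append(path)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = []
    for name, parts in groups.items():
        target = out_dir / name
        with open(target, "wb") as out:
            for part in parts:  # sorted above, so lanes are appended in order
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, out)
        merged.append(target)
    return merged
```

Because R1 and R2 are appended in the same lane order, read pairs stay in sync across the two merged files. Use a separate output directory so the per-lane originals are left untouched.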

I'm not sure if this is something only I came across...

ADD COMMENT • link updated 2.5 years ago by Ram 43k • written 9.7 years ago by wen.luo ▴ 10

2

It definitely does use the SampleSheet.csv, which has to be in a particular (MiSeq-style?) format to work correctly. There's a section about it in the bcl2fastq2 manual. You get 4 lanes because the machine has 4 lanes, but only one pool of samples can be loaded, and it goes onto all 4 lanes. In our case, our sequencing facility wants us to keep the files all separate, because if something goes wrong with the fluidics on one lane, they need to know.
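For a quick sanity check of what is in a sample sheet, here is a hedged Python sketch that pulls out the `[Data]` section. It assumes the sectioned layout used by MiSeq-style sheets (`[Header]`, `[Reads]`, `[Data]`); the sample names and index values in the example are made up, and the manual remains the reference for which columns are required.

```python
import csv
import io

EXAMPLE_SHEET = """[Header]
Date,2014-07-01

[Reads]
75
75

[Data]
Sample_ID,Sample_Name,index,index2
S1,sampleA,ATCACG,AGGCTATA
S2,sampleB,CGATGT,GCCTCTAT
"""


def read_data_section(sample_sheet_text):
    """Return the rows of the [Data] section as a list of dicts."""
    data_lines = []
    in_data = False
    for line in sample_sheet_text.splitlines():
        stripped = line.strip().strip(",")  # tolerate trailing commas from Excel
        if stripped.startswith("["):
            in_data = stripped.lower() == "[data]"
            continue
        if in_data and stripped:
            data_lines.append(line)
    if not data_lines:
        raise ValueError("no [Data] section found")
    return list(csv.DictReader(io.StringIO("\n".join(data_lines))))


for row in read_data_section(EXAMPLE_SHEET):
    print(row["Sample_ID"], row["index"], row["index2"])
```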

ADD REPLY • link 9.7 years ago by Madelaine Gogol 5.3k

0

Hi, I'd like to ask you about the '4 lane approach'. Does your pre-analysis quality assessment somehow check the samples across lanes to know whether all lanes are ok? If so how do you do this? Or do you get this information from the lab? I'm currently storing the files separately in backup but have them merged in working directories.

ADD REPLY • link updated 21 months ago by Ram 43k • written 9.0 years ago by Karol Pal jr. ▴ 20

0

The sequencing facility people look at the alignment percentages and the cluster counts and so forth after the alignment and (theoretically) could use that information to judge if something was wrong with one fluidics component. I don't know how likely that is, and I don't think it has happened yet, though.

ADD REPLY • link 9.0 years ago by Madelaine Gogol 5.3k

0

The lane information, as well as the tile and the cluster's physical coordinates, should be stored in the FASTQ read header (http://en.wikipedia.org/wiki/FASTQ_format), so one can later filter the merged file and run QC statistics by lane.
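A minimal Python sketch of counting reads per lane from the headers, assuming the usual Illumina layout `@instrument:run:flowcell:lane:tile:x:y` in which the lane is the fourth colon-separated field. The instrument and flowcell names in the example are invented.

```python
from collections import Counter


def reads_per_lane(fastq_lines):
    """Count reads per lane from Illumina-style FASTQ headers."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # header line of each 4-line record
            fields = line[1:].split(" ")[0].split(":")
            counts[int(fields[3])] += 1
    return counts


example = [
    "@NS500123:10:H0ABCAGXX:1:11101:1234:1000 1:N:0:1", "ACGT", "+", "IIII",
    "@NS500123:10:H0ABCAGXX:1:11101:2345:1001 1:N:0:1", "ACGT", "+", "IIII",
    "@NS500123:10:H0ABCAGXX:3:13101:3456:1002 1:N:0:1", "ACGT", "+", "IIII",
]
print(sorted(reads_per_lane(example).items()))  # [(1, 2), (3, 1)]
```

The function accepts any iterable of lines, so a handle from `gzip.open(path, "rt")` on a merged file should work as well.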

ADD REPLY • link 9.0 years ago by mikhail.shugay 3.5k

0

Reminds me of the ill-fated LifeScope approach introduced by Applied Biosystems (ABI): they created an obscure data format so that data produced by the SOLiD instruments could only be demultiplexed if one also subscribed to their analysis service, LifeScope.

ADD REPLY • link updated 2.4 years ago by Ram 43k • written 9.7 years ago by Istvan Albert 100k

0

By default, demultiplexing stats are located in:

<runfolder_dir>/Data/Intensities/BaseCalls/Stats/DemultiplexingStats.xml

You can't rely on -h for everything; a lot is laid out only in the manual linked by Madelaine. Indexes are trimmed if they're present in SampleSheet.csv.

ADD REPLY • link updated 2.4 years ago by Ram 43k • written 9.5 years ago by Joe Brown ▴ 70