I am seeking advice on the best strategy to salvage high-quality reads from a 10X single-cell RNA- and ATAC-seq experiment that partially failed on the HiSeq 4000.
The issue is mainly with the scRNA-seq data.
On the flow cell we ran 5 lanes for RNA and 3 lanes for ATAC. The failure occurred because each modality requires a different run configuration due to indexing differences between them: scRNA-seq uses a single index, whereas scATAC-seq is dual indexed, and (we now know!) 10X do not recommend mixing single- and dual-indexed samples on the same flow cell.
Hindsight aside, we ran with dual-index parameters, which resulted in the ATAC-seq data looking great, but for the RNA lanes the quality scores for the reads on the entire top half of the flow cell were abysmal. Why? Although 10X themselves have not been able to replicate this issue in house, it appears to be caused by a loss of focus on the upper surface of the flow cell after the i5 read. Others have mentioned the same issue elsewhere.
Our current strategy to salvage the high-quality reads is to extract raw data from the sequencer for the good half of one of the RNA-seq lanes and create a new fastq file, to see if Cell Ranger accepts it. But I'm wondering: is this the best strategy, or is there a way to extract the reads with high quality scores from the fastq files that we have already generated?
If the answer is the latter, I'm unsure how to do this considering the forward and reverse reads for single cell data contain different information. This may be a trivial issue.
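For the second option, a minimal sketch of what filtering the existing fastq files could look like is below. It assumes the usual 10X 3' gene expression layout (R1 holds the cell barcode and UMI, R2 holds the cDNA insert), so the decision to keep a pair is made on R2 only and R1 is carried along untouched so the two files stay in sync. The function names, the thresholds (`min_mean_q`, `max_n_frac`) and the Phred+33 offset are illustrative assumptions, not anything prescribed by 10X or Illumina.

```python
import gzip
from itertools import zip_longest


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from an open text handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def mean_phred(qual, offset=33):
    """Mean quality of a read, assuming Phred+33 encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_pairs(r1_records, r2_records, min_mean_q=20.0, max_n_frac=0.1):
    """Yield (r1, r2) pairs whose R2 passes the (hypothetical) thresholds.

    R1 is never inspected beyond its read name: the pair is kept or
    dropped as a unit so both output files keep the same read order.
    """
    for r1, r2 in zip_longest(r1_records, r2_records):
        if r1 is None or r2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        # The read name is everything before the first space in the header.
        if r1[0].split()[0] != r2[0].split()[0]:
            raise ValueError(f"read names out of sync: {r1[0]} vs {r2[0]}")
        seq2, qual2 = r2[1], r2[3]
        n_frac = seq2.count("N") / len(seq2) if seq2 else 1.0
        if mean_phred(qual2) >= min_mean_q and n_frac <= max_n_frac:
            yield r1, r2


def filter_files(r1_in, r2_in, r1_out, r2_out, **thresholds):
    """Filter a gzipped R1/R2 pair of files; return the number of pairs kept."""
    kept = 0
    with gzip.open(r1_in, "rt") as f1, gzip.open(r2_in, "rt") as f2, \
            gzip.open(r1_out, "wt") as o1, gzip.open(r2_out, "wt") as o2:
        pairs = filter_pairs(read_fastq(f1), read_fastq(f2), **thresholds)
        for rec1, rec2 in pairs:
            o1.write("\n".join(rec1) + "\n")
            o2.write("\n".join(rec2) + "\n")
            kept += 1
    return kept
```

A call such as `filter_files("in_R1.fastq.gz", "in_R2.fastq.gz", "out_R1.fastq.gz", "out_R2.fastq.gz")` (file names are placeholders) would write the surviving pairs. Any index-read files would need the same read-name-based filtering if they are passed downstream.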
Any advice on this issue would be greatly appreciated.
---- Edit in response to ATpoint ----
As you can see, the problem is with the R2 reads rather than R1.
Read 1:
@K00267:334:HFH3JBBXY:1:1101:1164:1156 1:N:0:AACCGGAA
NGAGAAGGTTACGATCACCTGGAAGGTC
+
#AAF-FAJJJJFFJJAJF7AAJ7F-<JJ
@K00267:334:HFH3JBBXY:1:1101:1225:1156 1:N:0:GGTTTACT
NGAGCAGGTTGCATCAAGCTGTCCGCCA
+
#AAA7FFJJJJJAJJFFJJJJJJJJJJF
@K00267:334:HFH3JBBXY:1:1101:1265:1156 1:N:0:TCGGCGTC
NGGATGTCAGCTACATATTGACCGTCTT
+
#AAAFJJJJJJJJJJJJJJJJJJJAFFF
@K00267:334:HFH3JBBXY:1:1101:1326:1156 1:N:0:AACCGAAA
NTCTCTAAGCATTTGCAAGCTGTAAGAC
+
#AAAFJFJJJJJJFFAJJFJJJJJJJJF
Read 2:
@K00267:334:HFH3JBBXY:1:1101:1164:1156 2:N:0:AACCGGAA
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
###########################################################################################
@K00267:334:HFH3JBBXY:1:1101:1225:1156 2:N:0:GGTTTACT
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
###########################################################################################
@K00267:334:HFH3JBBXY:1:1101:1265:1156 2:N:0:TCGGCGTC
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
###########################################################################################
@K00267:334:HFH3JBBXY:1:1101:1326:1156 2:N:0:AACCGAAA
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
###########################################################################################
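To check whether the failed reads really do come from one surface only, the tile field in the read names can be tallied. The sketch below assumes the standard Illumina header layout (`instrument:run:flowcell:lane:tile:x:y`) and the four-digit tile convention in which the first digit encodes the surface (1 for top, 2 for bottom); that convention should be checked against Illumina's documentation for the instrument. The function names are made up for illustration.

```python
from collections import Counter


def parse_lane_tile(header):
    """Return (lane, tile) from an Illumina-style fastq header line."""
    name = header.lstrip("@").split()[0]
    fields = name.split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected header format: {header}")
    return int(fields[3]), fields[4]


def surface_of(tile):
    """First digit of a four-digit tile number (assumed: 1 = top, 2 = bottom)."""
    if len(tile) != 4 or not tile.isdigit():
        raise ValueError(f"unexpected tile number: {tile}")
    return int(tile[0])


def count_by_surface(headers):
    """Count reads per (lane, surface) for an iterable of header lines."""
    counts = Counter()
    for header in headers:
        lane, tile = parse_lane_tile(header)
        counts[(lane, surface_of(tile))] += 1
    return counts
```

Under that assumption, the four reads shown above (tile 1101) would all be counted as lane 1, surface 1. The same `surface_of` test could be used as the keep/drop criterion instead of a quality threshold.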
You should contact Illumina rather than 10x; they may refund part of the cost. We routinely mix dual and single barcodes on the same flowcells (including with 10x samples) and don't have these sorts of issues.
We have contacted them, but unfortunately our service contract for the HiSeq 4000 has just ended (we are moving over to the NovaSeq for future runs), so I'm not sure we'll get, or whether it's worth getting, anything back from them. We were running this experiment as a test on the samples to see if they are good enough for further experiments. It's interesting that you haven't seen this problem before, particularly as 10X themselves can't replicate it either. It may be a combination of factors that causes this (dodgy flow cell, dodgy reagents, single/dual indexing etc.). Just out of interest, what run configuration do you use if you are running dual- and single-index samples through the same flow cell? Have you run 10X scRNA and scATAC on the same flow cell before?
Ah, losing the service contract limits things a bit. I don't know the exact settings our sequencing core used. We've only recently had scATAC running, so I don't think it was mixed with scRNA-seq.
No probs. Many Thanks.