Merging FASTQ files with Cellranger
2.9 years ago
PianoEntropy ▴ 70

I'm new to CellRanger and am doing genome alignments on a set of .fastq files which I did not generate myself. The files are in a folder structure with 10 folders in total: each of the five samples L1-L5 (or SIGAA-SIGAE) has two distinct folders ending in _s1 and _s2. The only difference between the _s1 and _s2 folders seems to be the numbering of the lanes, making me wonder whether they are from the same library after all. For example, in the _s1 folder I have the file SIGAA10_S1_L001_I1_001.fastq and in the _s2 folder the file SIGAA10_S1_L002_I1_001.fastq, and so on.

Hence my questions are:

  • Should one always run cellranger count on all fastq files from the same GEX well, or can this be corrected later using cellranger aggr? So far I have run cellranger count separately on the _s1 and _s2 files. If I run cellranger aggr, will it automatically take into account reads that map to the same barcodes and correct for this (assuming the same barcodes arise in the two datasets)?
  • Is there a simple way of checking whether the _s1 and _s2 folders indeed correspond to different lanes from the same library?
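
One way to approach the second question is to compare the read headers in the two folders. The sketch below assumes the files carry standard Illumina (Casava 1.8+) headers of the form `@instrument:run:flowcell:lane:tile:x:y`; if the headers were rewritten during upload or conversion, it will not apply. The function names are illustrative, not part of any cellranger tooling.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def parse_illumina_header(header):
    """Split a Casava 1.8+ style header into its run-level fields.

    Assumed layout: @instrument:run_number:flowcell_id:lane:tile:x:y [read info]
    """
    fields = header.lstrip("@").split(" ")[0].split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected header layout: {header!r}")
    return {
        "instrument": fields[0],
        "run": fields[1],
        "flowcell": fields[2],
        "lane": int(fields[3]),
    }


def first_read_info(path):
    """Return the run-level fields of the first read in a FASTQ file."""
    with open_fastq(path) as handle:
        return parse_illumina_header(handle.readline().strip())
```

If `first_read_info` returns the same instrument, run and flowcell for a file in the _s1 folder and its counterpart in the _s2 folder, with only the lane differing, that is consistent with one library sequenced on two lanes. It is not proof on its own, since different libraries can share a flowcell.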
Cellranger Merging • 4.3k views

1) If the samples are the same, you can specify multiple lane numbers to cellranger count using the --lanes option. 2) Ask the person who demultiplexed the data or prepared the libraries.


You don't have to do that. The default behavior of cellranger is to use all the lanes with the desired sample name.


The person who prepped the libraries might not have made the fastqs.


Hi, I suggested the second point to make sure that the person who performed the experiment didn't have a specific reason for splitting the per-lane reads into two different folders.

2.9 years ago

Cellranger aggr is not intended to merge two technical replicates and turn them into one. If it sees the same barcode in both samples, it will append a -1 or -2 suffix to them and keep them separate. You could start with cellranger aggr, and see if your two folders really do look like technical replicates.
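
A quicker check than a full aggr run is to compare the cell barcodes called in the two separate count runs. This is only a rough sketch: it assumes each run wrote the usual `outs/filtered_feature_bc_matrix/barcodes.tsv.gz` file, and the function names are made up for illustration.

```python
import gzip


def read_barcodes(path):
    """Read a barcodes.tsv.gz file and strip the trailing "-1" style suffix."""
    with gzip.open(path, "rt") as handle:
        return {line.strip().rsplit("-", 1)[0] for line in handle if line.strip()}


def barcode_overlap(barcodes_a, barcodes_b):
    """Return the Jaccard index of two barcode sets (0 = disjoint, 1 = identical)."""
    union = barcodes_a | barcodes_b
    if not union:
        return 0.0
    return len(barcodes_a & barcodes_b) / len(union)
```

Two lanes of the same GEM well should call nearly the same set of barcodes, so an overlap close to 1 points towards technical replicates. Unrelated libraries would be expected to share only a small fraction of barcodes by chance.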

So in the _s1 folder I have the file SIGAA10_S1_L001_I1_001.fastq and in the _s2 folder the file SIGAA10_S1_L002_I1_001.fastq

I've never heard of using _s1 to indicate different lanes. In the fastq naming system, _S1 indicates sample 1. But those two examples are the same sample in different lanes: see the L001 and L002? Since they have different file names, you can put them together in one folder, and cellranger will understand that they belong to the same sample, different lanes.
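
To see at a glance which samples and lanes are present across the folders, the file names can be parsed directly. This is a minimal sketch assuming the bcl2fastq/mkfastq naming pattern `<sample>_S<n>_L<lane>_<read>_001.fastq[.gz]`; files that don't follow it are skipped.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_S<n>_L<lane>_<read>_001.fastq[.gz]
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][123])_001\.fastq(?:\.gz)?$"
)


def group_by_sample(filenames):
    """Map each sample name to the sorted list of lane numbers seen for it."""
    lanes = defaultdict(set)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the convention
        lanes[match["sample"]].add(int(match["lane"]))
    return {sample: sorted(found) for sample, found in lanes.items()}
```

For the two file names quoted above this gives one sample, SIGAA10, with lanes 1 and 2.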


Thanks! Indeed, it seems likely that they were from the same sample. After I ran cellranger count on the files from both folders, I got roughly (sometimes exactly) the same number of cells as when I run it on only one of the two folders. I have no idea why the folders are structured this way though... I'm trying to reach the person who prepped the libraries.


As @swbarnes2 said, if the person artificially separated the sample files into lane-specific folders, then you can move them into one folder and re-run count. cellranger understands lane-specific files and will deal with them.
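
If you'd rather not move files around, `--fastqs` also accepts a comma-separated list of folders. Here is a small sketch that assembles such a command; the run id, reference path and folder names are hypothetical placeholders, and depending on your cellranger version further flags may be required.

```python
import shlex


def cellranger_count_command(run_id, transcriptome, fastq_dirs, sample):
    """Build a `cellranger count` invocation that reads several FASTQ folders."""
    return [
        "cellranger", "count",
        f"--id={run_id}",
        f"--transcriptome={transcriptome}",
        f"--fastqs={','.join(fastq_dirs)}",  # comma-separated list of folders
        f"--sample={sample}",
    ]


command = cellranger_count_command(
    run_id="SIGAA10_merged",                  # hypothetical output name
    transcriptome="/path/to/refdata",         # hypothetical reference path
    fastq_dirs=["SIGAA10_s1", "SIGAA10_s2"],  # hypothetical folder names
    sample="SIGAA10",
)
print(shlex.join(command))  # shell-ready string, requires Python 3.8+
```

The list form can be passed straight to `subprocess.run`, or the printed string can be pasted into a shell.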


Hi GenoMax, I have downloaded the fastq files from the Heart Cell Atlas. They describe the data as generated with 10x v2 and 10x v3. I'm not sure how to differentiate between which fastq files are v2 and which are v3.
The downloaded files have R1 and R2 files, and some have multiple lanes. Is it OK to merge them under a single file name? If so, what is the best way to merge these multiple files? There are multiple sub-folders with multiple files. Appreciate your response. Thank you
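
On telling v2 from v3: one rough heuristic is the length of read 1, since v2 uses a 16 bp barcode plus a 10 bp UMI (26 bp) and v3 a 16 bp barcode plus a 12 bp UMI (28 bp). The sketch below only looks at R1 lengths and is an assumption-laden guess, not a definitive call: a v2 library sequenced with extra cycles would look like v3 here. cellranger count also tries to detect the chemistry itself by default (`--chemistry=auto`).

```python
import gzip
from collections import Counter
from itertools import islice


def read1_length_counts(path, n_reads=1000):
    """Count sequence lengths among the first n_reads records of an R1 FASTQ."""
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = Counter()
    with opener(path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line FASTQ record.
        for line in islice(handle, 1, 4 * n_reads, 4):
            lengths[len(line.strip())] += 1
    return lengths


def guess_chemistry(lengths):
    """Heuristic guess from the most common R1 length; not a definitive call."""
    if not lengths:
        return "unknown"
    most_common_length = lengths.most_common(1)[0][0]
    if most_common_length in (26, 27):
        return "consistent with v2 only (too short for a 12 bp UMI)"
    if most_common_length >= 28:
        return "consistent with v3 (or v2 sequenced with extra cycles)"
    return "unknown"
```

Calling `guess_chemistry(read1_length_counts(path))` on one R1 file per sub-folder gives a first sorting of the downloads, which can then be checked against what cellranger reports.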
