scRNA-seq: Split FASTQ into cell-specific FASTQ files
4.9 years ago
niklas.lang

Hi! I did scRNA-seq (10x Chromium) of cancer cells, and now I am looking for a simple way to split my FASTQ files into cell-specific FASTQ files. Alternatively, I could split my BAM file into cell-specific BAM files. I need one FASTQ (or BAM) file per cell in order to perform variant calling at the single-cell level; my goal is to obtain one VCF file per cell.

Is there a built-in function in Cell Ranger for that? Are there any other tools to demultiplex my FASTQ/BAM files by cell barcode and split them into cell-specific FASTQ/BAM files?

Any help is highly appreciated! Thank you so much!
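For reference, a minimal Python sketch of the FASTQ route, assuming 10x Chromium 3' chemistry where the first 16 bases of R1 are the cell barcode (followed by the UMI) and R2 holds the cDNA read. It groups reads by the raw, uncorrected barcode; Cell Ranger corrects barcodes with sequencing errors and writes the corrected value to the CB tag in its BAM, so this will not match its cell assignments exactly. It also holds everything in memory, so treat it as illustrative rather than something to run on a full library. `whitelist` is a hypothetical parameter for keeping only called cells.

```python
from collections import defaultdict
from itertools import islice

# Assumption: 10x 3' chemistry, cell barcode = first 16 bases of R1.
BARCODE_LEN = 16


def read_fastq(handle):
    """Yield FASTQ records as 4-tuples of lines with newlines stripped."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def split_by_barcode(r1_handle, r2_handle, whitelist=None):
    """Group R2 records by the raw barcode taken from the paired R1 record.

    Returns a dict mapping barcode -> list of R2 records. If a whitelist
    (hypothetical: e.g. barcodes of called cells) is given, other barcodes
    are dropped.
    """
    per_cell = defaultdict(list)
    for r1, r2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        barcode = r1[1][:BARCODE_LEN]  # sequence line of R1
        if whitelist is not None and barcode not in whitelist:
            continue
        per_cell[barcode].append(r2)
    return per_cell


def write_fastq(records, path):
    """Write a list of 4-tuple FASTQ records to a plain-text file."""
    with open(path, "w") as out:
        for rec in records:
            out.write("\n".join(rec) + "\n")
```

For gzipped input, open the files with `gzip.open(path, "rt")` and pass the handles in; `write_fastq` then writes one file per barcode.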

Tags: RNA-Seq, Cell Ranger, Split, Demultiplex, BAM

It is incredibly unlikely that this will prove useful. 10x reads will be enriched around the polyA site and will be highly redundant.


My idea was to split the BAM file into cell-specific BAM files using the "CB" tag. I don't see how that would interfere with the polyA enrichment, so I would be glad to hear what your concerns are.


The question is whether single-cell variant calling with this type of data is likely to be useful. The answer is "no", because you're mostly sequencing around the TES (transcription end site).


...meaning you're limited to detecting (exonic) variants close to the TES.

4.8 years ago
igor

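If you want to call variants from 10x data, there is a specialized tool by 10x, VarTrix, that is optimized for their data and uses the combined BAM file (no need to modify your BAM file): https://github.com/10XGenomics/vartrix

Note that VarTrix genotypes a previously called set of variants (supplied as a VCF) in each cell; it does not discover new variants on its own.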

If you are still interested in splitting a BAM file by barcode, there are some approaches outlined in this blog post: https://divingintogeneticsandgenomics.rbind.io/post/split-a-10xscatac-bam-file-by-cluster/
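A minimal Python sketch of splitting by the CB tag, assuming the alignments are available as SAM text (for example, piped from `samtools view -h possorted_genome_bam.bam`, the position-sorted BAM that Cell Ranger writes). Reads without a CB tag (no corrected barcode) are skipped, and `keep` is a hypothetical parameter for restricting output to a set of called cells. The header is returned separately so each cell's lines can be written out as a valid SAM file and converted back to BAM with `samtools view -b`.

```python
from collections import defaultdict


def cell_barcode(sam_line):
    """Return the value of the CB:Z: tag of a SAM alignment line, or None."""
    # Optional tags start at column 12 (index 11) of a SAM record.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("CB:Z:"):
            return field[5:]
    return None


def split_sam_by_cb(lines, keep=None):
    """Split SAM text into a shared header plus per-cell alignment lists.

    `keep` (hypothetical) is an optional set of barcodes to retain.
    """
    header = []
    per_cell = defaultdict(list)
    for line in lines:
        if line.startswith("@"):
            header.append(line)  # header lines are shared by every cell
            continue
        cb = cell_barcode(line)
        if cb is None or (keep is not None and cb not in keep):
            continue
        per_cell[cb].append(line)
    return header, per_cell
```

If pysam is available, the same loop can run directly on the BAM through `pysam.AlignmentFile`, using `read.has_tag("CB")` and `read.get_tag("CB")`.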

4.8 years ago

You could split on the CB tag, but coverage in single-cell datasets is so sparse that you aren't going to find high-quality variants.
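To check how sparse the coverage actually is before committing to per-cell calling, here is a minimal Python sketch that counts, per cell barcode, the reads placing an aligned base on one reference position. It assumes SAM text input (as from `samtools view`) and a 1-based position; `chrom` and `target` are illustrative parameter names. It skips splice gaps (N) and deletions, but it does not filter by FLAG or collapse reads that share a UMI, so it will overstate the independent evidence per cell.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def covers(pos, cigar, target):
    """True if an alignment starting at 1-based `pos` puts a base on `target`."""
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":  # aligned bases consume the reference
            if ref <= target < ref + length:
                return True
            ref += length
        elif op in "DN":  # deletion / splice gap: reference skipped, no base
            ref += length
        # I, S, H and P do not consume reference positions
    return False


def depth_per_cell(lines, chrom, target):
    """Count reads per CB barcode that cover chrom:target."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] != chrom or fields[5] == "*":  # wrong contig or unmapped
            continue
        cb = next((f[5:] for f in fields[11:] if f.startswith("CB:Z:")), None)
        if cb is None:
            continue
        if covers(int(fields[3]), fields[5], target):
            counts[cb] += 1
    return counts
```

Looking at the distribution of these counts at a candidate site (how many cells have zero or one read) gives a quick sense of whether per-cell genotypes are realistic.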
