I will describe my troubleshooting as a timeline.
Background: the scRNA-seq libraries were prepared with 10x Chromium (I think version 3.0) and sequenced on Illumina. Alignment and assembly of the libraries were then done with the standard Cellranger pipeline in 2019.
I am trying to follow the velocyto protocol. Running the standard run10x command on the /outs folder, I get these warnings:
2020-11-02 14:43:11,342 - WARNING - Not found cell and umi barcode in entry 1090 of the bam file
2020-11-02 14:43:11,343 - WARNING - Not found cell and umi barcode in entry 1093 of the bam file
2020-11-02 14:43:11,343 - WARNING - Not found cell and umi barcode in entry 1097 of the bam file
2020-11-02 14:43:11,343 - WARNING - Not found cell and umi barcode in entry 1098 of the bam file
etc...
The BAM file in question is /outs/possorted_genome_bam.bam.
velocyto requires error-corrected CB/UB barcodes in the tag section: http://velocyto.org/velocyto.py/tutorial/cli.html#requirements-on-the-input-files
According to the Cellranger support page (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/bam), the SAM/BAM files are supposed to contain the error-corrected cell (CB) and UMI (UB) barcodes.
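A minimal sketch of the check that matters here, assuming the read tags are available as a plain dict (as simplesam's `x.tags` returns) and that velocyto needs both CB and UB on a record to use it. The helper names are hypothetical; pysam is imported only inside `scan_bam`, so the pure helpers can be tried without it.

```python
from collections import Counter
from typing import Iterable, List, Mapping

# Error-corrected cell barcode and UMI tags that velocyto looks for
REQUIRED_TAGS = ("CB", "UB")


def missing_tags(tags: Mapping[str, object]) -> List[str]:
    """Return the required tags absent from one alignment's tag dict."""
    return [t for t in REQUIRED_TAGS if t not in tags]


def summarize(tag_dicts: Iterable[Mapping[str, object]]) -> Counter:
    """Count alignments by which required tags they lack ('ok' if none)."""
    counts: Counter = Counter()
    for tags in tag_dicts:
        missing = missing_tags(tags)
        counts["+".join(missing) if missing else "ok"] += 1
    return counts


def scan_bam(path: str, limit: int = 100_000) -> Counter:
    """Summarize the first `limit` alignments of a BAM file (requires pysam)."""
    import itertools

    import pysam  # lazy import so the helpers above work without pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        reads = itertools.islice(bam.fetch(until_eof=True), limit)
        return summarize(dict(r.get_tags()) for r in reads)
```

Usage: `print(scan_bam("outs/possorted_genome_bam.bam"))`. A BAM like this one would presumably report every sampled record under `CB+UB`.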
However, when I inspected the BAM file with the simplesam Python library, I got:
>>> x.tags
{'NH': 4, 'UY': '##########', 'nM': 1, 'CY': '################', 'li': 0, 'RE': 'I', 'AS': 93, 'HI': 2, 'CR': 'NNNNNNNNNNNNNNNN', 'UR': 'NNNNNNNNNN', 'RG': 'AAcount:0:1:CE2WPANXX:3'}
>>> x=next(in_sam)
>>> x.tags
{'NH': 5, 'UY': '##########', 'nM': 0, 'CY': '################', 'li': 0, 'RE': 'I', 'AS': 96, 'HI': 2, 'CR': 'NNNNNNNNNNNNNNNN', 'UR': 'NNNNNNNNNN', 'RG': 'AAcount:0:1:CE2WPANXX:5'}
etc...
So there are no CB or UB tags at all. The raw (uncorrected) barcode tags CR and UR are present but contain only N characters, and there are RG tags, whatever those mean (I am not a specialist in SAM/BAM format and conventions).
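Since CR and UR hold the raw cell-barcode and UMI sequences (CY and UY are their base qualities), all-N values mean the BAM itself carries no barcode information from which a corrected CB/UB could be derived. A small sketch to flag such records; the helper names are hypothetical and a missing tag is treated the same as an all-N one.

```python
from typing import Mapping


def is_uncalled(seq: str) -> bool:
    """True if a sequence is empty or made only of N (no base was called)."""
    return set(seq.upper()) <= {"N"}


def raw_barcodes_uncalled(tags: Mapping[str, object]) -> bool:
    """True if both raw tags (CR = cell barcode, UR = UMI) are missing or all N.

    In that case no error-corrected CB/UB can be recovered from this record.
    """
    return all(is_uncalled(str(tags.get(t, ""))) for t in ("CR", "UR"))
```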
While searching, I found suggestions that the cell and UMI barcodes may be stored in the QNAME (read ID) string, and that they could be copied into the tag fields; see "Add tags to BAM/SAM file" and https://github.com/velocyto-team/velocyto.py/issues/107
I tried printing it:
>>> x.qname
'D00624:100:CE2WPANXX:3:2315:13760:4349'
>>> x.qname
'D00624:100:CE2WPANXX:5:2209:14115:11948'
Etc. According to some of these posts, the CB or UB might be contained in one of the colon-separated fields of the QNAME.
Supposedly the UMI tag?
>>> x.qname.split(":")[2]
'CE2WPANXX'
Supposedly the barcode tag?
>>> x.qname.split(":")[1]
'100'
But neither of these looks like a valid barcode to me.
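The read names shown follow the standard Illumina (CASAVA 1.8+) layout `<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y>`, which has no barcode field: `100` is the run number and `CE2WPANXX` the flowcell ID (the same flowcell appears in the RG tag). A small parser, with hypothetical names, makes the fields explicit:

```python
from typing import NamedTuple


class IlluminaReadName(NamedTuple):
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int


def parse_qname(qname: str) -> IlluminaReadName:
    """Split a 7-field Illumina read name; raise ValueError otherwise."""
    fields = qname.split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 colon-separated fields, got {len(fields)}")
    inst, run, flowcell, lane, tile, x, y = fields
    return IlluminaReadName(inst, int(run), flowcell, int(lane),
                            int(tile), int(x), int(y))
```

For example, `parse_qname('D00624:100:CE2WPANXX:5:2209:14115:11948').lane` is `5`.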
So my question is: how do I obtain the error-corrected barcodes and add them to my BAM file where they are missing? Do I need to re-run the Cellranger alignment? I suspect this may be a problem related to an older version of Cellranger.
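If a BAM's read names did carry the barcodes (the approach suggested in the velocyto issue linked above), they could be copied into CB/UB tags. A minimal sketch with pysam, assuming a hypothetical QNAME layout `<read id>_<cell barcode>_<UMI>`; the regex and function names are illustrative, and the copied values are NOT error-corrected. With plain Illumina read names, `barcodes_from_qname` returns None, so on a BAM like this one it would tag nothing.

```python
import re
from typing import Optional, Tuple

# Hypothetical QNAME layout "<read id>_<cell barcode>_<UMI>"; adjust to your data.
QNAME_BARCODES = re.compile(r"_(?P<cb>[ACGTN]+)_(?P<ub>[ACGTN]+)$")


def barcodes_from_qname(qname: str) -> Optional[Tuple[str, str]]:
    """Return (cell barcode, UMI) parsed from the end of a QNAME, or None."""
    m = QNAME_BARCODES.search(qname)
    return (m.group("cb"), m.group("ub")) if m else None


def tag_bam(in_path: str, out_path: str) -> int:
    """Copy a BAM, adding CB/UB tags parsed from QNAMEs; return reads tagged.

    Requires pysam. The barcodes are copied verbatim, without error correction.
    """
    import pysam  # lazy import so the parser above works without pysam

    tagged = 0
    with pysam.AlignmentFile(in_path, "rb") as inp, \
            pysam.AlignmentFile(out_path, "wb", template=inp) as out:
        for read in inp.fetch(until_eof=True):
            parsed = barcodes_from_qname(read.query_name)
            if parsed is not None:
                read.set_tag("CB", parsed[0], value_type="Z")
                read.set_tag("UB", parsed[1], value_type="Z")
                tagged += 1
            out.write(read)
    return tagged
```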
You should check how the BAM was generated by Cellranger; normally it does contain these tags. An example for cellranger v3.1.
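One way to see how a BAM was produced is to read the @PG (program) records in its header, as printed by `samtools view -H`. A minimal sketch, assuming `samtools` is installed and on PATH; which programs and versions appear depends on the pipeline, so the code just lists whatever is recorded. Function names are hypothetical.

```python
import subprocess
from typing import Dict, List


def pg_records(header_text: str) -> List[Dict[str, str]]:
    """Parse the @PG lines of a SAM header into {tag: value} dicts."""
    records = []
    for line in header_text.splitlines():
        if line.startswith("@PG\t"):
            fields = line.split("\t")[1:]
            # split on the first ':' only, since values (e.g. CL) may contain ':'
            records.append(dict(f.split(":", 1) for f in fields if ":" in f))
    return records


def bam_programs(path: str) -> List[Dict[str, str]]:
    """Run `samtools view -H` on a BAM and return its @PG records."""
    header = subprocess.run(
        ["samtools", "view", "-H", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return pg_records(header)
```

Usage: `for pg in bam_programs("outs/possorted_genome_bam.bam"): print(pg.get("ID"), pg.get("VN"), pg.get("CL"))`.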