Question

My paired end data became single end data after mapping

0

5 months ago

jude • 0

Dear community,

Something weird happened: my public dataset is clearly paired-end (the ENA metadata says so, and each sequencing run has two separate FASTQ files (R1 and R2) plus an index file (I1)). After mapping the reads to the reference genome with cellranger count, I performed a typical scRNA-seq downstream analysis and applied stringtie and featureCounts to compare miRNA expression between cell types. The problem is that when I tried to identify the strandedness of my data, I ran infer_experiment.py, which resulted in

infer_experiment.py -r hg38_GENCODE_V42_Basic.bed -i my_bam_file.bam

This is SingleEnd Data

Fraction of reads failed to determine: 0.0670

Fraction of reads explained by "++,--": 0.8406

Fraction of reads explained by "+-,-+": 0.0924

so I double-checked whether it's real by

samtools view -c -f 1 my_bam_file.bam

which yielded 0 while

samtools view -c -F 1 my_bam_file.bam

yielded 97581274, which made me think that the aligned BAM files (all of the BAM files generated during the downstream analysis) are actually single-end data. The problem might have arisen from cellranger count, but there were no mapping errors and no warnings in the summary.html output file (and I made sure to include all the R1 and R2 FASTQ files as input). I can't understand why this is happening... any help will be appreciated.
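For readers who want to see what that check actually tests: the `-f 1` filter in `samtools view` keeps only records whose SAM FLAG has bit 0x1 ("read is paired") set. Below is a minimal pure-Python sketch of the same test on SAM text lines. The function name `count_paired` and the example records are made up for illustration; they are not output from the BAM file in question.

```python
# SAM FLAG bit 0x1 means "template has multiple segments", i.e. the read is paired.
PAIRED = 0x1


def count_paired(sam_lines):
    """Return (paired, unpaired) counts for SAM-format alignment lines."""
    paired = unpaired = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & PAIRED:
            paired += 1
        else:
            unpaired += 1
    return paired, unpaired


# Hypothetical records: FLAG 0 and 16 are unpaired forward/reverse reads,
# FLAG 99 is a typical first-in-pair read from a paired-end aligner.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t255\t91M\t*\t0\t0\t*\t*",
    "read2\t16\tchr1\t200\t255\t91M\t*\t0\t0\t*\t*",
    "read3\t99\tchr1\t300\t255\t91M\t=\t450\t241\t*\t*",
]

print(count_paired(example))
```

A BAM in which every record is unpaired would give a paired count of 0 here, which matches what a single-end alignment looks like.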

Best,

cellranger stringtie • 574 views

ADD COMMENT • link updated 4 months ago by ATpoint 82k • written 5 months ago by jude • 0

score 4 · Accepted Answer · 2023-12-20

4

5 months ago

ATpoint 82k

Normal and expected. In 10x scRNA-seq, R1 contains the cell barcode and unique molecular identifier (UMI), and R2 contains the gene expression (cDNA) read, so technically (from a gene expression standpoint) it's indeed single-end.

ADD COMMENT • link 5 months ago by ATpoint 82k

0

So you mean data generated by 10x scRNA-seq is basically paired-end data but technically single-end data... Makes me confused but makes sense, thank you

ADD REPLY • link 5 months ago by jude • 0

1

Yes. Why is it confusing? If you look at 10x libraries (below), you see that the left-hand side of each fragment contains the CB and UMI and the right-hand side contains the cDNA. Hence, the R1 that "comes from the left" picks up the CB/UMI and the R2 "from the right" picks up the cDNA. So technically it's paired-end because you sequence two reads from the same fragment, but there is only one read (R2) for the gene expression, so it's single-end in that regard, and the aligner in the end only uses R2.

[Image: 10x library structure]
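To make the read layout concrete, here is a rough sketch of how one read pair turns into a single aligned record. It assumes the 10x 3' v3 layout (16 bp cell barcode followed by a 12 bp UMI in R1); the thread does not state the chemistry, so treat the lengths as an assumption (v2 uses a 10 bp UMI). The sequences and the helper `split_r1` are made up for illustration and are not Cell Ranger code.

```python
# Assumed R1 layout (10x 3' v3 chemistry): 16 bp cell barcode, then 12 bp UMI.
CB_LEN = 16
UMI_LEN = 12


def split_r1(r1_seq, cb_len=CB_LEN, umi_len=UMI_LEN):
    """Split an R1 sequence into (cell_barcode, umi)."""
    if len(r1_seq) < cb_len + umi_len:
        raise ValueError("R1 is shorter than barcode + UMI")
    return r1_seq[:cb_len], r1_seq[cb_len:cb_len + umi_len]


# Made-up read pair from one fragment.
r1 = "AAACCCAAGAAACACT" + "GTCATGCATGCA"   # barcode + UMI, no cDNA
r2 = "TTGGCATCGATCGGATTACAGGCATCGATCGA"    # cDNA, the only part that is aligned

cb, umi = split_r1(r1)

# Only R2 is aligned; the R1 information travels along as tags on that one
# record (Cell Ranger stores the raw barcode and UMI in the CR and UR tags).
record = {"aligned_seq": r2, "CR": cb, "UR": umi}
print(record)
```

Because R1 never reaches the aligner as a sequence, each fragment yields one unpaired alignment, which is why tools inspecting the BAM report single-end data.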

ADD REPLY • link 4 months ago by ATpoint 82k