Question: Single Cell Sequencing Quality
0
Entering edit mode

I have some single cell sequencing data and I did some quick statistics on them to give feedback about the sequencing kits used (RepliG and PicoPlex). For some samples, I'm getting some discrepancies in the numbers and I was wondering if that's expected or is it an issue.

Here's some data for one of the sample:

samtools flagstat results

66783127 + 0 in total (QC-passed reads + QC-failed reads) 
8141047 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
66760206 + 0 mapped (99.97% : N/A)
58642080 + 0 paired in sequencing
29321040 + 0 read1
29321040 + 0 read2
53618530 + 0 properly paired (91.43% : N/A)
58597314 + 0 with itself and mate mapped
21845 + 0 singletons (0.04% : N/A)
274662 + 0 with mate mapped to a different chr
190889 + 0 with mate mapped to a different chr (mapQ>=5)
  1. According to samtools stats read length is 151 for 58,642,080 reads. So total bases = 151 * 58642080 = 8,854,954,080

  2. Breadth and depth calculated based on samtools depth :

    breadth = 35.44%
    depth = 26.49
    
  3. Total bytes / sum of all depths as per samtools depth for all bases = 2,234,214,697

  4. Total bytes / sum of all depths as per bedtools genomecov for all bases = 7,909,149,477

  5. Breadth and depth as per bedtools coverage (depth is higher than before when calculated using samtools depth which is expected)

    breadth = 35.34%
    depth = 123.304
    

As per my understanding, samtools depth only counts the reads with primary alignment and ignore secondary/supplementary alignments whereas bedtools genomecov counts everything that's in the bam file.

Now, my question is this, since there's a huge difference between the numbers in points 3 & 4 and point 4 is closer to point 1, does this mean this sample only has about 1/4th reads (designated as primary alignments) which are useful? I'm still trying to figure out if we need to eliminate the secondary/supplementary alignments and just keep primary ones or use all of them.

Plotting % targeted bases v read depth shows only about 5% bases with read depths 10+. Do you have any suggestions how this data can be improved or what is causing such issues? I see this in few single cell samples and not all of them.

TIA

ADD COMMENTlink 16 months ago hsbinf • 20 • updated 16 months ago RamRS 21k

Login before adding your answer.

Powered by the version 1.8