Hello there,
We're working on ONT long reads data. To check base frequencies on specific positions, we planned to use the mpileup function from samtools. We've also computed depth using samtools depth:
samtools mpileup -d 200000 -Q 0 -f reference barcode.bam > barcode_mpileup.vcf
samtools depth -d 200000 barcode.bam > barcode.cov
However, we get coverage differencies for some positions using these two commands. And I dont realy get where these differences came from. See results: -samtools depth:
P3_GTE 1 9019
P3_GTE 2 11481
P3_GTE 3 13962
P3_GTE 4 14929
.......................
P3_GTE 20 45314
P3_GTE 21 61244
P3_GTE 22 96184
P3_GTE 23 115121
P3_GTE 24 126748
P3_GTE 25 131826
P3_GTE 26 134085
P3_GTE 27 137528
P3_GTE 28 134741
P3_GTE 29 138045
P3_GTE 30 127061
-samtools mpileup
P3_GTE 1 C 9019
P3_GTE 2 T 11481
P3_GTE 3 A 13962
P3_GTE 4 C 14929
......................
P3_GTE 20 G 45569
P3_GTE 21 C 61508
P3_GTE 22 T 97718
P3_GTE 23 G 116528
P3_GTE 24 T 127407
P3_GTE 25 G 132341
P3_GTE 26 C 135732
P3_GTE 27 C 138298
P3_GTE 28 T 139815
P3_GTE 29 T 141861
P3_GTE 30 G 143441
I've checked my bam using flagstat after consulting some github issues:
179524 + 0 in total (QC-passed reads + QC-failed reads)
3 + 0 secondary
1169 + 0 supplementary
0 + 0 duplicates
179524 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Can we suppose these differences are caused by these "supplementary" and "secondary" reads? What are these reads exaclty ? It is possible to exclude them from bam?
Thanks,stay safe and have a nice week,
Hadrien
Check these
mpileup
arguments to only count reads having (or not having) the respective flags set. For flags, see https://broadinstitute.github.io/picard/explain-flags.html