Is there existing software for studying duplicates in an NGS BAM file? I understand that Picard and samtools can both mark or remove duplicates, but what if I want to count the frequency of each duplicated fragment?
I could do some string comparison in Python and sort the duplicates (or their aligned locations) into dictionaries, but I am not sure this is optimal. Thank you in advance for reading.
Maybe dupRadar?
just use
samtools view -f 1024 -c in.bam "chr1:234-567"
?
Sorry, I guess I was not very clear in my post. I want to count how often each duplicated read occurs, not how many duplicates fall in a region. I do not know which regions to look at yet, unless I scan through the BAM file first. Thank you so much for the reply.
So what about sorting the reads by name and counting the number of duplicates for the same name? (I think this data is provided by Picard.)
Thank you very much for the reply. I'm not sure what you mean by "name". I thought every read has a unique name at the very start (starting with @)?
ah yes, sorry, I was wrong.
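Since read names are unique, one way to approach the dictionary idea from the question is to key on the aligned location instead. Below is a minimal sketch, assuming duplicates have already been marked (flag 1024, i.e. 0x400, the same bit the samtools command above filters on) and that the input is SAM text such as the output of samtools view. The function name count_duplicates is made up for illustration. It groups reads by the leftmost aligned position and strand, which is a simplification: duplicate markers such as Picard work from the unclipped 5' end and take the mate into account, so soft-clipped or paired reads may be grouped differently here.

```python
from collections import Counter

DUP_FLAG = 0x400      # SAM flag bit: PCR or optical duplicate (decimal 1024)
REVERSE_FLAG = 0x10   # SAM flag bit: read aligned to the reverse strand


def count_duplicates(sam_lines):
    """Count duplicate-flagged reads per (reference, position, strand).

    sam_lines is any iterable of SAM-formatted text lines.
    Position is the leftmost aligned base (SAM column 4), an assumption
    made for simplicity rather than the unclipped 5' end.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if not flag & DUP_FLAG:
            continue  # keep only reads marked as duplicates
        strand = "-" if flag & REVERSE_FLAG else "+"
        counts[(fields[2], int(fields[3]), strand)] += 1
    return counts
```

It could be fed from a pipe, for example by passing sys.stdin to count_duplicates while running samtools view in.bam upstream, and then printing counts.most_common(). Note that duplicate markers usually leave one representative read of each set unflagged, so the fragment frequency is likely one more than the count reported for its location.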