Back in the CERN era, it was used to split large datasets into multiple chunks and process them in parallel on different nodes.
Why can't we do this with a BAM file? Please correct me if I am wrong, but someone told me that BAM is monolithic and cannot be split and distributed like that.
Please advise.
/Zee
Thanks Pierre, what do you mean by "only use a genomic section of the BAM"?
Again, it depends on your analysis, but it could be something like this. Say you're counting the number of reads:
process 1: samtools view -c in.bam chr1 > chr1.count
process 2: samtools view -c in.bam chr2 > chr2.count
process 3: samtools view -c in.bam chr3 > chr3.count
process N: samtools view -c in.bam chrN > chrN.count
process N+1: cat chr1.count chr2.count (...) chrN.count | awk '{ s += $1 } END { print s }' > sum.count
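A minimal bash sketch of the same split/count/sum idea on a single machine, assuming in.bam is coordinate-sorted and indexed with `samtools index` (region queries such as `samtools view -c in.bam chr1` need the .bai index) and that reference names are simple ones like chr1. The function name `count_by_chromosome` and the output directory layout are hypothetical. Chromosome names are taken from `samtools idxstats`, whose final `*` line (reads without coordinates) is skipped, so unplaced unmapped reads are not counted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count_by_chromosome BAM OUTDIR
# Splits a read count by chromosome, runs the pieces in parallel, then sums
# the partial counts. Assumes BAM is coordinate-sorted and indexed
# (samtools index in.bam) and that reference names are simple (chr1, chr2, ...).
count_by_chromosome() {
    local bam="$1" outdir="$2" chrom pid pids=""
    mkdir -p "$outdir" || return 1

    # Reference names from the index; drop the final '*' line that
    # idxstats prints for reads without coordinates.
    samtools idxstats "$bam" | cut -f1 | grep -v '^\*$' > "$outdir/chroms.txt"

    # Split: one background samtools process per chromosome.
    while IFS= read -r chrom; do
        samtools view -c "$bam" "$chrom" > "$outdir/$chrom.count" &
        pids="$pids $!"
    done < "$outdir/chroms.txt"

    # Wait for every job; give up if any of them failed.
    for pid in $pids; do
        wait "$pid" || return 1
    done

    # Combine: add up the per-chromosome counts.
    cat "$outdir"/*.count | awk '{ s += $1 } END { print s + 0 }'
}
```

Usage would look like `count_by_chromosome in.bam counts > sum.count`. Here the jobs are just background processes on one machine; to spread them over several nodes, each `samtools view` line could instead be submitted as a separate job, which works because the index lets every job read only its own region of the BAM.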