Adding read groups to large BAM files
20 months ago
Fungsten • 0

How do you deal with large BAM files that lack a read group, or that are missing some field in the read group lines?

There has to be a better way to deal with these situations than using Picard tools. For 100-200 GB BAM files it takes ages to add or modify the read group.

Also, is there any specific reason why some public BAM files, such as those from GIAB, do not have all the fields in their read groups? Isn't that bad practice, given the time it takes later on to pre-process the files?

bam alignment • 331 views

You could try samtools addreplacerg instead, but I'm not sure whether it is any faster. In the end it is necessary to iterate over the whole file to add the read group to each read.
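
For what it's worth, here is a minimal sketch of how such a call could be put together from Python. The file names, the read group values and the thread count are all made up for illustration, and the helper `addreplacerg_command` is hypothetical. It only builds the argument list, so nothing is run until you pass it to `subprocess.run` yourself.

```python
import shlex

def addreplacerg_command(in_bam, out_bam, rg_fields, threads=4):
    """Build a `samtools addreplacerg` command line as a list of arguments.

    rg_fields is a dict such as {"ID": "rg1", "SM": "sample1"}; all values
    used here are placeholders, not taken from any real dataset.
    """
    cmd = ["samtools", "addreplacerg"]
    # Repeating -r once per field lets samtools join the fields into one @RG line.
    for tag, value in rg_fields.items():
        cmd += ["-r", f"{tag}:{value}"]
    cmd += [
        "-m", "overwrite_all",   # tag every read, not only reads without an RG
        "-@", str(threads),      # extra compression threads (assumed value)
        "-o", out_bam,
        in_bam,
    ]
    return cmd

cmd = addreplacerg_command(
    "in.bam", "out.bam",
    {"ID": "rg1", "SM": "sample1", "PL": "ILLUMINA"},
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Extra threads may help with the compression side, but the whole file still has to be read and rewritten, so don't expect miracles on a 200 GB BAM.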

"Isn't that bad practice"

Sometimes I have the feeling that "bad practice" is followed more often than "good practice" :(

fin swimmer

11 weeks ago
ATpoint 17k

I think that, since there is no formal definition of what exactly a read group is, it is not bad practice to leave it out of the BAM by default. Most tools do not even need it for downstream analysis. I personally use bamaddrg (I avoid Picard/Broad tools whenever possible because they are overly picky and their cryptic error messages are sometimes difficult to understand). If your BAMs lack fields that a downstream tool needs, there is unfortunately no way around writing a new BAM file with the necessary fields.
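
Before rewriting a huge file it may be worth checking which fields are actually missing. Below is a rough sketch that parses header text (for example the output of `samtools view -H`) and reports, per read group, which fields are absent. The helper name `missing_rg_fields` is hypothetical, and the default list of required fields is only an assumption; adjust it to whatever your downstream tool really asks for.

```python
def missing_rg_fields(header_text, required=("ID", "SM", "PL", "LB", "PU")):
    """Map each @RG line's ID to the required fields it lacks.

    `required` is an assumed set of tags, not a rule from any specific tool.
    """
    missing = {}
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        absent = [tag for tag in required if tag not in fields]
        if absent:
            missing[fields.get("ID", "<no ID>")] = absent
    return missing

# Toy header with made-up values: rg1 is incomplete, rg2 has every field.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tSM:sample1\n"
    "@RG\tID:rg2\tSM:sample1\tPL:ILLUMINA\tLB:lib1\tPU:unit1\n"
)
print(missing_rg_fields(header))
```

Note that a header with no @RG lines at all gives an empty result here, so that case needs a separate check.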

