I've recently been troubleshooting an error in part of my variant-calling pipeline, which traced back to my using BAM files aligned to hg38 as input for an Agilent deduplication tool that has yet to migrate from hg19 to hg38. My current workaround is to align to hg19, deduplicate, split the resulting SAM back into FASTQs, and re-align to hg38, which seems convoluted.
Should I continue working with hg38 once I'm past this step, or should I stick with hg19 all the way through? How do other people balance pipelines when some tools/datasets use hg38 and others have yet to switch over from hg19? Any advice on this whole hg19 vs. hg38 issue would be appreciated.
Edit: The tool is LocatIt, which is used for deduplication of reads by the molecular barcodes used in the HaloPlex HS Target Enrichment System. https://www.agilent.com/cs/library/software/Public/AGeNT%20ReadMe.pdf
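For reference, my current workaround looks roughly like this. This is only a sketch with hypothetical file names, assuming bwa and samtools are installed; the LocatIt invocation itself is elided because its flags depend on your MBC options (see the AGeNT ReadMe):

```shell
# 1. Align to hg19 so LocatIt can consume the BAM
bwa mem -t 8 hg19.fa R1.fastq.gz R2.fastq.gz | samtools sort -o hg19_sorted.bam -
samtools index hg19_sorted.bam

# 2. Deduplicate by molecular barcode with LocatIt
#    (exact flags per the AGeNT documentation; omitted here)
# java -jar locatit.jar ... hg19_sorted.bam ...   # produces dedup.bam

# 3. Convert the deduplicated BAM back to paired FASTQs
#    (name-sort first so mates are adjacent for samtools fastq)
samtools sort -n dedup.bam | \
    samtools fastq -1 dedup_R1.fastq.gz -2 dedup_R2.fastq.gz \
        -0 /dev/null -s /dev/null -n -

# 4. Re-align the deduplicated reads to hg38
bwa mem -t 8 hg38.fa dedup_R1.fastq.gz dedup_R2.fastq.gz | \
    samtools sort -o hg38_sorted.bam -
samtools index hg38_sorted.bam
```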
Do you have to use this Agilent deduplication tool? There are alternatives, unless you need something specific to it.
Second that. Have a look at Clumpify ("Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files"); it can also remove duplicates.
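Clumpify works on FASTQs directly, so it is reference-free and would sidestep the hg19 alignment step entirely. A minimal sketch, assuming BBTools is installed and with hypothetical file names:

```shell
# Reference-free duplicate removal on paired FASTQs.
# clumpify.sh is part of BBTools; dedupe=t enables duplicate removal.
clumpify.sh in=R1.fastq.gz in2=R2.fastq.gz \
    out=dedup_R1.fastq.gz out2=dedup_R2.fastq.gz \
    dedupe=t
```

One caveat: Clumpify identifies duplicates by sequence identity, not by molecular barcode, so it is not a drop-in replacement for LocatIt's MBC-aware deduplication if you rely on the HaloPlex HS barcodes.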
What is the tool, and what kind of data? Is it the AgilentMBCDedup Tool, used to process the Molecular Barcodes (MBC) of HaloPlex runs?