Question: About GATK data-preprocessing workflow

I have FASTQ files that I want to convert to BAM files.

In the GATK pre-processing workflow, a uBAM (unmapped BAM) file is required because it carries metadata.

So I did the following:

Fastq -> BWA - mapped BAM

Fastq -> Picard - uBAM

uBAM + mapped BAM -> Picard - Merge
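
For reference, the three steps above might look roughly like the sketch below. It only builds the command lines and prints them; it does not run anything. The file names, the `picard.jar` location and the read-group values are placeholders I made up, and the Picard arguments use the legacy `KEY=value` syntax, so check them against the versions you have installed.

```python
import shlex

PICARD = ["java", "-jar", "picard.jar"]  # assumed location of the Picard jar


def fastq_to_ubam_cmd(fq1, fq2, ubam, sample, read_group="rg1",
                      library="lib1", platform="ILLUMINA"):
    """Fastq -> Picard - uBAM (read-group metadata is attached here)."""
    return PICARD + [
        "FastqToSam",
        f"FASTQ={fq1}", f"FASTQ2={fq2}", f"OUTPUT={ubam}",
        f"READ_GROUP_NAME={read_group}", f"SAMPLE_NAME={sample}",
        f"LIBRARY_NAME={library}", f"PLATFORM={platform}",
    ]


def bwa_mem_cmd(ref, fq1, fq2):
    """Fastq -> BWA - mapped reads (bwa mem writes SAM to stdout)."""
    return ["bwa", "mem", ref, fq1, fq2]


def merge_cmd(ref, ubam, aligned, merged):
    """uBAM + mapped BAM -> Picard - Merge."""
    return PICARD + [
        "MergeBamAlignment",
        f"UNMAPPED_BAM={ubam}", f"ALIGNED_BAM={aligned}",
        f"REFERENCE_SEQUENCE={ref}", f"OUTPUT={merged}",
    ]


# Placeholder file names, for illustration only.
steps = [
    fastq_to_ubam_cmd("r1.fq", "r2.fq", "unmapped.bam", sample="sample1"),
    bwa_mem_cmd("ref.fa", "r1.fq", "r2.fq"),  # redirect stdout to aligned.sam
    merge_cmd("ref.fa", "unmapped.bam", "aligned.sam", "merged.bam"),
]
for cmd in steps:
    print(shlex.join(cmd))
```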

However, I really don't understand why this process is needed, because we can add metadata to a BAM with Picard (AddOrReplaceReadGroups) instead of using a uBAM.

I already read this article:

Asked 9 months ago by 9521ljh; updated 9 months ago by benformatics

The reason for this step is not the read-group metadata.

As the skywarrior person said in the post you linked:

BWA hardclips reads if there is a significant discordance between the best matching kmer and the read. These hardclips may end up costing you a particular structural variant or a true indel call. Merging unmapped bam and initial alignment restores the hardclips which I know of no solution for that in BWA parameters.

So you are not really losing metadata; you are potentially losing actual data from your original sequencing reads. This step may be unnecessary depending on the type of dataset you have (exome vs. whole genome), or if you don't care about certain structural variants or already know that they aren't present in your dataset.
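
To make the "losing actual data" point concrete, here is a small sketch of my own (not taken from the linked post) that reads a CIGAR string and compares how many bases are stored in the record's SEQ field with the length of the original read. Per the SAM format, soft-clipped bases (`S`) stay in SEQ, while hard-clipped bases (`H`) are dropped from it. The example CIGAR strings are made up.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations whose bases are present in the SEQ field of the record.
STORED_IN_SEQ = set("MIS=X")
# Hard-clipped bases belonged to the read but are not stored in SEQ.
PART_OF_READ = STORED_IN_SEQ | {"H"}


def read_lengths(cigar):
    """Return (bases stored in SEQ, length of the original read)."""
    stored = original = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in STORED_IN_SEQ:
            stored += n
        if op in PART_OF_READ:
            original += n
    return stored, original


for cigar in ("60M40S", "60H40M"):
    stored, original = read_lengths(cigar)
    print(f"{cigar}: {stored} of {original} bases kept in the record")
```

With the soft-clipped record all 100 bases are still in the file; with the hard-clipped one only 40 are, and the other 60 can only be recovered from another copy of the read, such as the uBAM.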

Answered 9 months ago by benformatics

Thank you for the reply.

Could you give an example of the metadata? I thought it meant things like platform (Illumina), library, and sample name,

but all of these are covered by the AddOrReplaceReadGroups options.

Replied 9 months ago

Yes, those are examples of metadata, but the issue here is that you are excluding the core of your data (i.e. the nucleotide sequence) because of an underlying behavior of the BWA software. This is completely independent of any metadata.

Replied 9 months ago
