About GATK data-preprocessing workflow
4.9 years ago
9521ljh ▴ 50

I have FASTQ files that I want to convert to BAM files.

In the GATK pre-processing workflow, a uBAM (unmapped BAM) file is said to be necessary because it holds metadata.

So I did the following:

FASTQ -> BWA -> mapped BAM

FASTQ -> Picard -> uBAM

uBAM + mapped BAM -> Picard (merge) -> merged BAM
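
The three steps above can be sketched as command lines. This is only a rough sketch, not the official GATK pipeline: it builds the commands without running them, the file names and read-group values are hypothetical placeholders, and the helper function names are made up for illustration. The Picard arguments use the legacy KEY=value syntax.

```python
import shlex

# Sketch only: builds (does not run) the three commands of the workflow.
# File names and read-group values are hypothetical placeholders.

def fastq_to_ubam(fq1, fq2, ubam, sample, read_group, library, platform="illumina"):
    """Picard FastqToSam: FASTQ -> unmapped BAM carrying read-group metadata."""
    return ["java", "-jar", "picard.jar", "FastqToSam",
            f"FASTQ={fq1}", f"FASTQ2={fq2}", f"OUTPUT={ubam}",
            f"SAMPLE_NAME={sample}", f"READ_GROUP_NAME={read_group}",
            f"LIBRARY_NAME={library}", f"PLATFORM={platform}"]

def bwa_align(reference, fq1, fq2):
    """bwa mem writes SAM to stdout; redirect it to a file when running."""
    return ["bwa", "mem", reference, fq1, fq2]

def merge_alignment(reference, ubam, aligned, merged):
    """Picard MergeBamAlignment: combine the uBAM with the aligner output."""
    return ["java", "-jar", "picard.jar", "MergeBamAlignment",
            f"REFERENCE_SEQUENCE={reference}", f"UNMAPPED_BAM={ubam}",
            f"ALIGNED_BAM={aligned}", f"OUTPUT={merged}"]

def build_workflow(fq1, fq2, reference, sample):
    """Return the three commands in the order they would be run."""
    ubam = f"{sample}.unmapped.bam"
    aligned = f"{sample}.aligned.sam"
    merged = f"{sample}.merged.bam"
    return [
        fastq_to_ubam(fq1, fq2, ubam, sample, read_group=f"{sample}_rg1",
                      library=f"{sample}_lib1"),
        bwa_align(reference, fq1, fq2),
        merge_alignment(reference, ubam, aligned, merged),
    ]

if __name__ == "__main__":
    for cmd in build_workflow("r1.fq", "r2.fq", "ref.fa", "sample1"):
        print(shlex.join(cmd))
```

When actually running these, the output of `bwa mem` would need to be redirected to the aligned SAM file, and MergeBamAlignment expects a sequence dictionary to exist for the reference.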

However, I don't understand why this process is needed, because we can add metadata to a BAM with Picard (AddOrReplaceReadGroups) instead of using a uBAM.

I have already read this thread: https://gatkforums.broadinstitute.org/gatk/discussion/11694/why-is-converting-from-fastq-to-ubam-nesessary-before-preprocessing#latest

assembly next-gen GATK Preprocessing uBAM • 1.7k views
4.9 years ago

The reason for this step is not really about read-group metadata.

As the user skywarrior said in the thread you linked:

BWA hardclips reads if there is a significant discordance between the best matching kmer and the read. These hardclips may end up costing you a particular structural variant or a true indel call. Merging unmapped bam and initial alignment restores the hardclips which I know of no solution for that in BWA parameters.

So you are not really losing metadata; you are potentially losing actual sequence data from your original reads. Whether the merge step matters depends on your dataset (exome vs. whole genome) and your goals: it may be unnecessary if you don't care about certain structural variants or know that they aren't present in your data.
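
To make the difference concrete: in the SAM format, soft-clipped bases (CIGAR operation S) stay in the record's sequence, while hard-clipped bases (H) are dropped from it. Here is a small sketch that counts both for a CIGAR string. The helper name `clip_summary` and the example CIGAR strings are made up for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def clip_summary(cigar):
    """Return (bases stored in SEQ, bases hard-clipped away) for a CIGAR string."""
    stored = 0
    hard_clipped = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "H":
            # Hard-clipped bases are not present in the record at all.
            hard_clipped += n
        elif op in "MIS=X":
            # These operations consume read bases that are kept in SEQ.
            stored += n
    return stored, hard_clipped

if __name__ == "__main__":
    print(clip_summary("30H70M"))  # (70, 30): 30 bases missing from the record
    print(clip_summary("30S70M"))  # (100, 0): all 100 bases still present
```

For a 100-base read, the hard-clipped record keeps only 70 bases, so the other 30 can only be recovered from another copy of the read, such as the uBAM.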


Thank you for the reply.

Could you give an example of the metadata? I thought it meant things like platform (Illumina), library, and sample name,

but all of these are covered by the AddOrReplaceReadGroups options.


Yes, those are examples of metadata, but the issue here is that you may be discarding the core of your data (i.e. the nucleotide sequence) because of how the bwa software handles clipping. This is completely independent of any metadata.
