Question

Adding read group to bam files from multiplexed samples

Asked 5.7 years ago by serpalma.v

Hello

I have 60 samples (samp1 to samp60). Each one was barcoded, and the samples were then pooled (10 samples per pool, 6 pools).

Each pool was sequenced in 9 lanes.

This gives 1080 FASTQ files (60 samples × 9 lanes × 2 reads for paired-end) and 540 BAM files (one per sample per lane).
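The file counts follow directly from the design. A quick sketch that enumerates one unit per (sample, lane) pair, using the sample names samp1 to samp60 and lanes 1 to 9 from the question:

```python
from itertools import product

# Design from the question: 60 samples, each sequenced on 9 lanes, paired-end.
samples = [f"samp{i}" for i in range(1, 61)]
lanes = range(1, 10)
reads_per_pair = 2  # R1 and R2 for paired-end sequencing

# One sequencing unit (and one BAM) per sample per lane.
units = list(product(samples, lanes))
n_bams = len(units)
n_fastqs = n_bams * reads_per_pair

print(n_fastqs, n_bams)  # prints "1080 540"
```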

I want to do variant calling with GATK.

I went through these two very informative posts:

https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups

Read Group In Sam/Bam Files: What Do They Exactly Describe?

Accordingly, I am trying to define the read groups for each BAM file as follows:

ID: flowcell ID and lane ID (e.g. HNTW5BBXX_1)
SM: the name of the sample (e.g. samp31)
PL: ILLUMINA
LB: lib_samp31
PI: insert size (e.g. 200)
PU: flowcell ID and lane ID and sample ID (e.g. HNTW5BBXX_1_samp31)
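To see how these fields end up in a BAM header, here is a minimal sketch that formats them as a SAM @RG header line (tab-separated TAG:VALUE pairs, ID first). The values are the question's examples; the helper name `rg_header_line` is hypothetical. For `bwa mem -R`, the line is usually passed with literal backslash-t sequences in place of the tab characters, so the sketch also prints that form.

```python
# Fields as interpreted in the question (values are the examples given there).
read_group = {
    "ID": "HNTW5BBXX_1",
    "SM": "samp31",
    "PL": "ILLUMINA",
    "LB": "lib_samp31",
    "PI": "200",
    "PU": "HNTW5BBXX_1_samp31",
}


def rg_header_line(fields: dict[str, str]) -> str:
    """Build a SAM @RG header line: tab-separated TAG:VALUE pairs, ID first."""
    if "ID" not in fields:
        raise ValueError("an @RG line must have an ID")
    ordered = ["ID"] + [tag for tag in fields if tag != "ID"]
    return "@RG\t" + "\t".join(f"{tag}:{fields[tag]}" for tag in ordered)


line = rg_header_line(read_group)
print(line)

# Same line with each tab written as a literal backslash-t, e.g. for bwa mem -R.
escaped = line.replace("\t", "\\t")
print(escaped)
```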

I would like to clarify the following:

Did I get something wrong in interpreting the fields?
Could I exclude PU, since it is not required by GATK according to the link above? Do you usually include it anyway?

Thanks in advance!

Tags: bam, picard, gatk

Unless you have QC reasons to say that a lane did poorly, you should concatenate all 9 lanes together for each sample. Keeping them separate is doing you no favors. Merge the BAMs now, before you go any further.
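A minimal sketch of merging the 9 per-lane BAMs of each sample, assuming hypothetical file names of the form samp31_L1.bam. It only groups the files by sample and builds `samtools merge` command lines (output file first, then the inputs); it does not run them.

```python
import re
import shlex
from collections import defaultdict

# Hypothetical per-lane BAM names of the form <sample>_L<lane>.bam (two samples shown).
bam_files = [f"samp{s}_L{lane}.bam" for s in (31, 32) for lane in range(1, 10)]

pattern = re.compile(r"^(?P<sample>samp\d+)_L(?P<lane>\d+)\.bam$")

# Group the per-lane BAMs by sample name.
by_sample: dict[str, list[str]] = defaultdict(list)
for path in bam_files:
    m = pattern.match(path)
    if m is None:
        raise ValueError(f"unexpected file name: {path}")
    by_sample[m["sample"]].append(path)

# One `samtools merge <out.bam> <in1.bam> ...` command per sample, inputs in lane order.
commands = []
for sample, inputs in sorted(by_sample.items()):
    inputs.sort(key=lambda p: int(pattern.match(p)["lane"]))
    cmd = ["samtools", "merge", f"{sample}.merged.bam", *inputs]
    commands.append(cmd)
    print(shlex.join(cmd))
```

The commands can then be run with `subprocess.run(cmd, check=True)` or written to a job script.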

Reply by swbarnes2, 5.7 years ago

I read here that keeping BAMs separate during pre-processing is reasonable. Also, as I understand it, each of a sample's BAM files corresponds to a different read group, since its reads come from a different lane.
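To make the per-lane view concrete, here is a minimal sketch that builds one read group per lane for a single sample, following the question's scheme (ID from flowcell and lane, PU adding the sample). Applying the example flowcell HNTW5BBXX to all 9 lanes is an assumption; real runs may span several flowcells. A SAM header can hold several @RG lines, so one per-sample BAM can still record which lane each read came from, provided each read carries the matching RG tag.

```python
def per_lane_read_groups(sample: str, flowcell: str, lanes: range) -> list[dict[str, str]]:
    """One read group per lane for one sample: distinct ID and PU, shared SM, PL and LB."""
    groups = []
    for lane in lanes:
        groups.append({
            "ID": f"{flowcell}_{lane}",           # flowcell + lane, as in the question's ID
            "SM": sample,
            "PL": "ILLUMINA",
            "LB": f"lib_{sample}",
            "PU": f"{flowcell}_{lane}_{sample}",  # flowcell + lane + sample, as in the question's PU
        })
    return groups


# Example flowcell ID from the question, assumed here for all 9 lanes.
groups = per_lane_read_groups("samp31", "HNTW5BBXX", range(1, 10))
for rg in groups:
    print("@RG\t" + "\t".join(f"{tag}:{value}" for tag, value in rg.items()))
```

IDs built only from flowcell and lane repeat across the 10 samples of a pool, which only matters if BAMs from different samples ever end up in one file.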

Reply by serpalma.v, 5.7 years ago

Five-year-old recommendations are no longer relevant; just concatenate the lanes together.
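At the FASTQ level, concatenating lanes can be as simple as `cat` on the gzipped files, because back-to-back gzip members form a valid gzip file. A minimal Python sketch of the same byte-level concatenation, with hypothetical file names and a tiny self-made example. R1 and R2 files should be concatenated in the same lane order so that mates stay paired.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def concatenate_lanes(lane_files: list[Path], out_path: Path) -> None:
    """Concatenate gzipped FASTQ files byte for byte, like `cat a.gz b.gz > out.gz`."""
    with open(out_path, "wb") as out:
        for path in lane_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


with tempfile.TemporaryDirectory() as tmp:
    tmpdir = Path(tmp)
    # Hypothetical per-lane R1 files for one sample, one record each.
    lane_files = []
    for lane in (1, 2):
        p = tmpdir / f"samp31_L{lane}_R1.fastq.gz"
        with gzip.open(p, "wt") as fh:
            fh.write(f"@read_L{lane}/1\nACGT\n+\nIIII\n")
        lane_files.append(p)

    merged = tmpdir / "samp31_R1.fastq.gz"
    concatenate_lanes(lane_files, merged)

    # gzip reads all members of the concatenated file in sequence.
    with gzip.open(merged, "rt") as fh:
        merged_lines = fh.read().splitlines()

print(merged_lines[0], merged_lines[4])  # prints "@read_L1/1 @read_L2/1"
```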

Reply by Devon Ryan, 5.7 years ago

So then, the read groups should be as follows:

ID: samp31
SM: samp31
PL: ILLUMINA
LB: samp31
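If the lanes are merged and each sample gets a single read group as proposed above, one way to set it on an existing BAM is Picard's AddOrReplaceReadGroups, which assigns all reads in the file to one new read group. A minimal sketch that only builds the command line, assuming a merged input named samp31.merged.bam, a picard.jar in the working directory and a hypothetical PU value (all illustrative). It uses the KEY=VALUE style shown in Picard's documentation examples; check which argument syntax your Picard version expects. Picard's documentation lists RGPU among this tool's required arguments, which is why a value is filled in.

```python
import shlex


def add_read_group_cmd(in_bam: str, out_bam: str, sample: str, unit: str) -> list[str]:
    """Build a Picard AddOrReplaceReadGroups command giving one merged sample BAM a single read group."""
    return [
        "java", "-jar", "picard.jar", "AddOrReplaceReadGroups",  # picard.jar path is hypothetical
        f"I={in_bam}",
        f"O={out_bam}",
        f"RGID={sample}",  # ID: samp31, as proposed in the thread
        f"RGSM={sample}",  # SM: samp31
        "RGPL=ILLUMINA",   # PL: ILLUMINA
        f"RGLB={sample}",  # LB: samp31
        f"RGPU={unit}",    # PU: hypothetical platform-unit value
    ]


cmd = add_read_group_cmd("samp31.merged.bam", "samp31.rg.bam", "samp31", "HNTW5BBXX_samp31")
print(shlex.join(cmd))
```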

I'm not sure about keeping PI and PU now.

Correct?

Reply by serpalma.v, 5.7 years ago