Best practices for variant calling on multiple sequencing runs of the same sample
3.2 years ago

Given a single DNA sample extracted from a swab and these two cases:

  1. Prepared a single library and sequenced it 3 times separately

  2. Prepared 3 different libraries from the same DNA sample and sequenced each library

Both cases result in 3 sets of fastqs. Case 1 represents technical replicates for sequencing; case 2 represents technical replicates for library prep.

If the goal is to perform variant calling, how should I treat the fastqs from these two cases? Should I merge the fastqs and then map/variant call? Should I keep them separate and somehow merge the gvcfs/vcfs? Are there variant calling methods/software that can take advantage of batch information?

How important is sequencing/biological batch information in terms of variant calling?

12 months ago

Provided that library preparation is performed with the same kit and under similar conditions, it should be reproducible enough not to introduce technical artefacts. It is therefore safe to merge the reads, either as fastq files before alignment or as bam files after aligning each run in parallel.

For completeness' sake, in case someone stumbles upon this post, for merging fastqs:

cat run1_R1.fastq.gz run2_R1.fastq.gz run3_R1.fastq.gz > merged_R1.fastq.gz

(and analogous for R2)
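
A minimal sketch of the fastq route, using toy data so it runs without any sequencing files. The file names follow the command above; the read names and the scratch directory are made up for illustration. It relies on the fact that concatenated gzip members form a valid gzip stream, and it lists the runs in the same order for R1 and R2 so that mates stay paired.

```shell
set -eu

# Hypothetical toy data: one read pair per run, written to a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"

for run in run1 run2 run3; do
  for mate in 1 2; do
    printf '@%s_read1/%s\nACGT\n+\nIIII\n' "$run" "$mate" \
      | gzip > "${run}_R${mate}.fastq.gz"
  done
done

# Concatenate without decompressing. Keep the run order identical for R1 and R2,
# otherwise the two merged files will no longer be in sync.
cat run1_R1.fastq.gz run2_R1.fastq.gz run3_R1.fastq.gz > merged_R1.fastq.gz
cat run1_R2.fastq.gz run2_R2.fastq.gz run3_R2.fastq.gz > merged_R2.fastq.gz

# Sanity checks: both archives are intact and hold the same number of reads.
gzip -t merged_R1.fastq.gz merged_R2.fastq.gz
r1_reads=$(( $(gzip -dc merged_R1.fastq.gz | wc -l) / 4 ))
r2_reads=$(( $(gzip -dc merged_R2.fastq.gz | wc -l) / 4 ))
echo "R1 reads: $r1_reads, R2 reads: $r2_reads"
```

Note that merging at the fastq level discards the information about which run each read came from, unless it is still recoverable from the read names.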

For merging bams:

samtools merge merged.bam run1.bam run2.bam run3.bam
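
If you want to keep the batch information asked about in the question, one option (a sketch, not the only way) is to align each run separately with its own read group and merge the bams afterwards. The tag names ID, SM, LB and PL come from the SAM specification; the sample name, library names and ref.fa below are hypothetical. The script only prints the commands it would run, so it works without bwa or samtools installed.

```shell
set -eu

sample="swab01"   # hypothetical sample name, shared by all runs

# Build an @RG line in the escaped form accepted by `bwa mem -R`.
# $1 = run ID (unique per run), $2 = library name.
rg_line() {
  printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:ILLUMINA' "$1" "$sample" "$2"
}

# Dry run: print one alignment command per run:library pair, then the merge.
print_commands() {
  for pair in "$@"; do
    run=${pair%%:*}
    lib=${pair#*:}
    printf "bwa mem -R '%s' ref.fa %s_R1.fastq.gz %s_R2.fastq.gz | samtools sort -o %s.bam -\n" \
      "$(rg_line "$run" "$lib")" "$run" "$run" "$run"
  done
  printf 'samtools merge merged.bam run1.bam run2.bam run3.bam\n'
}

# Case 1: one library sequenced three times, so every run shares the same LB.
cmds=$(print_commands run1:lib1 run2:lib1 run3:lib1)
# Case 2 would instead use: print_commands run1:lib1 run2:lib2 run3:lib3
printf '%s\n' "$cmds"
```

Because all three read groups carry the same SM, a variant caller will still treat the merged bam as a single sample, while duplicate-marking tools that take LB into account can tell the two cases apart.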
16 months ago
swbarnes2 5.7k
United States

The Illumina protocol adds very little technical bias. Running the same sample on three separate runs is not necessary for RNA, and even less so for DNA.

For RNA-seq, there might be batch differences between preps done on different days, but for DNA this won't matter. If you prepped the three libraries side by side, they won't differ significantly. Neither your case 1 nor your case 2 is necessary. Batch differences affect quantitative measurements, as in RNA-seq, but shouldn't affect variant calling unless your PCR duplication level is out of control and you are trying to quantify allele frequencies.

You can concatenate (cat) the fastqs together prior to alignment (cat works on gzipped files fine), or samtools can merge the .bam files afterwards.

