A Question About The Size Of Illumina WGS Result
11.7 years ago
camelbbs ▴ 710

Recently I received WGS results from Illumina, and each BAM file for a single sample is up to 100 GB! Why are the files so big? With BAM files this large, if I need to call mutations between samples, can I run samtools mpileup on them directly?

dna seq illumina • 4.2k views
11.7 years ago
dfornika ★ 1.1k
  1. The size is big because modern sequencers are very good at producing tons and tons of sequence data!

  2. Yes, you can run samtools mpileup on it. As the samtools website recommends:

    samtools mpileup -uf ref.fa aln.bam | bcftools view -bvcg - > var.raw.bcf

    bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf
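
The commands above use the older samtools/bcftools interface. In current releases the pileup step lives in `bcftools mpileup` and the calling step in `bcftools call`. As a rough sketch of how calling across samples might look with those tools, the function below passes several BAMs to one pileup and restricts it to a single region so that very large files can be processed piece by piece. The names `ref.fa`, `sampleA.bam`, `sampleB.bam`, the region `chr1`, the helper `call_region` and the `DRY_RUN` switch are all placeholders of mine, not from the original answer, and the BAMs are assumed to be coordinate-sorted and indexed.

```shell
# Sketch: joint variant calling on several BAMs, one region at a time.
# ref.fa, the BAM names and the region name are hypothetical placeholders;
# the region must match the contig naming used in your reference.
call_region() {
    region="$1"; shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: only print the pipeline that would be executed.
        echo "bcftools mpileup -f ref.fa -r $region $* | bcftools call -mv -Ob -o calls.$region.bcf"
    else
        # -r limits the pileup to one region (requires indexed BAMs);
        # -m selects the multiallelic caller, -v keeps variant sites only,
        # -Ob writes compressed BCF to the file given with -o.
        bcftools mpileup -f ref.fa -r "$region" "$@" |
            bcftools call -mv -Ob -o "calls.$region.bcf"
    fi
}

# Print the command for two samples on one chromosome without running it.
DRY_RUN=1 call_region chr1 sampleA.bam sampleB.bam
```

Splitting by region is only one way to keep the run manageable. The per-region output files can be combined afterwards, for example with `bcftools concat`.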

11.7 years ago

As a further example of expected data volumes and corresponding file sizes: in our group, three lanes of an Illumina HiSeq sequencer using v3 chemistry will produce enough data for 30-35X haploid coverage of a human genome (approximately 100-120 Gbp of sequence data). When this data is aligned with BWA, a BAM file of 85-95 GB is typical, with a BAM index file of 7-10 MB.
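
To make the arithmetic behind those numbers explicit, here is a small sketch that divides total sequenced bases by genome size to get an approximate raw coverage. The helper name `estimate_coverage` and the default haploid genome size of 3.1 Gbp are my assumptions, not figures from the answer. Raw yield divided by genome size gives an upper bound, since reads that are unmapped, duplicated or filtered do not contribute to usable coverage.

```shell
# Estimate raw haploid coverage from total sequenced bases (in Gbp).
# The default genome size of 3.1 Gbp is an assumed approximate value
# for the human haploid genome; pass a second argument to override it.
estimate_coverage() {
    total_gbp="$1"
    genome_gbp="${2:-3.1}"
    awk -v t="$total_gbp" -v g="$genome_gbp" 'BEGIN { printf "%.1f\n", t / g }'
}

estimate_coverage 100   # lower end of the quoted yield, prints 32.3
estimate_coverage 120   # upper end of the quoted yield, prints 38.7
```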

11.7 years ago

Most probably, each sample's material was divided and run on more than 3-4 lanes of a high-throughput sequencer such as the HiSeq 2000, with the lanes acting like technical replicates, and the results were merged afterwards. In the case of ChIP-Seq, if a singleplex run is done with little starting material, a lot of duplicates are observed.
