Block break from gVCF files
2.1 years ago
win • 810

Hi all.

I am generating a gVCF file with Isaac Variant Caller. The output stores all non-variant sites as blocks, and I want to break those blocks into single-position lines.

There is a way to do this using a BED file, so my question is: where can I find a BED file for the entire genome, or can this be accomplished without a BED file?

Any help will be highly appreciated.

Is this for the whole human genome? Breaking a whole-genome gVCF into a per-nucleotide VCF would result in a very large file.

16 months ago
rbagnall ♦ 1.4k

A BED file for the entire human genome (hg19) looks like this:

chr1 1 249250621
chr2 1 243199373
chr3 1 198022430
chr4 1 191154276
chr5 1 180915260
chr6 1 171115067
chr7 1 159138663
chr8 1 146364022
chr9 1 141213431
chr10 1 135534747
chr11 1 135006516
chr12 1 133851895
chr13 1 115169878
chr14 1 107349540
chr15 1 102531392
chr16 1 90354753
chr17 1 81195210
chr18 1 78077248
chr19 1 59128983
chr20 1 63025520
chr21 1 48129895
chr22 1 51304566
chrX 1 155270560
chrY 1 59373566
chrM 1 16571

You can break up the blocks of a gVCF file using the break_blocks utility of gvcftools:

Thank you. Could you please share how this was generated?

I use BWA to align NGS data to the human genome. The reference genome has to be indexed before alignment, and indexing the FASTA with samtools faidx produces a .fai file whose second column holds the nucleotide length of each chromosome.
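
If it helps, here is a minimal Python sketch of turning a .fai index into a whole-genome BED file. The function name `fai_to_bed` and the file names are made up for illustration. It assumes a tab-separated .fai with the sequence name in column 1 and the length in column 2. Note that BED start coordinates are 0-based, so the sketch writes 0 as the start; the listing above uses 1, which leaves out the first base of each chromosome.

```python
import sys


def fai_to_bed(fai_path, bed_path):
    """Write one whole-sequence BED interval per entry in a .fai index.

    Hypothetical helper: assumes column 1 is the sequence name and
    column 2 is its length in bases, separated by tabs.
    """
    with open(fai_path) as fai, open(bed_path, "w") as bed:
        for line in fai:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\n").split("\t")
            name, length = fields[0], int(fields[1])
            # BED is 0-based, half-open: [0, length) covers the whole sequence
            bed.write(f"{name}\t0\t{length}\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python fai_to_bed.py ref.fa.fai genome.bed
    fai_to_bed(sys.argv[1], sys.argv[2])
```

The index will also list any unplaced or alternate contigs in the reference, so you may want to filter the output down to the chromosomes you need.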

The nucleotide chromosome lengths are also found here:

Select the Assembly Statistics and then the Primary Assembly tabs. The Assembled molecule entry for each chromosome gives its nucleotide length.


