
vg index memory allocation
7 months ago
kingcohn ▴ 20

I'm attempting to call variants using a reference graph. I first generated the graph with minigraph:

./minigraph -xggs -t16 f1.fa f2.fa f3.fa f4.fa f5.fa > PG_1.gfa

and then converted it with vg:

./vg convert -g -a PG_1.gfa >

When I try to index, I'm prompted to chop the nodes to a maximum length of 256 bp (./vg mod -X 256 >), but indexing stalls out when I run:

./vg index -x -p PG_3.xg -g PG_3.gcsa
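For clarity, here is a minimal sketch of the pipeline as described. The output filenames after the truncated redirects (PG_2.vg, PG_3.vg) are hypothetical placeholders, and the final command assumes vg index's usual flag meanings (-x and -g each take an output filename; -p prints progress):

```shell
# Build the pangenome graph from five assemblies (as in the question)
./minigraph -xggs -t16 f1.fa f2.fa f3.fa f4.fa f5.fa > PG_1.gfa

# Convert the GFA into a vg-readable graph; PG_2.vg is a placeholder name
./vg convert -g -a PG_1.gfa > PG_2.vg

# Chop nodes to at most 256 bp, as required for GCSA indexing
./vg mod -X 256 PG_2.vg > PG_3.vg

# Build the XG and GCSA2 indexes (the step that stalls)
./vg index -x PG_3.xg -g PG_3.gcsa -p PG_3.vg
```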

It exits at:

Building XG index
Saving XG index to Ldec_vg.xg
Generating kmer files...
Building the GCSA2 index...
InputGraph::InputGraph(): 2193921420 kmers in 1 file(s)


The input is 1.3 GB in size. How much memory and how many CPUs should I allocate to generate an index and call variants from FASTQ reads totaling approximately 1.5-2 GB?

vg reference graph • 288 views
7 months ago
Jouni Sirén ▴ 130

Assuming that the graph is not too complex locally (in a 256 bp window), ~2 billion initial kmers in a single graph file should require 100-200 GB memory and 200-300 GB disk space in $TMPDIR.

GCSA construction uses a semi-external algorithm that works best when the graph is partitioned (e.g. by chromosome) into multiple .vg files. It can then reduce the memory usage significantly by loading kmers from one graph file at a time.
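A minimal sketch of that partitioned workflow, assuming the graph carries per-chromosome reference paths; the path names (chr1, chr2, ...) and filenames here are hypothetical:

```shell
# Split the whole-genome graph into one .vg file per chromosome,
# assuming reference paths named chr1, chr2, ... exist in the graph
for chr in chr1 chr2 chr3; do
    ./vg chunk -x PG_3.xg -p "$chr" > "$chr".vg
done

# Point GCSA's temporary files at a disk with 200-300 GB free
export TMPDIR=/scratch/tmp

# Build a single GCSA2 index over all per-chromosome files;
# kmers are loaded from one file at a time, lowering peak memory
./vg index -g PG_3.gcsa -p chr1.vg chr2.vg chr3.vg
```

The key point is that one GCSA index is built over multiple .vg inputs; the partitioning only changes how the kmer files are staged, not the resulting index.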

