importing a GFA2 scaffold graph
6.8 years ago
egoltsman ▴ 10

Hi all, I'm exploring GATB with the idea of potentially replacing our custom assembly graph implementation with this wonderful library. As is often the case, instead of creating an overlap graph from scratch (i.e. from the reads), I'd like to jump in midway and import a set of 3rd-party unitigs, and possibly paired-end mapping information, to create a contig graph that includes both inter-node overlaps (as edges) and long-range scaffolding links (as gaps). I could create a GFA2 file with all that info and convert it to HDF5, but I wasn't sure from the documentation whether that would be enough. The documentation only says that "a '.h5' file is created using dbgh5 program provided with GATB-Core". I can certainly ensure that my contigs are unique in their k-mer content, but what other restrictions does the Graph::load API have?
If anyone has tried something similar, any tips would be greatly appreciated!

GATB • 1.5k views
6.5 years ago
Rayan Chikhi ★ 1.5k

Hi!

I hope the answer is still relevant now. Converting GFA2 to HDF5 doesn't make much sense in GATB: the info that we store inside the HDF5 file actually consists of k-mer counts and a Bloom filter (and other stuff). So the graph stored in a .h5 file is a regular de Bruijn graph and cannot be of any other type.
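To illustrate why such a file cannot hold an arbitrary graph, here is a minimal Python sketch of a de Bruijn graph that is defined only by its k-mer set. This is not GATB code: the function names `kmer_set` and `successors` are made up for the example, a plain Python set stands in for the Bloom filter, and reverse complements are ignored.

```python
def kmer_set(seqs, k):
    """Collect every k-mer that occurs in the input sequences."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def successors(kmer, kmers):
    """Return the k-mers that follow `kmer` in the implicit graph.

    No edge list is stored: an edge exists whenever the (k-1)-suffix of
    `kmer`, extended by one base, is itself a member of the k-mer set.
    """
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in kmers]


# Hypothetical toy input, k = 3.
kmers = kmer_set(["ACGTAC", "CGTTA"], 3)
print(successors("CGT", kmers))  # branching node: two successors
```

Since edges are derived from k-mer membership rather than stored, there is nowhere in this kind of representation to record a scaffolding gap or an overlap of arbitrary length.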

In Minia there is actually early support for loading a GFA1 graph. It is designed for loading compacted de Bruijn graphs that were created with BCALM. The behavior for any other type of graph has _not_ been tested. But you're welcome to give it a shot; I assume it will most likely require modifying GATB-Core code. If you're serious about following this road, please shoot me an email.

Actually, the road I would recommend is to create "conservative" contigs using Minia (by tweaking the tip removal steps), or any other assembler, and load the resulting graph in e.g. Python, as it should most likely be much smaller (in terms of nodes) and manageable.
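As a rough idea of what loading the graph in Python could look like, here is a minimal sketch that reads GFA1 segment (S) and link (L) lines into plain dictionaries. It assumes the contig graph is available as GFA1 text; the function name `parse_gfa1` and the toy records are made up for the example, and optional tags, paths and containments are ignored.

```python
from collections import defaultdict


def parse_gfa1(lines):
    """Parse GFA1 'S' and 'L' records into two dictionaries.

    segments: segment name -> sequence
    links:    (from_name, from_orient) -> list of (to_name, to_orient, overlap)
    Other record types (H, P, C, ...) are skipped.
    """
    segments = {}
    links = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            _, src, src_orient, dst, dst_orient, overlap = fields[:6]
            links[(src, src_orient)].append((dst, dst_orient, overlap))
    return segments, links


# Hypothetical toy graph: two contigs joined by a 3-base overlap.
example = [
    "H\tVN:Z:1.0",
    "S\t0\tACGTAC",
    "S\t1\tTACGG",
    "L\t0\t+\t1\t+\t3M",
]
segments, links = parse_gfa1(example)
print(len(segments), "segments;", sum(len(v) for v in links.values()), "links")
```

For a real file, passing an open file handle (e.g. `parse_gfa1(open(path))`) works the same way, since the function only iterates over lines.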

Rayan
