Check out the fuc Python package I wrote:
For the command-line interface (CLI):
$ fuc vcf_merge 1.vcf 2.vcf 3.vcf > merged.vcf
For the application programming interface (API):
Assume we have the following data:
>>> from fuc import pyvcf
>>> data1 = {
... 'CHROM': ['chr1', 'chr1'],
... 'POS': [100, 101],
... 'ID': ['.', '.'],
... 'REF': ['G', 'T'],
... 'ALT': ['A', 'C'],
... 'QUAL': ['.', '.'],
... 'FILTER': ['.', '.'],
... 'INFO': ['.', '.'],
... 'FORMAT': ['GT:DP', 'GT:DP'],
... 'Steven': ['0/0:32', '0/1:29'],
... 'Sara': ['0/1:24', '1/1:30'],
... }
>>> data2 = {
... 'CHROM': ['chr1', 'chr1', 'chr2'],
... 'POS': [100, 101, 200],
... 'ID': ['.', '.', '.'],
... 'REF': ['G', 'T', 'A'],
... 'ALT': ['A', 'C', 'T'],
... 'QUAL': ['.', '.', '.'],
... 'FILTER': ['.', '.', '.'],
... 'INFO': ['.', '.', '.'],
... 'FORMAT': ['GT:DP', 'GT:DP', 'GT:DP'],
... 'Dona': ['./.:.', '0/0:24', '0/0:26'],
... 'Michel': ['0/1:24', '0/1:31', '0/1:26'],
... }
>>> vf1 = pyvcf.VcfFrame.from_dict([], data1)
>>> vf2 = pyvcf.VcfFrame.from_dict([], data2)
>>> vf1.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT  Steven    Sara
0  chr1  100  .   G   A    .      .    .  GT:DP  0/0:32  0/1:24
1  chr1  101  .   T   C    .      .    .  GT:DP  0/1:29  1/1:30
>>> vf2.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT    Dona  Michel
0  chr1  100  .   G   A    .      .    .  GT:DP   ./.:.  0/1:24
1  chr1  101  .   T   C    .      .    .  GT:DP  0/0:24  0/1:31
2  chr2  200  .   A   T    .      .    .  GT:DP  0/0:26  0/1:26
We can merge the two VcfFrames with how='inner' (default):
>>> pyvcf.merge([vf1, vf2]).df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT Steven Sara Dona Michel
0  chr1  100  .   G   A    .      .    .     GT    0/0  0/1  ./.    0/1
1  chr1  101  .   T   C    .      .    .     GT    0/1  1/1  0/0    0/1
We can also merge with how='outer':
>>> pyvcf.merge([vf1, vf2], how='outer').df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT Steven Sara Dona Michel
0  chr1  100  .   G   A    .      .    .     GT    0/0  0/1  ./.    0/1
1  chr1  101  .   T   C    .      .    .     GT    0/1  1/1  0/0    0/1
2  chr2  200  .   A   T    .      .    .     GT    ./.  ./.  0/0    0/1
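To see why the chr2 row only appears with how='outer', here is a rough sketch of the same inner/outer idea in plain pandas. It is not how fuc implements merging; it assumes sites are matched on CHROM, POS, REF and ALT, and the frames `left` and `right` are trimmed-down toy versions of the data above.

```python
import pandas as pd

# Assumption: a site is identified by these four columns.
KEYS = ["CHROM", "POS", "REF", "ALT"]

left = pd.DataFrame({
    "CHROM": ["chr1", "chr1"],
    "POS": [100, 101],
    "REF": ["G", "T"],
    "ALT": ["A", "C"],
    "Steven": ["0/0", "0/1"],
})
right = pd.DataFrame({
    "CHROM": ["chr1", "chr1", "chr2"],
    "POS": [100, 101, 200],
    "REF": ["G", "T", "A"],
    "ALT": ["A", "C", "T"],
    "Dona": ["./.", "0/0", "0/0"],
})

# Inner: keep only sites present in both frames.
inner = left.merge(right, on=KEYS, how="inner")

# Outer: keep every site; samples absent at a site get a missing genotype.
outer = (
    left.merge(right, on=KEYS, how="outer")
    .fillna("./.")
    .sort_values(KEYS)
    .reset_index(drop=True)
)

print(inner)
print(outer)
```

The inner result has the two chr1 sites shared by both frames, while the outer result also keeps chr2:200 and fills Steven with './.' there, mirroring the merged output above.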
Since both VcfFrames have the DP subfield, we can use format='GT:DP':
>>> pyvcf.merge([vf1, vf2], how='outer', format='GT:DP').df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT  Steven    Sara    Dona  Michel
0  chr1  100  .   G   A    .      .    .  GT:DP  0/0:32  0/1:24   ./.:.  0/1:24
1  chr1  101  .   T   C    .      .    .  GT:DP  0/1:29  1/1:30  0/0:24  0/1:31
2  chr2  200  .   A   T    .      .    .  GT:DP   ./.:.   ./.:.  0/0:26  0/1:26
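If it helps to picture what the format argument is selecting, the sketch below pairs each key in the FORMAT column with the matching piece of a sample entry and keeps only the requested subfields. `subset_format` is a hypothetical helper written for illustration, not part of fuc.

```python
def subset_format(format_keys, sample_value, wanted):
    """Keep only the wanted subfields of one sample entry.

    Hypothetical helper for illustration only (not a fuc function).
    Subfields that are requested but absent are written as '.'.
    """
    # Pair keys with values, e.g. 'GT:DP' + '0/1:29' -> {'GT': '0/1', 'DP': '29'}
    fields = dict(zip(format_keys.split(":"), sample_value.split(":")))
    return ":".join(fields.get(key, ".") for key in wanted.split(":"))


print(subset_format("GT:DP", "0/1:29", "GT"))
print(subset_format("GT:DP", "0/1:29", "GT:DP"))
print(subset_format("GT:DP", "./.:.", "GT:DP"))
```

With wanted='GT' the depth is dropped, which matches the GT-only columns in the default merge output; with wanted='GT:DP' the entry is kept whole.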
Thanks a lot