Combining Data Of Multiple VCFs Into One
2.3 years ago
Sheila • 300
United States

I have a number of VCF files, each containing variant data for a single patient (this is how Illumina provides its data). Is it possible to combine the data for all patients into one VCF file? If so, how? Can I use plink/seq to do this?

Any suggestions and leads would be extremely helpful.

14 months ago
William ♦ 4.4k
Europe

GATK CombineVariants. From the usage examples in its documentation:

Merge two separate callsets

java -jar GenomeAnalysisTK.jar \
   -T CombineVariants \
   -R reference.fasta \
   --variant input1.vcf \
   --variant input2.vcf \
   -o output.vcf \
   -genotypeMergeOptions UNIQUIFY

Get the union of calls made on the same samples

java -jar GenomeAnalysisTK.jar \
   -T CombineVariants \
   -R reference.fasta \
   --variant:foo input1.vcf \
   --variant:bar input2.vcf \
   -o output.vcf \
   -genotypeMergeOptions PRIORITIZE \
   -priority foo,bar
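
With one VCF per patient, typing a `--variant` line for every file gets tedious. Here is a minimal sketch that builds the first command above for any number of inputs. It assumes the same GATK 3-style `GenomeAnalysisTK.jar` invocation and uses only the flags shown in the examples; the function name `combine_variants_cmd` and the patient file names are made up for illustration.

```python
import shlex


def combine_variants_cmd(reference, vcfs, output, jar="GenomeAnalysisTK.jar"):
    """Build a CombineVariants command line (as a list) for two or more VCFs."""
    if len(vcfs) < 2:
        raise ValueError("need at least two VCF files to merge")
    cmd = ["java", "-jar", jar, "-T", "CombineVariants", "-R", reference]
    for vcf in vcfs:
        # One --variant argument per input file, as in the usage example.
        cmd += ["--variant", vcf]
    cmd += ["-o", output, "-genotypeMergeOptions", "UNIQUIFY"]
    return cmd


if __name__ == "__main__":
    # Hypothetical per-patient file names.
    cmd = combine_variants_cmd(
        "reference.fasta",
        ["patient1.vcf", "patient2.vcf", "patient3.vcf"],
        "output.vcf",
    )
    # Print a shell-safe version to inspect before running it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the printed command looks right.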
14 months ago
France/Nantes/Institut du Thorax - INSE…

Related duplicate post

Use vcf-merge


Thanks. Is it possible to do this with plink/seq too?

16 months ago
Washington University School of Medicin…

Another option is joinx:

joinx vcf-merge [OPTIONS] file1.vcf file2.vcf [file3.vcf ...]

Hi Malachi,

How will joinx behave when score annotation is absent for reference calls? I have a rather large set of VCF files with calls for all positions (gVCF?), but the FORMAT annotation differs between positions without and with a variant call (GT:DP versus GT:AD:DP:GQ:PL).

I tried the latest version of bcftools, and it seems to merge or report multiple lines randomly.

Will joinx use the SNP DP as the GT DP for ref calls?

thanks!

Jack

23 months ago
ewre • 220
United States

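Since you are working with VCF files, vcftools would be a good choice. Note that vcf-merge expects bgzip-compressed, tabix-indexed inputs and writes uncompressed VCF to standard output, so compress the result if you want a .vcf.gz file. Try:

vcf-merge a.vcf.gz b.vcf.gz ... | bgzip -c > combined.vcf.gz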
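
To see what such a merge amounts to, here is a toy sketch: take the union of sites across the per-patient files and emit one genotype column per sample, with a missing genotype where a sample has no record. It is not how vcf-merge works internally. It reads only the GT field, treats records as the same site only when CHROM, POS, REF and ALT all match, and ignores headers, INFO and multiallelic handling. The function names and the `./.` placeholder are assumptions made for illustration.

```python
def parse_single_sample_vcf(lines):
    """Return (sample_name, {(chrom, pos, ref, alt): genotype}) for one VCF."""
    sample, calls = None, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample = fields[9]  # single-sample file: exactly one sample column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        format_keys = fields[8].split(":")
        values = fields[9].split(":")
        calls[(chrom, int(pos), ref, alt)] = values[format_keys.index("GT")]
    return sample, calls


def merge_genotypes(vcfs, missing="./."):
    """Union of sites across files; one genotype per sample at every site."""
    parsed = [parse_single_sample_vcf(v) for v in vcfs]
    samples = [name for name, _ in parsed]
    # Note: chromosome names sort as strings here, not in reference order.
    sites = sorted({site for _, calls in parsed for site in calls})
    rows = [(site, [calls.get(site, missing) for _, calls in parsed]) for site in sites]
    return samples, rows


if __name__ == "__main__":
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
    vcf_a = ["##fileformat=VCFv4.1", header + "patientA",
             "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:20"]
    vcf_b = ["##fileformat=VCFv4.1", header + "patientB",
             "1\t200\t.\tC\tT\t40\tPASS\t.\tGT\t0/1"]
    samples, rows = merge_genotypes([vcf_a, vcf_b])
    print(samples)
    for site, genotypes in rows:
        print(site, genotypes)
```

For real data, stick with one of the tools in this thread; the sketch is only meant to show why a sample can end up with a missing genotype after merging.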
4 months ago
zx8754 7.5k
London

You can load multiple VCFs into one plink/seq project, then output the project as one VCF.

pseq /path/to/project load-vcf

Given that a project file has been created (/path/to/project) and contains one or more VCF files, this command loads those VCF files into the variant database.


Thanks! This is helpful! I'm having trouble loading the VCFs into a project. These are my commands and output. Can you provide any help?

MY COMMANDS:

pseq testproject new-project --resources hg18
pseq /path/to/project/testproject load-vcf --vcf /path/to/TestVCFs/*.vcf

OUTPUT:

pseq error : database (/ifs/adni/pbhatt/ADNI/testproject_out/vardb) error (5) database is locked
plinkseq warning: database is locked (repeated 6 times)
plinkseq warning: preparing query database is locked

PLINK/SEQ documentation is not well maintained; it took me several hours of trial and error to load the data. Try creating a new project with the resources and scratch folders defined, and make sure you have read/write access to those folders.

pseq proj1 new-project --resources /share/data/hg19 --scratch /tmp/myfolder
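
A quick way to check the access point before creating the project is to test each folder from Python. This is only a sketch: the helper name `unusable_dirs` is made up, and `os.access` reports what the current user may do, which can still differ from what happens on some network file systems.

```python
import os
import tempfile


def unusable_dirs(paths):
    """Return the paths that are missing or not readable, writable and searchable."""
    needed = os.R_OK | os.W_OK | os.X_OK
    return [p for p in paths if not os.path.isdir(p) or not os.access(p, needed)]


if __name__ == "__main__":
    # Demo: a fresh temporary directory should pass, a missing subfolder should not.
    with tempfile.TemporaryDirectory() as scratch:
        print(unusable_dirs([scratch, os.path.join(scratch, "missing")]))
```

Call it with the folders you intend to pass to `--resources` and `--scratch`; an empty list means both look usable.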

Try loading one VCF file first; if that works, expand from there. There is also a Google Group for pseq users.
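
One way to expand step by step is to load the files one at a time and stop at the first failure, so you can see which file triggers the problem. A rough sketch, assuming `pseq` is on your PATH and using the `load-vcf --vcf` form from the commands posted above; the function name is made up, and I am not claiming this fixes the "database is locked" error.

```python
import subprocess


def load_vcfs_one_at_a_time(project, vcf_paths, run=subprocess.run):
    """Run `pseq <project> load-vcf --vcf <file>` once per file.

    Stops at the first non-zero exit status and reports what was loaded.
    `run` can be swapped out for testing without pseq installed.
    """
    loaded = []
    for vcf in vcf_paths:
        cmd = ["pseq", project, "load-vcf", "--vcf", vcf]
        result = run(cmd)
        if result.returncode != 0:
            raise RuntimeError(f"pseq failed on {vcf}; loaded so far: {loaded}")
        loaded.append(vcf)
    return loaded
```

For example, `load_vcfs_one_at_a_time("proj1", sorted(glob.glob("/path/to/TestVCFs/*.vcf")))` after `import glob`, with the project name and path replaced by your own.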


Thanks! Yes, I've tried posting in the Google Group but have received more responses here. I agree about the PLINK/SEQ documentation: it's very difficult to understand when you're new to the software.

I loaded one VCF and it works fine. The problem seems to arise when I try to load more than one VCF together. I will also try creating a new project with a scratch folder. Just so I know, what is the purpose of a scratch folder? I couldn't find it on the PLINK/SEQ website.


My guess is that the scratch folder is where PLINK/SEQ creates temporary files before committing them to the database.
