Unable to parse VCF file when creating an alternate genome and transcriptome database (using g2gtools)
8.0 years ago
kirannbishwa01 ★ 1.6k

This question is about g2gtools (https://github.com/churchill-lab/g2gtools). I am not sure how much help I can get here, but I would be glad to hear any feedback.

I am trying to create a chain file from the reference genome of my model organism and a population-level indel.vcf file. Note: this tool is used to build an alternate reference genome and transcriptome database.

I am using the script described at https://github.com/churchill-lab/g2gtools, which matches the script for the example files in https://github.com/churchill-lab/sysgen2015/blob/master/markdown/RNASeq_pipeline.md. I was able to run the tool successfully on the example data and got the expected outputs.

I then tried it on my own data, using the reference genome (sorted and indexed) and indels.vcf (sorted, indexed, and formatted according to the VCF specification), but I get an error. I have tried to make sure the VCF is not corrupted and meets all the requirements. In fact, this indel.vcf was created using the same reference genome I use with the tool, so there shouldn't be any incompatibilities. I also compared my indel.vcf with example.indel.vcf, and the formats match. Still, I get an error in the terminal. Part of the output is:

    VCF FILE: /media/everestial007/Seagate-ExtHDD/DATA_analyses/ASE_analysis-using_g2gtools/passed_indelsMA622.sorted.vcf.gz
    FASTA FILE: /media/everestial007/Seagate-ExtHDD/DATA_analyses/ASE_analysis-using_g2gtools/lyrata_sorted.fa
    CHAIN FILE: /media/everestial007/Seagate-ExtHDD/DATA_analyses/ASE_analysis-using_g2gtools/sorted-ref-to-MA622.chain
    STRAIN: MA622
    PASS FILTER ON: False
    QUALITY FILTER ON: False
    DIPLOID: False
    STRAIN SAMPLE INDEX: 0
    Parsing VCF file...
    Processing Chromosome 1...
    Processing Chromosome scaffold_24...
    Processing Chromosome scaffold_86...
    Processing Chromosome scaffold_118...
    Processing Chromosome scaffold_149...
    Processing Chromosome scaffold_184...
    Processing Chromosome scaffold_214...
    Processing Chromosome scaffold_54...
    Unable to parse record, improper VCF file?
    Unable to parse record, improper VCF file?
    Unable to parse record, improper VCF file?
    Unable to parse record, improper VCF file?
    Unable to parse record, improper VCF file?
    Unable to parse record, improper VCF file?
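The "Unable to parse record" messages begin right after scaffold_54 starts processing, so it may help to locate the offending records directly. Below is a minimal sketch of a generic structural check, assuming a tab-delimited VCF 4.x with a FORMAT column and at least one sample column. It is not g2gtools' own parser and does not reproduce its exact requirements. The function name check_vcf_lines is hypothetical, and checking only the first sample's GT is an assumption based on the "STRAIN SAMPLE INDEX: 0" line in the log.

```python
import re
from typing import Iterable, List, Tuple

# Plain-base alleles (plus N and '*'). ALT '.', symbolic alleles such as <DEL>,
# and breakend notation are flagged on purpose: an indel-only pipeline may not
# expect them (assumption, not a documented g2gtools rule).
ALLELE_RE = re.compile(r"^[ACGTNacgtn*]+$")
# Genotype values such as 0/1, 1|1, ./. or haploid 1
GT_RE = re.compile(r"^(\.|\d+)([/|](\.|\d+))*$")


def check_vcf_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for malformed data records."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.split("\t")
        if len(cols) < 10:
            problems.append((lineno, f"expected >=10 tab-separated columns, got {len(cols)}"))
            continue
        pos, ref, alt = cols[1], cols[3], cols[4]
        if not pos.isdigit():
            problems.append((lineno, f"POS is not an integer: {pos!r}"))
        if not ALLELE_RE.match(ref):
            problems.append((lineno, f"unexpected REF allele: {ref!r}"))
        for allele in alt.split(","):
            if not ALLELE_RE.match(allele):
                problems.append((lineno, f"unexpected ALT allele: {allele!r}"))
        fmt_keys = cols[8].split(":")
        if "GT" not in fmt_keys:
            problems.append((lineno, "FORMAT has no GT field"))
            continue
        gt_idx = fmt_keys.index("GT")
        sample_vals = cols[9].split(":")  # first sample column only
        if gt_idx >= len(sample_vals) or not GT_RE.match(sample_vals[gt_idx]):
            problems.append((lineno, "unparseable GT in first sample"))
    return problems
```

Because bgzip output is gzip-compatible, the compressed VCF can be scanned directly:

```python
import gzip
with gzip.open("passed_indelsMA622.sorted.vcf.gz", "rt") as fh:
    for lineno, problem in check_vcf_lines(fh):
        print(lineno, problem)
```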

Can someone suggest what might be going wrong? I have taken every measure I can to make sure the VCF file is valid (and it was generated from the same reference genome used in the g2gtools pipeline).
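One way to test the claim that the VCF and the FASTA are compatible is to compare each REF allele with the reference sequence at the same 1-based position. The sketch below does this for a single contig. The function names read_fasta and ref_mismatches are hypothetical; it assumes VCF CHROM values match the first word of each FASTA header, and it loads the whole FASTA into memory, which should be fine for a genome of this size but is not streaming.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Load FASTA text into {name: sequence}; the name is the first word after '>'."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {k: "".join(v) for k, v in chunks.items()}


def ref_mismatches(vcf_lines: Iterable[str], genome: Dict[str, str], contig: str) -> List[str]:
    """Report records on `contig` whose REF allele disagrees with the reference sequence."""
    if contig not in genome:
        return [f"contig {contig!r} not found in FASTA"]
    seq = genome[contig]
    problems = []
    for raw in vcf_lines:
        if raw.startswith("#"):
            continue  # header lines
        cols = raw.rstrip("\n").split("\t")
        if len(cols) < 4 or cols[0] != contig:
            continue  # other contigs, or lines too short to contain REF
        if not cols[1].isdigit():
            problems.append(f"non-integer POS {cols[1]!r}")
            continue
        pos, ref = int(cols[1]), cols[3].upper()
        # VCF POS is 1-based; Python slices are 0-based and end-exclusive
        actual = seq[pos - 1 : pos - 1 + len(ref)]
        if actual != ref:
            problems.append(f"POS {pos}: REF {ref!r} but FASTA has {actual!r}")
    return problems
```

For example, checking the contig where the errors start:

```python
import gzip
with open("lyrata_sorted.fa") as fa:
    genome = read_fasta(fa)
with gzip.open("passed_indelsMA622.sorted.vcf.gz", "rt") as vcf:
    for problem in ref_mismatches(vcf, genome, "scaffold_54"):
        print(problem)
```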

Thanks in advance!

vcf parse g2gtools genome