bcftools merge, error "Could not parse the region(s)"
6.5 years ago
agathejouet ▴ 10

Hi all,

I am trying to merge multiple vcf files using bcftools version 1.6. Unfortunately, I receive the following error for all of my "chromosomes":

 [E::_regions_init_string] Could not parse the region(s): chr1

For example:

[E::_regions_init_string] Could not parse the region(s): JA218_chr6:29940123..29944556_UTR-0_lenght=4434
[E::_regions_init_string] Could not parse the region(s): JA455_chr11:20047638..20052236_UTR-0_lenght=4599
[E::_regions_init_string] Could not parse the region(s): JA327_chr9:10804606..10807219_UTR-0_lenght=2614
[E::_regions_init_string] Could not parse the region(s): JA205_chr6:24945965..24949792_UTR-0_lenght=3828

I am using the following command:

bcftools merge file1.vcf.gz file2.vcf.gz file3.vcf.gz -o outfile -O v -0

My vcf files have been compressed with bgzip and indexed with tabix (also version 1.6) using:

bgzip file1.vcf; tabix -p vcf file1.vcf.gz

I am not sure what is happening here. Any help would be appreciated, and please let me know if you need any additional information.

Thanks very much,

Agathe

bcftools vcf merge tabix • 5.8k views

What is the output of the following command, please?

tabix --list-chroms file1.vcf.gz | head -n 50

bgzip file1.vcf; tabix -p vcf file1.vcf.gz

It is safer to always force bgzip/tabix to overwrite existing output and to chain the two with a logical AND, so that tabix only runs if bgzip succeeds:

bgzip -f file1.vcf && tabix -f -p vcf file1.vcf.gz
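
To illustrate the difference between ; and && here, a minimal sketch. The stand-ins are my assumption, not the real tools: "false" plays a failing bgzip and "echo indexed" plays tabix.

```shell
# Stand-ins (assumed for illustration): "false" = a failing bgzip,
# "echo indexed" = the tabix step.

# With ';' the second command runs even though the first one failed.
semi_out=$(sh -c 'false; echo indexed')

# With '&&' the second command is skipped after a failure.
and_out=$(sh -c 'false && echo indexed' || true)

echo "with ';'  : '${semi_out}'"
echo "with '&&' : '${and_out}'"
```

With ; you can end up indexing a stale or truncated .vcf.gz without noticing, which is why the && form is preferable in scripts.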

The output is:

LOC_Os01g16370_chr1:9292171..9298764_UTR-0
LOC_Os01g16400_chr1:9313135..9317036_UTR-0
LOC_Os01g21240_chr1:11854526..11857948_UTR-0
LOC_Os01g25630_chr1:14525576..14530578_UTR-0
LOC_Os01g25710_chr1:14570811..14580124_UTR-0
LOC_Os01g25810_chr1:14611521..14616096_UTR-0
LOC_Os01g33684_chr1:18530856..18539774_UTR-0
LOC_Os01g35254_chr1:19515711..19521445_UTR-0
LOC_Os01g36640_chr1:20326945..20333788_UTR-0
LOC_Os01g41890_chr1:23745796..23750669_UTR-0
LOC_Os01g42330_chr1:24021015..24026491_UTR-0
LOC_Os01g52270_chr1:30047246..30048941_UTR-0
LOC_Os01g52280_chr1:30054522..30055994_UTR-0
LOC_Os01g52304_chr1:30061671..30065301_UTR-0
LOC_Os01g52320_chr1:30067560..30069086_UTR-0
LOC_Os01g57280_chr1:33097028..33103506_UTR-0
LOC_Os01g57870_chr1:33459210..33462869_UTR-0
LOC_Os01g58520_chr1:33815574..33820089_UTR-0
LOC_Os01g59340_chr1:34294608..34306006_UTR-0
LOC_Os02g04530_chr2:2011707..2018122_UTR-0
LOC_Os02g06030_chr2:3005600..3007976_UTR-0
LOC_Os02g06180_chr2:3084913..3087072_UTR-0
LOC_Os02g10900_chr2:5785295..5788769_UTR-0
LOC_Os02g16060_chr2:9136746..9140131_UTR-0
LOC_Os02g16250_chr2:9238437..9239234_UTR-0
LOC_Os02g16270_chr2:9257996..9262612_UTR-0
LOC_Os02g16330_chr2:9284786..9290815_UTR-0
LOC_Os02g17304_chr2:9922463..9928347_UTR-0
LOC_Os02g18000_chr2:10444558..10450974_UTR-0
LOC_Os02g18070_chr2:10490444..10496411_UTR-0
LOC_Os02g18140_chr2:10535633..10539697_UTR-0
LOC_Os02g18510_chr2:10776282..10780756_UTR-0
LOC_Os02g19750_chr2:11550971..11554437_UTR-0
LOC_Os02g19890_chr2:11701933..11708391_UTR-0
LOC_Os02g20420_chr2:12039379..12052089_UTR-0
LOC_Os02g26500_chr2:15556364..15560053_UTR-0
LOC_Os02g27500_chr2:16264148..16266679_UTR-0
LOC_Os02g27540_chr2:16306055..16309173_UTR-0
LOC_Os02g27680_chr2:16398585..16399403_UTR-0
LOC_Os02g41760_chr2:25097465..25099747_UTR-0
LOC_Os03g26260_chr3:15014453..15021952_UTR-0
LOC_Os03g37720_chr3:20912120..20915920_UTR-0
LOC_Os03g38250_chr3:21232110..21236446_UTR-0
LOC_Os03g48370_chr3:27529787..27536134_UTR-0
LOC_Os03g63150_chr3:35686155..35691879_UTR-0
LOC_Os04g02030_chr4:638803..642201_UTR-0

I have 759 "chromosomes" like these, all in a similar format. I will also keep in mind to force overwrite and to use && (which I actually do in my rake file).


But is there really a contig named exactly 'chr1', as in the first error message you posted?

6.5 years ago

From the VCF specification (https://samtools.github.io/hts-specs/VCFv4.3.pdf):

CHROM - chromosome: (...) The colon symbol (:) must be absent from all chromosome names to avoid parsing errors when dealing with breakends.

Your contig names contain colons (e.g. LOC_Os01g16370_chr1:9292171..9298764_UTR-0), so your VCF is not valid.
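
A quick way to check a file for this problem could look like the sketch below. The function name list_colon_contigs is made up for illustration; it just filters a list of contig names (one per line on stdin) for those containing a colon.

```shell
# Hypothetical helper: print every contig name read from stdin
# that contains a colon. Prints nothing if all names are fine.
list_colon_contigs() {
    # grep exits non-zero when nothing matches; "|| true" keeps that
    # from being treated as an error in scripts run with "set -e".
    grep ':' || true
}

# Intended use (requires tabix and an indexed file, so left as a comment):
#   tabix --list-chroms file1.vcf.gz | list_colon_contigs | head
```

Any name it prints is presumably being misread by the region parser, which would explain the "Could not parse the region(s)" message.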


Thanks very much for your help, will try to modify this!

Agathe


I would go for something like:

awk 'BEGIN {FS=OFS="\t"} /^#/ {print; next} {gsub(/[.:-]/,"_",$1); print}' input.vcf
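
One caveat with the one-liner above: header lines are printed untouched, so any ##contig=<ID=...> entries would keep the old names while the records get the new ones. A possible variant that also rewrites the contig IDs in the header is sketched below. The function name sanitize_vcf_contigs is hypothetical, and I am assuming the header uses the usual ##contig=<ID=NAME,...> layout with ID as the first key.

```shell
# Hypothetical helper: read an uncompressed VCF on stdin and replace
# ':', '.' and '-' with '_' in contig names, both in the ##contig
# header lines and in the CHROM column of the records.
sanitize_vcf_contigs() {
    awk 'BEGIN { FS = OFS = "\t" }
    /^##contig=<ID=/ {
        # Text after "##contig=<ID=" (13 characters long).
        rest = substr($0, 14)
        # The name ends at the first "," or ">".
        match(rest, /[,>]/)
        name = substr(rest, 1, RSTART - 1)
        tail = substr(rest, RSTART)
        gsub(/[.:-]/, "_", name)
        print "##contig=<ID=" name tail
        next
    }
    /^#/ { print; next }                    # other header lines: unchanged
    { gsub(/[.:-]/, "_", $1); print }       # records: clean CHROM only
    '
}

# Intended use (requires bgzip/tabix, so left as a comment):
#   sanitize_vcf_contigs < input.vcf | bgzip -c > fixed.vcf.gz \
#       && tabix -f -p vcf fixed.vcf.gz
```

Either way, the renamed file has to be compressed and indexed again before running bcftools merge, and the same renaming has to be applied to every input file so the contig names still match across them.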