Error using Tabix
1
0
Entering edit mode
5.2 years ago
AS • 0

Hi, I am trying to use vcf files downloaded from public database (NCBI SRA Bioproject:PRJNA486011). I am using only R to do that because I don’t know any other language. I have put the gz and the gz.tbi in the same folder.

>vcf.fl <- list.files("path/to/vcfs", full.names=TRUE, pattern=".vcf.gz$")
> tabx <- TabixFile("chr4_eva.vcf.gz", index=paste("chr4_eva.vcf.gz", "tbi", sep="."),yieldSize=10000)
> tabx
class: TabixFile 
path: chr4_eva.vcf.gz
index: chr4_eva.vcf.gz.tbi
isOpen: FALSE 
yieldSize: 10000 

> names(tabx)

 [1] ".self"        ".->path"      ".extptr"      ".->yieldSize" ".refClassDef" ".->index"     ".->.extptr"   "index"       
 [9] "path"         "yieldSize"   

> vcf.rng <- readVcf(tabx)

Warning message:
In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
vcf tabix R • 2.3k views
ADD COMMENT
1
Entering edit mode

Probably one file has notations ike 1,2,3...X,Y and your has chr1,chr2... Check that.

ADD REPLY
0
Entering edit mode

I succeed to check the data of the .vcf.gz but I haven't found a way to look inside the tbi. Could you please indicate me a way to do it ?

ADD REPLY
0
Entering edit mode

You have to compare your VCF against the VCF you downloaded, not the index. Please show a subset of your VCF and of the downloaded VCFs.

ADD REPLY
0
Entering edit mode

Why don't you index them quickly by yourself - this is very simple:

tabix -p vcf <file.name>
ADD REPLY
0
Entering edit mode

Thanks for the answers.

I have downloaded the vcf from : ftp://ftp.sra.ebi.ac.uk/vol1/ERZ696/ERZ696780/chr8_eva.vcf.gz. The size of the download file is ok compare to the size indicated on the website.

I have tried to look at the file and I found strange that the ID seem full of NA and no samples are detected.

VCFA <- read.vcfR("chr8_eva.vcf.gz")
VCFA
***** Object of Class vcfR *****
0 samples
1 CHROMs
1,294,347 variants
Object size: 294.3 Mb
NaN percent missing data
*****        *****         *****

head(VCFA)

[1] "***** Object of class 'vcfR' *****"
[1] "***** Meta section *****"
[1] "##fileformat=VCFv4.2"
[1] "##GATKCommandLine.SelectVariants=<ID=SelectVariants,CommandLineOptio [Truncated]"
[1] "##source=SelectVariants"
[1] "##FILTER=<ID=LowQual,Description=\"Low quality\">"
[1] "##FILTER=<ID=my_snp_filter,Description=\"QD < 2.0 || FS > 60.0 || MQ < 30.0\">"
[1] "##FORMAT=<ID=AD,Number=R,Type=Integer,Description=\"Allelic depths fo [Truncated]"
[1] "First 6 rows."
[1] 
[1] "***** Fixed section *****"
     CHROM POS  ID REF ALT QUAL    FILTER         
[1,] "8"   "6"  NA "A" "G" "46.53" NA             
[2,] "8"   "13" NA "A" "G" "92.17" NA             
[3,] "8"   "14" NA "A" "C" "29.71" "my_snp_filter"
[4,] "8"   "18" NA "T" "C" "71.66" NA             
[5,] "8"   "20" NA "A" "G" "55.56" NA             
[6,] "8"   "27" NA "A" "G" "55.15" NA             
[1] 
[1] "***** Genotype section *****"
<0 x 0 matrix>
[1] 
[1] "Unique GT formats:"
[1] "No gt slot present"
[1]
ADD REPLY
0
Entering edit mode
5.2 years ago

The vcf file you got from ftp://ftp.sra.ebi.ac.uk/vol1/ERZ696/ERZ696780/chr8_eva.vcf.gz seems defective since its header does not contain the contig "8". As a quick and dirty fix you can add the missing contig with something like:

gunzip -c chr8_eva.vcf.gz | grep '^##' > chr8_eva.fix.vcf
echo "##contig=<ID=8>" >> chr8_eva.fix.vcf 
gunzip -c chr8_eva.vcf.gz | grep -v '^##' >> chr8_eva.fix.vcf
bgzip chr8_eva.fix.vcf

Now it should work:

VCFA <- readVcf("~/Downloads/chr8_eva.fix.vcf.gz") ## Seems ok

# Original vcf throws warning:
VCFA <- readVcf("~/Downloads/chr8_eva.vcf.gz")
Warning message:
In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
ADD COMMENT

Login before adding your answer.

Traffic: 1889 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6