Biostar Beta. Not for public use.
Error using Tabix
0
Entering edit mode
16 months ago
AS • 0

Hi, I am trying to use vcf files downloaded from public database (NCBI SRA Bioproject:PRJNA486011). I am using only R to do that because I don’t know any other language. I have put the gz and the gz.tbi in the same folder.

>vcf.fl <- list.files("path/to/vcfs", full.names=TRUE, pattern=".vcf.gz$")
> tabx <- TabixFile("chr4_eva.vcf.gz", index=paste("chr4_eva.vcf.gz", "tbi", sep="."),yieldSize=10000)
> tabx
class: TabixFile 
path: chr4_eva.vcf.gz
index: chr4_eva.vcf.gz.tbi
isOpen: FALSE 
yieldSize: 10000 

> names(tabx)

 [1] ".self"        ".->path"      ".extptr"      ".->yieldSize" ".refClassDef" ".->index"     ".->.extptr"   "index"       
 [9] "path"         "yieldSize"   

> vcf.rng <- readVcf(tabx)

Warning message:
In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
vcf tabix R • 319 views
ADD COMMENTlink
1
Entering edit mode

Probably one file has notations ike 1,2,3...X,Y and your has chr1,chr2... Check that.

ADD REPLYlink
0
Entering edit mode

I succeed to check the data of the .vcf.gz but I haven't found a way to look inside the tbi. Could you please indicate me a way to do it ?

ADD REPLYlink
0
Entering edit mode

You have to compare your VCF against the VCF you downloaded, not the index. Please show a subset of your VCF and of the downloaded VCFs.

ADD REPLYlink
0
Entering edit mode

Why don't you index them quickly by yourself - this is very simple:

tabix -p vcf <file.name>
ADD REPLYlink
0
Entering edit mode

Thanks for the answers.

I have downloaded the vcf from : ftp://ftp.sra.ebi.ac.uk/vol1/ERZ696/ERZ696780/chr8_eva.vcf.gz. The size of the download file is ok compare to the size indicated on the website.

I have tried to look at the file and I found strange that the ID seem full of NA and no samples are detected.

VCFA <- read.vcfR("chr8_eva.vcf.gz")
VCFA
***** Object of Class vcfR *****
0 samples
1 CHROMs
1,294,347 variants
Object size: 294.3 Mb
NaN percent missing data
*****        *****         *****

head(VCFA)

[1] "***** Object of class 'vcfR' *****"
[1] "***** Meta section *****"
[1] "##fileformat=VCFv4.2"
[1] "##GATKCommandLine.SelectVariants=<ID=SelectVariants,CommandLineOptio [Truncated]"
[1] "##source=SelectVariants"
[1] "##FILTER=<ID=LowQual,Description=\"Low quality\">"
[1] "##FILTER=<ID=my_snp_filter,Description=\"QD < 2.0 || FS > 60.0 || MQ < 30.0\">"
[1] "##FORMAT=<ID=AD,Number=R,Type=Integer,Description=\"Allelic depths fo [Truncated]"
[1] "First 6 rows."
[1] 
[1] "***** Fixed section *****"
     CHROM POS  ID REF ALT QUAL    FILTER         
[1,] "8"   "6"  NA "A" "G" "46.53" NA             
[2,] "8"   "13" NA "A" "G" "92.17" NA             
[3,] "8"   "14" NA "A" "C" "29.71" "my_snp_filter"
[4,] "8"   "18" NA "T" "C" "71.66" NA             
[5,] "8"   "20" NA "A" "G" "55.56" NA             
[6,] "8"   "27" NA "A" "G" "55.15" NA             
[1] 
[1] "***** Genotype section *****"
<0 x 0 matrix>
[1] 
[1] "Unique GT formats:"
[1] "No gt slot present"
[1]
ADD REPLYlink
0
Entering edit mode
13 months ago
WCIP | Glasgow | UK

The vcf file you got from ftp://ftp.sra.ebi.ac.uk/vol1/ERZ696/ERZ696780/chr8_eva.vcf.gz seems defective since its header does not contain the contig "8". As a quick and dirty fix you can add the missing contig with something like:

gunzip -c chr8_eva.vcf.gz | grep '^##' > chr8_eva.fix.vcf
echo "##contig=<ID=8>" >> chr8_eva.fix.vcf 
gunzip -c chr8_eva.vcf.gz | grep -v '^##' >> chr8_eva.fix.vcf
bgzip chr8_eva.fix.vcf

Now it should work:

VCFA <- readVcf("~/Downloads/chr8_eva.fix.vcf.gz") ## Seems ok

# Original vcf throws warning:
VCFA <- readVcf("~/Downloads/chr8_eva.vcf.gz")
Warning message:
In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1