Biostar Beta. Not for public use.
Problem with bcftools roh
0
Entering edit mode
14 months ago
United States

Dear All,

I have a problem with bcftools roh and hope you can guide me to what may be causing this:

I run Example-1 of the Github: http://samtools.github.io/bcftools/howtos/roh-calling.html and see the output consists of a state (0:HW, 1:AZ) , and quality for each position that was in the original vcf file (test.vcf.gz).

I ham a bam file and like to get the exact same-format output for my list of positions. Here is how I generate the original .vcf (just in case that the way I am generating this is causing the issue that I explain later).

samtools mpileup -uf $FASTA ./myfile.bam -l ./dbSNP_chr9.bed | bcftools call -m -O z > roh.test.vcf.gz

This will provide a .vcf for 309 positions that I queried, and the header/rest of the vcf looks fine:

    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ...
    ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
    ##bcftools_callVersion=1.3.1+htslib-1.3.1
    ##bcftools_callCommand=call -m -O z
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  mysample
    9       421981  .       G       .       0       .       DP=470;SGB=-0.379885;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AN=2;DP4=208,196,0,1;MQ=60 GT      0/0
    9       422026  .       T       .       0       .       DP=405;MQSB=1;MQ0F=0;AN=2;DP4=148,216,0,0;MQ=60 GT      0/0
    9       3206113 .       A       .       0       .       DP=338;MQSB=1;MQ0F=0;AN=2;DP4=219,79,0,0;MQ=60  GT      0/0
...

But when I try the command in the example, I only receive two lines (positions) in the output file as also noted in the stdout!

bcftools roh --AF-dflt 0.4 ./roh.test.vcf.gz 
# This file was produced by: bcftools roh(1.3.1+htslib-1.3.1)
# The command line was: bcftools roh --AF-dflt 0.4 ./roh.test.vcf.gz
#
# [1]Chromosome [2]Position [3]State (0:HW, 1:AZ)   [4]Quality
9   5009541 1   3.0
9   5125250 1   3.0
Number of lines: total/processed: 309/2

My .vcf has PL but I also tried skipping the PL scores with -G30 (as is in the example), just in case it had something to do with this but the answer was exactly the same.

I greatly appreciate if you could point me to what may be causing this.

Thank You,

nfarnoud

ADD COMMENTlink
0
Entering edit mode
13 months ago
Republic of Ireland

Is there an error? The program appeared to run successfully.

ADD COMMENTlink
0
Entering edit mode

Yes, it's puzzling. No error. But as noted in the stdout (last line of the block above) it says 2 (out of 309) lines are processed with no further explanation:

Number of lines: total/processed: 309/2

ADD REPLYlink
0
Entering edit mode

The genotypes that you have posted are each 0/0, i.e., homozygous reference. How are the remaining genotypes in your input file? Why are there only 309 variants? Are they scattered genome-wide? Note that, for ROH to do its work, the variants have to be a certain distance apart. For example, one cannot detect ROH with accuracy by processing 3 variants that span a region of 1 megabase.

ADD REPLYlink
0
Entering edit mode

The distance between variants is something I did not take into consideration! The reason that there are 309 variants is that I wanted to query LOH only on a specific chromosome (chr 9) and also I am working with a targeted assay. So what I did was finding all dbSNP snps that fall within my assay's .bed file and queried them using the 1st command (samtools + bcftools), and then used the resultant .vcf file as the input to roh command. About genotypes: most of the SNPs look homozygous in chr9 if we look only at GT field, however the percentage contribution of REF/ALT alleles for homozygous-called SNPs is still quite different (DP4) and that is why I thought the roh command may model those per-allele-frequencies and generate a different call. I may have not used the right approach to answer my question.

ADD REPLYlink
1
Entering edit mode

Yes, basically, you require a high density of genotypes in order to accurately make a ROH call. I have used exome-seq data in the past for ROH analysis. You may have to play around with the parameters.

Another option is to use PLINK's ROH implementation.

ADD REPLYlink
1
Entering edit mode

@kevin Thanks a lot. I came across that in searches, will give it try. Thanks again for your help and input!

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1