I cannot run bcftools stats (help)
5.1 years ago
zion22 ▴ 70

Hi, I would like to compute statistics from vcf.gz files using bcftools stats, but when I run the following command, it produces empty (zero-size) output files. My command is:

> bcftools stats -F "My_reference_genome.fasta" -s "My_vcf.gz_file.vcf.gz" > "/T1_.vcf.stats"

As soon as I run the command, it prints the following to the terminal:

About:   Parses VCF or BCF and produces stats which can be plotted using plot-vcfstats.
     When two files are given, the program generates separate stats for intersection
     and the complements. By default only sites are compared, -s/-S must given to include
     also sample columns.
Usage:   bcftools stats [options] <A.vcf.gz> [<B.vcf.gz>]

Options:
        --af-bins <list>               allele frequency bins, a list (0.1,0.5,1) or a file (0.1\n0.5\n1)
        --af-tag <string>              allele frequency tag to use, by default estimated from AN,AC or GT
    -1, --1st-allele-only              include only 1st allele at multiallelic sites
    -c, --collapse <string>            treat as identical records with <snps|indels|both|all|some|none>, see man page for details [none]
    -d, --depth <int,int,int>          depth distribution: min,max,bin size [0,500,1]
    -e, --exclude <expr>               exclude sites for which the expression is true (see man page for details)
    -E, --exons <file.gz>              tab-delimited file with exons for indel frameshifts (chr,from,to; 1-based, inclusive, bgzip compressed)
    -f, --apply-filters <list>         require at least one of the listed FILTER strings (e.g. "PASS,.")
    -F, --fasta-ref <file>             faidx indexed reference sequence file to determine INDEL context
    -i, --include <expr>               select sites for which the expression is true (see man page for details)
    -I, --split-by-ID                  collect stats for sites with ID separately (known vs novel)
    -r, --regions <region>             restrict to comma-separated list of regions
    -R, --regions-file <file>          restrict to regions listed in a file
    -s, --samples <list>               list of samples for sample stats, "-" to include all samples
    -S, --samples-file <file>          file of samples to include
    -t, --targets <region>             similar to -r but streams rather than index-jumps
    -T, --targets-file <file>          similar to -R but streams rather than index-jumps
    -u, --user-tstv <TAG[:min:max:n]>  collect Ts/Tv stats for any tag using the given binning [0:1:100]
        --threads <int>                number of extra decompression threads [0]
    -v, --verbose                      produce verbose per-site and per-sample output

If anyone could help me, I'd be very grateful. Thanks.


Any reason why you deleted your question, zion22? - I have undeleted it. prasundutta87 went to the trouble of providing an answer and you should respect that.

5.1 years ago
prasundutta87 ▴ 660

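'-s' stands for the list of samples to use for the per-sample stats.

The option expects sample names, not the VCF file as you have written. In your command, '-s' takes the VCF file name as its argument, so bcftools is left without an input file and prints the usage message instead of writing any stats.

The correct command should be: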

bcftools stats -F "My_reference_genome.fasta" -s - "My_vcf.gz_file.vcf.gz" > "/T1_.vcf.stats"

This will, of course, give you stats for all the samples in the file, because '-s -' selects every sample.
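
As a minimal sketch of the same fix in script form: the file names are the placeholders from the question, the output path is written as a relative path (an assumption, since "/T1_.vcf.stats" points at the filesystem root), and sample_arg is a hypothetical helper, not part of bcftools. The script only calls bcftools when the tool and both input files are present.

```shell
#!/bin/sh
# Placeholder file names from the question; replace them with real paths.
ref="My_reference_genome.fasta"
vcf="My_vcf.gz_file.vcf.gz"
out="T1_.vcf.stats"   # assumption: relative path rather than "/T1_.vcf.stats"

# Hypothetical helper: build the value for -s.
# No arguments -> "-" (all samples); otherwise the names joined with commas.
sample_arg() {
    if [ "$#" -eq 0 ]; then
        printf '%s\n' "-"
    else
        (IFS=,; printf '%s\n' "$*")
    fi
}

# Run only when bcftools and both input files are actually present.
if command -v bcftools >/dev/null 2>&1 && [ -f "$ref" ] && [ -f "$vcf" ]; then
    # List the sample names stored in the VCF header.
    bcftools query -l "$vcf"
    # Options first; the VCF file comes last as the positional argument.
    bcftools stats -F "$ref" -s "$(sample_arg)" "$vcf" > "$out"
    # [ -s FILE ] is true only if the file exists and is not empty.
    if [ -s "$out" ]; then
        echo "stats written to $out"
    fi
else
    echo "bcftools or the input files were not found; nothing was run" >&2
fi
```

To restrict the stats to a subset, pass names to the helper, for example sample_arg sampleA sampleB, which yields sampleA,sampleB. These two names are made up; use the ones printed by bcftools query -l.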
