Problem with allele number in vcf
3.0 years ago
BAGeno • 130

Hi,

I have a VCF of 1000 samples. The problem I am facing is that the allele number differs from site to site in the VCF. Some samples have dots instead of 0 and 1 in the genotype column. Can anyone please tell me how I should correct the problem of the differing allele number?


./. is where the caller could not confidently call a genotype. I don't think there is much you can do computationally to address that, unless you had stringent filters set in the current call.
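
To illustrate why no-calls make the allele number differ between sites, here is a minimal Python sketch. It assumes that "allele number" means the count of called (non-missing) alleles at a site, as in the AN INFO tag; the function name `allele_number` and the example genotypes are made up for illustration and are not taken from the poster's data.

```python
import re

def allele_number(gt_fields):
    """Count called (non-missing) alleles across the sample fields of one site."""
    total = 0
    for field in gt_fields:
        # GT is the first colon-separated subfield when it is present.
        gt = field.split(":", 1)[0]
        # Alleles are separated by "/" (unphased) or "|" (phased).
        alleles = re.split(r"[/|]", gt)
        total += sum(1 for allele in alleles if allele != ".")
    return total

# Hypothetical genotypes for four diploid samples at two sites.
site_all_called = ["0/0", "0/1", "1/1", "0/0"]
site_one_missing = ["0/0", "./.", "1/1", "0/1"]

print(allele_number(site_all_called))   # 8
print(allele_number(site_one_missing))  # 6
```

Under that assumption, every ./. sample lowers the count by two at that site, so a varying allele number would simply reflect a varying number of no-calls.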


Hello BAGeno,

you can do this with bcftools:

    $ bcftools +fixploidy input.vcf > fixed.vcf

There are more options available that might be useful. Have a look at:

    $ bcftools +fixploidy -h


fin swimmer

EDIT:

I moved my post to a comment because I first thought that you had . as the genotype and wanted ./. instead. If so, the above solution should work (and I can move my post back to an answer). If you already have ./. in your VCF, then see the comment by Ram on what this means.


TIL! I did not know this. What does this do exactly?


I cannot tell much more than what the help text says:

    $ bcftools +fixploidy -h

    About: Fix ploidy
    Usage: bcftools +fixploidy [General Options] -- [Plugin Options]
    Options:
       run "bcftools plugin" for a list of common options

    Plugin options:
       -d, --default-ploidy <int>  default ploidy for regions unlisted in -p [2]
       -f, --force-ploidy <int>    ignore -p, set the same ploidy for all genotypes
       -p, --ploidy <file>         space/tab-delimited list of CHROM,FROM,TO,SEX,PLOIDY
       -s, --sex <file>            list of samples, "NAME SEX"
       -t, --tags <list>           VCF tags to fix [GT]

    Example:
       # Default ploidy, if -p not given. Unlisted regions have ploidy 2
       X 1 60000 M 1
       X 2699521 154931043 M 1
       Y 1 59373566 M 1
       Y 1 59373566 F 0
       MT 1 16569 M 1
       MT 1 16569 F 1

       # Example of -s file, sex of unlisted samples is "F"
       sampleName1 M

       bcftools +fixploidy in.vcf -- -s samples.txt

So one can use it, for example, so that male samples have haploid genotypes on the X chromosome, like 0 and 1, while females have 0/0, 0/1, 1/1.

I have ./. in my VCF. Should I remove these calls from my VCF? I have to do different population analyses. I did not call the variants myself, so I cannot change anything at that step.

"Should I remove these calls from my vcf."

This mainly depends on what exactly your goal is and how many samples have no calls in regions where that variant was found. Without knowing this, there is no general answer.

fin swimmer

I want to do population analysis: whether certain disease variants are present in the population or not. Also, can you please tell me how I should check this?

"how many samples have no calls in regions where that variant was found"

"how many samples have no calls in regions where that variant was found"

One way is to use gatk VariantsToTable. Or with awk (inspired by Kevin):

    awk -F"\t" 'BEGIN {print "CHR\tPOS\tID\tREF\tALT\tNoCall"} !/^#/ {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t" gsub(/\.\/\./,"")}' input.vcf

The last column is the number of ./. genotypes found on each line, because gsub returns the number of substitutions it made.


fin swimmer
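
For anyone who prefers Python, the same per-site no-call count could be sketched as below. This is only an illustration under some assumptions: the input is an uncompressed, tab-delimited VCF with samples starting in column 10, and a genotype counts as a no-call when all of its alleles are dots (so ., ./. and .|. are all counted, which is slightly broader than the awk pattern above). The function name `no_call_summary` and the example records are hypothetical.

```python
import re

def no_call_summary(vcf_lines):
    """Return (CHROM, POS, n_no_call, fraction_no_call) for each variant record."""
    rows = []
    for line in vcf_lines:
        # Skip header lines and empty lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        samples = cols[9:]  # sample columns follow the FORMAT column
        missing = 0
        for sample in samples:
            gt = sample.split(":", 1)[0]
            alleles = re.split(r"[/|]", gt)
            if all(allele == "." for allele in alleles):
                missing += 1
        fraction = missing / len(samples) if samples else 0.0
        rows.append((cols[0], int(cols[1]), missing, fraction))
    return rows

# Hypothetical records with four samples each.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4",
    "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0/0\t./.\t0/1\t./.",
    "1\t200\trs2\tC\tT\t.\tPASS\t.\tGT:DP\t0/1:12\t1/1:9\t.:.\t0/0:20",
]

for chrom, pos, n_missing, fraction in no_call_summary(example):
    print(chrom, pos, n_missing, fraction)
```

On a real file one could pass an open file handle instead of the list, for example `no_call_summary(open("input.vcf"))`. The fraction column may help when deciding whether a site has too many no-calls to be useful for a given analysis; which cutoff is appropriate depends on the goal, as discussed above.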