Problem with allele number in vcf
3.0 years ago
BAGeno • 130

Hi,

I have a VCF of 1000 samples. My problem is that the allele number differs from site to site. Some samples have dots instead of 0 and 1 in the genotype column. Can anyone please tell me how I should correct these differing allele numbers?


./. is where the caller could not confidently call a genotype. I don't think there is much you can do computationally to address that, unless you had stringent filters set in the current call.
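
To see why the allele number differs between sites, it can help to count the called alleles per site directly from the genotypes. The following is a minimal sketch, not an official tool: `count_called_alleles` is a hypothetical helper name, and it assumes GT is the first FORMAT subfield (as the VCF specification requires when GT is present) and that the file is uncompressed.

```shell
# Hypothetical helper: count called (non-missing) alleles per site.
# Assumption: GT is the first colon-separated subfield of each sample column.
count_called_alleles() {
  awk -F'\t' '
    BEGIN { OFS = "\t"; print "CHROM", "POS", "CALLED_ALLELES" }
    /^#/ { next }                          # skip header lines
    {
      an = 0
      for (i = 10; i <= NF; i++) {         # sample columns start at column 10
        split($i, f, ":")                  # f[1] is the GT string, e.g. 0/1 or ./.
        n = split(f[1], al, "[/|]")        # split alleles on / (unphased) or | (phased)
        for (j = 1; j <= n; j++)
          if (al[j] != ".") an++           # a dot is a missing allele and is not counted
      }
      print $1, $2, an
    }' "$1"
}
```

Calling `count_called_alleles input.vcf` prints one row per site. A site where every diploid sample is called gives twice the number of samples; each ./. lowers the count by two, which is why the number varies from site to site.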


Hello BAGeno,

you can do this with bcftools:

$ bcftools +fixploidy input.vcf > fixed.vcf

There are more options available that might be useful. Have a look at:

$ bcftools +fixploidy -h

fin swimmer


EDIT:

I moved my post to a comment because I first thought that you had . as the genotype and wanted ./. instead. If so, the above solution should work (and I can move my post back to an answer). If you already have ./. in your VCF, then see the comment by Ram on what this means.


TIL! I did not know this. What does this do exactly?


I cannot tell you much more than what the help text says:

$ bcftools +fixploidy -h   

About: Fix ploidy
Usage: bcftools +fixploidy [General Options] -- [Plugin Options]
Options:
   run "bcftools plugin" for a list of common options

Plugin options:
   -d, --default-ploidy <int>  default ploidy for regions unlisted in -p [2]
   -f, --force-ploidy <int>    ignore -p, set the same ploidy for all genotypes
   -p, --ploidy <file>         space/tab-delimited list of CHROM,FROM,TO,SEX,PLOIDY
   -s, --sex <file>            list of samples, "NAME SEX"
   -t, --tags <list>           VCF tags to fix [GT]

Example:
   # Default ploidy, if -p not given. Unlisted regions have ploidy 2
   X 1 60000 M 1
   X 2699521 154931043 M 1
   Y 1 59373566 M 1
   Y 1 59373566 F 0
   MT 1 16569 M 1
   MT 1 16569 F 1

   # Example of -s file, sex of unlisted samples is "F"
   sampleName1 M

   bcftools +fixploidy in.vcf -- -s samples.txt

So one can use it, for example, so that male samples have haploid genotypes on the X chromosome (0 or 1) while females have diploid genotypes (0/0, 0/1, 1/1).


I have ./. in my VCF. Should I remove these calls from my VCF? I have to do several population analyses. I did not call the variants myself, so I cannot change anything at that step.


Should I remove these calls from my vcf.

This mainly depends on what exactly your goal is and on how many samples have no call at the sites where the variant was found.

Without knowing this there is no general answer.

fin swimmer
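
If one does decide to drop sites with too many no-calls, the idea can be sketched as below. This is only an illustration under stated assumptions: `filter_by_missingness` is a hypothetical helper name, the cutoff is an arbitrary value the user has to choose for their own analysis, GT is assumed to be the first FORMAT subfield, and the input is assumed to be an uncompressed VCF. bcftools also documents an `F_MISSING` filtering expression that may be a better fit for real data.

```shell
# Hypothetical helper: keep only sites whose fraction of samples without a
# genotype call is at most MAX (second argument, e.g. 0.1).
# Header lines are passed through unchanged.
filter_by_missingness() {
  awk -F'\t' -v max="$2" '
    /^#/ { print; next }                   # keep all header lines
    {
      n = NF - 9                           # number of sample columns
      miss = 0
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                  # f[1] is the GT string
        gt = f[1]
        gsub("[/|]", "", gt)               # drop allele separators: ./. becomes ..
        if (gt ~ /^[.]+$/) miss++          # only dots left means no allele was called
      }
      if (n > 0 && miss / n <= max) print  # keep the site if missingness is low enough
    }' "$1"
}
```

For example, `filter_by_missingness input.vcf 0.1 > filtered.vcf` would keep sites where at most 10% of the samples are uncalled. Note that INFO fields such as AN and AC are not changed by this sketch.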


I want to do a population analysis: whether certain disease variants are present in the population or not. Also, can you please tell me how I should check this?

how many samples have no calls in regions where that variant was found


how many samples have no calls in regions where that variant was found

One way is to use gatk VariantsToTable.

Or with awk (inspired by Kevin) :

awk -F"\t" 'BEGIN {print "CHR\tPOS\tID\tREF\tALT\tNoCall"} !/^#/ {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t" gsub(/\.\/\./,"")}' input.vcf

fin swimmer
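
The awk one-liner above counts no-calls per site and only matches the literal string ./. in the line. As a complement, here is a hedged sketch that counts no-calls per sample instead, and also treats phased (.|.) and haploid (.) missing genotypes as no-calls. `count_nocalls_per_sample` is a hypothetical helper name; the sketch assumes an uncompressed VCF with a #CHROM header line and GT as the first FORMAT subfield.

```shell
# Hypothetical helper: count missing genotypes per sample across all sites.
count_nocalls_per_sample() {
  awk -F'\t' '
    /^##/ { next }                         # skip meta-information lines
    /^#CHROM/ {                            # remember sample names from the header
      for (i = 10; i <= NF; i++) name[i] = $i
      ncol = NF
      next
    }
    {
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                  # f[1] is the GT string
        gt = f[1]
        gsub("[/|]", "", gt)               # drop allele separators
        if (gt ~ /^[.]+$/) miss[i]++       # only dots left means no call
      }
    }
    END {
      print "SAMPLE\tNoCall"
      for (i = 10; i <= ncol; i++) print name[i] "\t" (miss[i] + 0)
    }' "$1"
}
```

Running `count_nocalls_per_sample input.vcf` prints one row per sample in header order. Samples with an unusually high count may be worth inspecting before any population-level analysis.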
