Problem with allele number in vcf
5.7 years ago
BAGeno ▴ 190

Hi,

I have a VCF of 1000 samples, but the allele number differs from site to site. Some samples have dots instead of 0 and 1 in the genotype column. Can anyone please tell me how I should correct the problem of differing allele numbers?

vcf allele number genotype • 2.4k views

Hello BAGeno,

you can do this with bcftools:

$ bcftools +fixploidy input.vcf > fixed.vcf

There are more options available that might be useful. Have a look at:

$ bcftools +fixploidy -h

fin swimmer


EDIT:

I moved my post to a comment, because I first thought that you have . as the genotype and wanted ./. instead. If so, the above solution should work (and I can move my post back to an answer). If you already have ./. in your VCF, then see the comment by Ram on what this means.


TIL! I did not know this. What does this do exactly?


I cannot tell you much more than what the help text says:

$ bcftools +fixploidy -h   

About: Fix ploidy
Usage: bcftools +fixploidy [General Options] -- [Plugin Options]
Options:
   run "bcftools plugin" for a list of common options

Plugin options:
   -d, --default-ploidy <int>  default ploidy for regions unlisted in -p [2]
   -f, --force-ploidy <int>    ignore -p, set the same ploidy for all genotypes
   -p, --ploidy <file>         space/tab-delimited list of CHROM,FROM,TO,SEX,PLOIDY
   -s, --sex <file>            list of samples, "NAME SEX"
   -t, --tags <list>           VCF tags to fix [GT]

Example:
   # Default ploidy, if -p not given. Unlisted regions have ploidy 2
   X 1 60000 M 1
   X 2699521 154931043 M 1
   Y 1 59373566 M 1
   Y 1 59373566 F 0
   MT 1 16569 M 1
   MT 1 16569 F 1

   # Example of -s file, sex of unlisted samples is "F"
   sampleName1 M

   bcftools +fixploidy in.vcf -- -s samples.txt

So one can use it, for example, so that male samples get haploid genotypes on the X chromosome (0 and 1) while female samples get diploid ones (0/0, 0/1, 1/1).


I have ./. in my VCF. Should I remove these calls from my VCF? I have to do several different population analyses. I did not call the variants myself, so I cannot change anything at that step.


Should I remove these calls from my vcf.

This mainly depends on what exactly your goal is and how many samples have no calls in regions where that variant was found.

Without knowing this there is no general answer.

fin swimmer
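If you do decide to drop sites with too many no-calls, one possible approach is a small awk filter. This is only a sketch: the toy VCF, the function name filter_by_missing, and the 0.5 threshold are made up for illustration, and it assumes GT is the first FORMAT subfield (which the VCF specification requires whenever GT is present).

```shell
# Toy VCF, invented for illustration: three samples, tab-separated.
printf '##fileformat=VCFv4.2\n' > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n' >> toy.vcf
printf '1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:12\t./.:0\t1/1:9\n' >> toy.vcf
printf '1\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t./.:0\t.|.:0\t0/0:7\n' >> toy.vcf

# Hypothetical helper: keep header lines and every site whose fraction of
# samples with a fully missing genotype is at most max_fraction.
# Usage: filter_by_missing in.vcf max_fraction
filter_by_missing() {
  awk -F'\t' -v max="$2" '
    /^#/ { print; next }                      # pass header lines through
    {
      miss = 0
      for (i = 10; i <= NF; i++) {            # sample columns start at 10
        split($i, f, ":")                     # f[1] is the GT subfield
        if (f[1] ~ "^[.]([/|][.])*$") miss++  # ".", "./." or ".|."
      }
      if (NF > 9 && miss / (NF - 9) <= max) print
    }' "$1"
}

filter_by_missing toy.vcf 0.5
```

In the toy file, the site at position 100 has one of three samples missing and is kept, while the site at position 200 has two of three missing and is dropped. bcftools filtering expressions also offer an F_MISSING variable that may serve the same purpose; check the bcftools manual before relying on it.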


I want to do a population analysis: whether certain disease variants are present in the population or not. Also, can you please tell me how I should check this?

how many samples have no calls in regions where that variant was found


how many samples have no calls in regions where that variant was found

One way is to use GATK VariantsToTable.

Or with awk (inspired by Kevin) :

awk -F"\t" 'BEGIN {print "CHR\tPOS\tID\tREF\tALT\tNoCall"} !/^#/ {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t" gsub(/\.\/\./,"")}' input.vcf

fin swimmer
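The one-liner above counts every occurrence of ./. on the line. A slightly stricter variant, sketched below, looks only at the GT subfield of each sample column and also counts .|. and a haploid . as no-calls. The toy VCF and the function name count_nocalls are invented for illustration, and the sketch assumes GT is the first FORMAT subfield.

```shell
# Toy VCF, invented for illustration: three samples, tab-separated.
printf '##fileformat=VCFv4.2\n' > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n' >> toy.vcf
printf '1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:12\t./.:0\t1/1:9\n' >> toy.vcf
printf '1\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t./.:0\t.|.:0\t0/0:7\n' >> toy.vcf

# Hypothetical helper: per site, count samples whose genotype is fully missing.
count_nocalls() {
  awk -F'\t' '
    BEGIN { OFS = "\t"; print "CHR", "POS", "NoCall", "Samples" }
    /^#/ { next }                             # skip header lines
    {
      n = 0
      for (i = 10; i <= NF; i++) {            # sample columns start at 10
        split($i, f, ":")                     # f[1] is the GT subfield
        if (f[1] ~ "^[.]([/|][.])*$") n++     # ".", "./." or ".|."
      }
      print $1, $2, n, NF - 9
    }' "$1"
}

count_nocalls toy.vcf
```

Printing the number of samples next to the no-call count makes it easy to turn the count into a fraction per site.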


./. is where the caller could not confidently call a genotype. I don't think there is much you can do computationally to address that, unless you had stringent filters set in the current call.
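This is also why the allele number differs between sites: only called alleles are counted, so every ./. lowers the total for that site by two in a diploid sample. A minimal sketch that recounts the allele number from the genotypes, using a made-up toy VCF and a hypothetical function name allele_number, and assuming GT is the first FORMAT subfield:

```shell
# Toy VCF, invented for illustration: three samples, tab-separated.
printf '##fileformat=VCFv4.2\n' > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n' >> toy.vcf
printf '1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:12\t./.:0\t1/1:9\n' >> toy.vcf
printf '1\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t./.:0\t.|.:0\t0/0:7\n' >> toy.vcf

# Hypothetical helper: per site, count the alleles that were actually called.
allele_number() {
  awk -F'\t' '
    BEGIN { OFS = "\t"; print "CHR", "POS", "AN" }
    /^#/ { next }                             # skip header lines
    {
      an = 0
      for (i = 10; i <= NF; i++) {            # sample columns start at 10
        split($i, f, ":")                     # f[1] is the GT subfield
        n = split(f[1], a, "[/|]")            # split GT into its alleles
        for (j = 1; j <= n; j++)
          if (a[j] != ".") an++               # a missing allele adds nothing
      }
      print $1, $2, an
    }' "$1"
}

allele_number toy.vcf
```

With three diploid samples the maximum is 6. The toy site at position 100 has one no-call and gets 4, and the site at position 200 has two no-calls and gets 2.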
