Forcing A1/A2 alleles in PLINK in a VCF phased file
1
3
Entering edit mode
6.6 years ago

I have a phased VCF file and I need to change the alleles of the SNPs so that the alleles coded as 0 represent the ancestral state while the alleles coded as 1 represent the derived state.

I got a list of ancestral and derive states for my SNPs in my VCF file and I want to change them using PLINK.

If I have a list of ancestral alleles in a file ancestral_snps.txt , if I use the following command will I get the wanted output?

plink --vcf myvcf.vcf --a1-allele ancestral_snps.txt --recode vcf --out myvcfancestral.vcf

Or is it --allele-a2 ?

I assume that this change won't affect the phasing of my data, right? As I basically only recoded the alleles.

EDIT:

Following Christopher Chang's comment:

If my ancestral_snps.txt file looks like this, specifying which allele is ancestral:

 C rs1001
 G rs1002
 A rs1003

If I use this command in PLINK2, will the ancestral alleles be coded as 0 in my phased VCF?

./plink2 --vcf myvcf.vcf --ref-allele force 1 2 ancestral_snps.txt --recode vcf --out myvcfancestral.vcf

SNP PLINK VCF • 8.2k views
ADD COMMENT
0
Entering edit mode

The command ... --ref-allele force ancestral_snps.txt 1 2 ... do not change heterozygous, just homozygous. When I am changing all the reference alleles in the VCF it is perfect, because the phased haplotypes will be kept. However, if I intend to change the reference allele for a subset of markers in the VCF, this behaviour will destroy the phased haplotypes. To represent what I mean, I give you a short example where only the two first SNPs were changed:

Original file                            New file

REF     ALT    Sample1         |||    REF    ALT Sample1 

 T       G       1|0           |||     G     T     1|0 

 A       C       0|0           |||     C     A     1|1

 T       C       0|1           |||     T     C     0|1   

 T       C       1|1           |||     T     C      1|1

Am I missing any command to force changes in heterozygous ?

ADD REPLY
0
Entering edit mode

Hi,

Can you post the full .log file from your run?

ADD REPLY
0
Entering edit mode

Sure.

PLINK v2.00a1LM AVX2 Intel (11 Feb 2018) Options in effect: --cow --export vcf --maf 0.01 --out teste --ref-allele force reference.teste2 1 2 --vcf CHR9.vcf.gz

Hostname: c070 Working directory: home/VCFs Start time: Thu Oct 11 15:58:01 2018

Random number seed: 1539233881 128719 MB RAM detected; reserving 64359 MB for main workspace. Using up to 20 threads (change this with --threads). --vcf: 1589271 variants scanned. --vcf: teste-temporary.pgen + teste-temporary.pvar + teste-temporary.psam written. 50 samples (0 females, 0 males, 50 ambiguous; 50 founders) loaded from teste-temporary.psam. 1589271 variants loaded from teste-temporary.pvar. Note: No phenotype data present. Calculating allele frequencies... done. 149348 variants removed due to minor allele threshold(s) (--maf/--max-maf/--mac/--max-mac). 1439923 variants remaining after main filters. --ref-allele: 14 sets of allele codes rotated. --export vcf to teste.vcf ... done.

End time: Thu Oct 11 15:58:12 2018

ADD REPLY
0
Entering edit mode

Can you retry this with a more recent build? This appears to be working properly on my end, and I recall fixing this several months ago.

ADD REPLY
0
Entering edit mode

Thanks.

It worked perfectly when I used PLINK v2.00a2LM AVX2.

ADD REPLY
4
Entering edit mode
6.6 years ago
  • If you want to preserve phase information, you'll need plink 2.0; this is impossible in 1.9 due to the limitations of the core file format.

  • "--ref-allele force ancestral_snps.txt" should work. Depending on the layout of the ancestral_snps file, you might need to specify some column numbers too; "plink2 --help ref-allele" provides usage details.

ADD COMMENT
0
Entering edit mode

So if I understand correctly, the --ref-allele force will make the specified (in this case ancestral) alleles be changed to 0? I am checking the help page, but I am not sure what this means: {refcol} {IDcol} {skip} ? What column numbers would I need to specify? Is it from the VCF or the ancestral_snps.txt file? I made an edit to my original answer.

ADD REPLY
0
Entering edit mode

refcol is the column number of the alleles you want to treat as reference; in this case, 1.

IDcol is the variant ID column number; in this case, 2.

ADD REPLY
0
Entering edit mode

So then: ... --ref-allele force ancestral_snps.txt 1 2 .... Would this work? Thanks again.

ADD REPLY
1
Entering edit mode

Yes, that should work.

ADD REPLY
0
Entering edit mode

Thanks - that worked!

ADD REPLY
0
Entering edit mode

I just noticed that after using --ref-allele force some of the genotypes that were UNPHASED appear as PHASED in the new vcf file. Is there any way to avoid this from happening?

ADD REPLY
1
Entering edit mode

No, because it's irrelevant. You noted in your plink2-users post that this only affected homozygous genotypes, which don't have a phasing state at all. plink2 renders homozygous genotypes in VCFs with '|' iff the previous heterozygous genotype was phased.

(There are plans to add support for VCF "phase sets" in the future, but for now, you should stick to bcftools when it's important to preserve that information.)

ADD REPLY

Login before adding your answer.

Traffic: 2036 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6