Using PLINK to filter VCF files


I am trying to filter a VCF file based on the R2 value of each SNP. I would like to keep only the entries with R2 >= 0.3.

Are these the right commands/procedure?

plink --vcf dataset.dose.vcf --recode --out dataset.dose #(vcf to map/ped)

plink --file dataset.dose  --indep-pairwise  1345835 5 0.3 --out dataset.dose #(to get a list of SNPs R2 >= 0.3)

plink --file dataset.dose --extract --make-bed --out pruneddata #(to perform the extraction of the SNPs and convert map/ped files to bed/bim/fam)



Just be careful: PLINK assumes that you want to remove the variants with r-squared > 0.3, i.e., the variants that are in linkage disequilibrium. Since you want to keep these variants, you should extract the variants listed in the plink.prune.out file (dataset.dose.prune.out with your --out prefix).

So there are two ways to get what you want:

  • --extract plink.prune.out
  • --exclude plink.prune.in
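Why the two options give the same result can be sketched with toy ID lists. The IDs and the toy.* file names below are hypothetical stand-ins, not real PLINK output; the sketch relies on plink.prune.in and plink.prune.out together listing every variant that was considered, so excluding one list leaves exactly the other. The real calls in the comments assume PLINK 1.9 and the question's --out prefix; ld_variants is a hypothetical output name.

```shell
#!/bin/sh
set -eu

# Real calls (sketch; assumes PLINK 1.9 and the --out prefix from the question):
#   plink --file dataset.dose --extract dataset.dose.prune.out --make-bed --out ld_variants
#   plink --file dataset.dose --exclude dataset.dose.prune.in --make-bed --out ld_variants

# Hypothetical IDs standing in for all variants and for the two PLINK lists
printf 'rs1\nrs2\nrs3\nrs4\nrs5\n' > toy.all.ids
printf 'rs1\nrs4\n' > toy.prune.in
printf 'rs2\nrs3\nrs5\n' > toy.prune.out

# comm needs sorted input
sort toy.all.ids > toy.all.sorted
sort toy.prune.in > toy.in.sorted

# --exclude prune.in keeps every ID that is NOT in prune.in
comm -23 toy.all.sorted toy.in.sorted > kept_by_exclude.ids
# --extract prune.out keeps exactly the IDs in prune.out
sort toy.prune.out > kept_by_extract.ids

if cmp -s kept_by_exclude.ids kept_by_extract.ids; then
  echo "same variant set"
else
  echo "different variant sets"
fi
```

Running it prints "same variant set": both approaches keep rs2, rs3 and rs5.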

From the manual:

Variant pruning

--indep <window size>['kb'] <step size (variant ct)> <VIF threshold>

--indep-pairwise <window size>['kb'] <step size (variant ct)> <r^2 threshold>

--indep-pairphase <window size>['kb'] <step size (variant ct)> <r^2 threshold>

These commands produce a pruned subset of markers that are in approximate linkage equilibrium with each other, writing the IDs to plink.prune.in (and the IDs of all excluded variants to plink.prune.out).
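To see how variants end up in one list or the other, here is a deliberately simplified sketch of greedy r² pruning in awk. It is not PLINK's exact algorithm (PLINK works in sliding windows and has its own rule for which variant of a correlated pair to drop); the r² values, the IDs and the toy_* file names are made up, and all four variants are assumed to fit in one window. A variant is kept if its r² with every already-kept variant is at most the threshold; otherwise it goes to the .prune.out list.

```shell
#!/bin/sh
set -eu

# Hypothetical pairwise r^2 values, one pair per line: ID1 ID2 r2
cat > toy_r2.txt <<'EOF'
rsA rsB 0.85
rsA rsC 0.10
rsA rsD 0.05
rsB rsC 0.20
rsB rsD 0.02
rsC rsD 0.45
EOF

# Variants in physical order (assumed to fit in a single window)
printf 'rsA\nrsB\nrsC\nrsD\n' > toy_order.txt

threshold=0.3
awk -v t="$threshold" '
  # First file: store r^2 for both orientations of each pair
  NR == FNR { r2[$1 SUBSEP $2] = $3 + 0; r2[$2 SUBSEP $1] = $3 + 0; next }
  # Second file: keep a variant only if it is not correlated above t
  # with any variant that has already been kept
  {
    v = $1; keep = 1
    for (i = 1; i <= n; i++) {
      key = kept[i] SUBSEP v
      if ((key in r2) && r2[key] > t + 0) { keep = 0; break }
    }
    if (keep) { kept[++n] = v; print v > "toy_ld.prune.in" }
    else      { print v > "toy_ld.prune.out" }
  }
' toy_r2.txt toy_order.txt
```

With these values, rsA and rsC land in toy_ld.prune.in and rsB and rsD in toy_ld.prune.out. Note that rsA stays in the .prune.in list even though it is in strong LD with rsB: pruning removes just enough variants to break each correlated pair, so the .prune.out list does not necessarily contain every variant involved in LD.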


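Also, your window size of 1345835 seems arbitrary. Without the 'kb' suffix, the window size is interpreted as a variant count, so this window spans 1,345,835 variants, far wider than the distances over which LD typically extends.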



