Using PLINK to filter VCF files
4.2 years ago

Hello,

I am trying to filter a VCF file based on the R2 value of each SNP. I would like to keep only the entries with R2 >= 0.3.

Are these the right commands/procedure?

plink --vcf dataset.dose.vcf --recode --out dataset.dose #(vcf to map/ped)

plink --file dataset.dose --indep-pairwise 1345835 5 0.3 --out dataset.dose #(to get a list of SNPs R2 >= 0.3)

plink --file dataset.dose --extract plink.prune.in --make-bed --out pruneddata #(to perform the extraction of the SNPs and convert map/ped files to bed/bim/fam)

Diego

GWAS PLINK vcf • 2.0k views
15 months ago
Republic of Ireland

Just be careful: PLINK assumes that you want to exclude the variants with r-squared > 0.3, i.e., the variants that are in linkage disequilibrium, so plink.prune.in lists the variants that remain after pruning. As your intention is to keep the variants that are in linkage disequilibrium, you should extract the variants listed in the plink.prune.out file instead.

So, 2 possible ways to get what you want:

  • --extract plink.prune.out
  • --exclude plink.prune.in
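That the two options select the same variants can be checked on the ID lists themselves, without running PLINK. The sketch below is only an illustration: the variant IDs (rs1 to rs5) are hypothetical, and the files are toy stand-ins for plink.prune.in and plink.prune.out written to a temporary directory. With real data these lists come from --indep-pairwise, and if --out was set they carry that prefix (e.g. dataset.dose.prune.in).

```shell
#!/bin/sh
# Toy demonstration: the variant IDs below are hypothetical, not real data.
workdir=$(mktemp -d)

# All variant IDs in the dataset (stand-in for the ID column of the .map file).
printf '%s\n' rs1 rs2 rs3 rs4 rs5 > "$workdir/all.ids"

# Stand-ins for the two lists written by --indep-pairwise.
printf '%s\n' rs1 rs4 > "$workdir/plink.prune.in"
printf '%s\n' rs2 rs3 rs5 > "$workdir/plink.prune.out"

# Option 1 (like --extract plink.prune.out): keep only IDs listed in prune.out.
grep -Fxf "$workdir/plink.prune.out" "$workdir/all.ids" > "$workdir/kept_by_extract.ids"

# Option 2 (like --exclude plink.prune.in): drop every ID listed in prune.in.
grep -vFxf "$workdir/plink.prune.in" "$workdir/all.ids" > "$workdir/kept_by_exclude.ids"

# Both options should leave the same set of variants.
if cmp -s "$workdir/kept_by_extract.ids" "$workdir/kept_by_exclude.ids"; then
    echo "both options keep the same variants"
fi
```

The two lists partition the full set of variant IDs, which is why extracting one is the same as excluding the other.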

From the manual:

Variant pruning

--indep <window size>['kb'] <step size (variant ct)> <VIF threshold>

--indep-pairwise <window size>['kb'] <step size (variant ct)> <r^2 threshold>

--indep-pairphase <window size>['kb'] <step size (variant ct)> <r^2 threshold>

These commands produce a pruned subset of markers that are in approximate linkage equilibrium with each other, writing the IDs to plink.prune.in (and the IDs of all excluded variants to plink.prune.out).

[source: https://www.cog-genomics.org/plink/1.9/ld]

Also, your window size of 1345835 seems arbitrary.

Kevin
