Question: Using PLINK to filter VCF files

Hello,

I am trying to filter a VCF file based on the R2 value that each SNP has. I would like to only keep the entries which have a R2 >= 0.3.

Are these the right commands/procedure?

plink --vcf dataset.dose.vcf --recode --out dataset.dose #(vcf to map/ped)

plink --file dataset.dose  --indep-pairwise  1345835 5 0.3 --out dataset.dose #(to get a list of SNPs R2 >= 0.3)

plink --file dataset.dose --extract plink.prune.in --make-bed --out pruneddata #(to perform the extraction of the SNPs and convert map/ped files to bed/bim/fam)

Diego

3.6 years ago by Diego.Morales • updated 10 months ago by Kevin Blighe

Be careful: --indep-pairwise is meant to remove variants that are in linkage disequilibrium, so with a threshold of 0.3 PLINK assumes that you want to exclude the variants with r-squared > 0.3 and writes the variants it keeps to plink.prune.in. As your intention is to keep the variants with r-squared above 0.3, you should be extracting the variants listed in plink.prune.out instead.

So, 2 possible ways to get what you want:

  • --extract plink.prune.out
  • --exclude plink.prune.in
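Put together with the pruning step, the first option could look like the sketch below. The function name keep_ld_variants and the PLINK variable are hypothetical, and the window of 50 variants is only a placeholder, not a value recommended in this thread. One thing to watch: when --out dataset.dose is passed to the pruning step, the lists are named dataset.dose.prune.in and dataset.dose.prune.out, not plink.prune.*; the plink prefix is only used when --out is omitted.

```shell
#!/bin/sh
# Sketch only: keep the variants that LD pruning would discard.
# PLINK (path to the binary) and keep_ld_variants are hypothetical names.
PLINK="${PLINK:-plink}"

keep_ld_variants() {
    in_prefix=$1    # prefix of the .ped/.map fileset, e.g. dataset.dose
    out_prefix=$2   # prefix for the filtered .bed/.bim/.fam fileset

    # Step 1: LD pruning. The window (50 variants) and step (5) are
    # placeholder values; 0.3 is the r^2 threshold from the question.
    # Writes <in_prefix>.prune.in and <in_prefix>.prune.out.
    "$PLINK" --file "$in_prefix" --indep-pairwise 50 5 0.3 \
        --out "$in_prefix" || return 1

    # Step 2: extract the variants that pruning removed.
    # Equivalent alternative: --exclude "$in_prefix.prune.in"
    "$PLINK" --file "$in_prefix" --extract "$in_prefix.prune.out" \
        --make-bed --out "$out_prefix"
}
```

It would be called as, for example, keep_ld_variants dataset.dose pruneddata.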

From the manual:

Variant pruning

--indep <window size>['kb'] <step size (variant ct)> <VIF threshold>

--indep-pairwise <window size>['kb'] <step size (variant ct)> <r^2 threshold>

--indep-pairphase <window size>['kb'] <step size (variant ct)> <r^2 threshold>

These commands produce a pruned subset of markers that are in approximate linkage equilibrium with each other, writing the IDs to plink.prune.in (and the IDs of all excluded variants to plink.prune.out).

[source: https://www.cog-genomics.org/plink/1.9/ld]
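Since the two lists are described as the kept and the excluded IDs, they should have no ID in common, which is why --extract on one and --exclude on the other give the same result. A quick way to check this on your own output is sketched below; count_shared_ids is a hypothetical helper that uses only standard sort and comm, not a PLINK feature.

```shell
#!/bin/sh
# Sketch only: count the variant IDs that appear in both ID lists.
# count_shared_ids is a hypothetical helper name.
count_shared_ids() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }

    # comm needs sorted input
    sort "$1" > "$tmp_a"
    sort "$2" > "$tmp_b"

    # comm -12 prints only the lines present in both files
    comm -12 "$tmp_a" "$tmp_b" | wc -l | tr -d ' '

    rm -f "$tmp_a" "$tmp_b"
}
```

Running count_shared_ids plink.prune.in plink.prune.out is expected to print 0. A nonzero count would suggest duplicated or missing variant IDs in the input, which is worth fixing before extracting by ID.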

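Also, your window size of 1345835 seems arbitrary. Without the 'kb' modifier, the window is counted in variants, so this asks PLINK to consider more than a million variants at a time.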

Kevin

10 months ago by Kevin Blighe
