Simultaneous vs sequential marker filters
zvezdoma • 5.5 years ago

Hello!

Could somebody please explain why applying the marker filters step by step versus all at once makes such a large difference? At first, I did the filtering step by step:

  1. Read in the BGEN file to produce a PGEN fileset: plink2 --bgen <filename.bgen> --sample <filename.sample>. Number of variants: 4,562,905.
  2. Keep White British samples and SNPs only: plink2 --pfile <filename> --make-pgen --snps-only --keep wb_sample.txt --out test_chr10_wb_snps. Number of variants: 4,382,572.
  3. Per-SNP missingness filter: plink2 --pfile test_chr10_wb_snps --make-pgen --geno 0.05 --out test_chr10_wb_snps_call95. Number of variants: 4,224,711.
  4. Minor allele frequency and imputation quality filter: plink2 --pfile test_chr10_wb_snps_call95 --make-pgen --extract chr10_info_maf.txt --out test_chr10_wb_snps_call95_af_info. Number of variants: 1,067,224.
  5. HWE filter: plink2 --pfile test_chr10_wb_snps_call95_af_info --make-pgen --hwe 1e-12 --out test_chr10_wb_snps_call95_af_info_hwe. Number of variants: 1,064,964.
  6. Exclude badly genotyped SNPs: plink2 --pfile test_chr10_wb_snps_call95_af_info_hwe --make-pgen --exclude /disk.0/data/PRS/bad_geno_snps_exclude.txt --out test_chr10_wb_snps_call95_af_info_hwe_g. Number of variants: 1,064,909.
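The step-by-step chain above can be written down programmatically, which makes it easy to check that each step reads the previous step's output. This is a minimal Python sketch: build_chain and STEPS are hypothetical names, the flags and output prefixes are copied from the steps above, and "filename" stands in for the <filename> placeholder. It only builds and prints the commands. Running them would additionally assume plink2 is on the PATH.

```python
# Hypothetical helper that rebuilds the step-by-step plink2 chain as argument
# lists. Flags and output prefixes are copied from the post.

# (filter flags, output prefix) for steps 2-6.
STEPS = [
    (["--snps-only", "--keep", "wb_sample.txt"], "test_chr10_wb_snps"),
    (["--geno", "0.05"], "test_chr10_wb_snps_call95"),
    (["--extract", "chr10_info_maf.txt"], "test_chr10_wb_snps_call95_af_info"),
    (["--hwe", "1e-12"], "test_chr10_wb_snps_call95_af_info_hwe"),
    (["--exclude", "/disk.0/data/PRS/bad_geno_snps_exclude.txt"],
     "test_chr10_wb_snps_call95_af_info_hwe_g"),
]


def build_chain(first_prefix, steps):
    """Return one plink2 argv list per step.

    Each step reads the previous step's fileset via --pfile, so every filter
    only sees the samples and variants that survived the earlier steps.
    """
    commands = []
    prefix = first_prefix
    for flags, out_prefix in steps:
        commands.append(
            ["plink2", "--pfile", prefix, "--make-pgen", *flags, "--out", out_prefix]
        )
        prefix = out_prefix  # the next step starts from this step's output
    return commands


if __name__ == "__main__":
    # "filename" is a stand-in for the <filename> placeholder in the post.
    for argv in build_chain("filename", STEPS):
        # Printed only; pass argv to subprocess.run(argv, check=True) to execute.
        print(" ".join(argv))
```

Printing the chain is a cheap way to confirm that every --pfile points at the --out of the step before it.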

However, when I later applied all of those filters in a single command (plink2 --pfile <filename> --make-pgen --keep wb_sample.txt --snps-only --extract chr10_info_maf.txt --geno 0.05 --hwe 1e-12 --exclude bad_geno_snps_exclude.txt --out <filename_filtered>), I ended up with 4,355,904 variants.
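One way to see exactly which variants the two runs disagree on is to compare the variant IDs in their .pvar files. This is a minimal Python sketch. It assumes both .pvar files have a #CHROM header (or the header-less .bim-like column order), and that IDs are unique and not "." because duplicates would collapse in a set. parse_pvar_ids, read_pvar_ids and compare_id_sets are hypothetical names.

```python
def parse_pvar_ids(lines):
    """Collect variant IDs from the lines of a plink2 .pvar file.

    Assumes optional '##' meta lines followed by a '#CHROM' header naming the
    columns. Without a header, the .bim-like column order is assumed, with the
    ID in the second column. Assumes IDs are unique and not '.'.
    """
    ids = set()
    id_col = 1  # fallback for header-less, .bim-style files
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta lines
        if line.startswith("#CHROM"):
            id_col = line[1:].split().index("ID")  # locate the ID column
            continue
        ids.add(line.split()[id_col])
    return ids


def read_pvar_ids(path):
    """Read a .pvar file from disk and return its variant IDs."""
    with open(path) as handle:
        return parse_pvar_ids(handle)


def compare_id_sets(stepwise_ids, combined_ids):
    """Return (IDs only in the combined run, IDs only in the step-by-step run)."""
    return combined_ids - stepwise_ids, stepwise_ids - combined_ids
```

Passing the step-by-step output (test_chr10_wb_snps_call95_af_info_hwe_g.pvar) and the combined run's .pvar through read_pvar_ids and then compare_id_sets lists the variants only one run kept. Checking a few of them against chr10_info_maf.txt and the other filter inputs could show which filter treated them differently.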

I just don't understand why the outputs are so different. I would have expected that, whichever order the filters are applied in, the result would be the intersection of all of them.
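The intersection expectation holds for filters that only look at each variant's own fixed properties. This toy Python model uses made-up data, and every name in it is hypothetical; it does not model plink2's internals. It shows two cases. List-based and type-based filters (like --extract, --exclude and --snps-only) keep the same intersection whether applied one at a time or together. A call-rate check is different because it is computed over whichever samples are present, so its outcome can change once a sample filter has removed some samples. This illustrates the general principle only and is not a diagnosis of the counts above.

```python
# Toy model with made-up data: variant ID -> (REF, ALT, calls per sample),
# where None marks a missing call.
VARIANTS = {
    "rs1": ("A", "G", {"s1": 0, "s2": 1, "s3": None}),
    "rs2": ("C", "CT", {"s1": 0, "s2": 0, "s3": 2}),  # indel, not a SNP
    "rs3": ("G", "T", {"s1": 2, "s2": 1, "s3": 1}),
    "rs4": ("T", "C", {"s1": 1, "s2": 2, "s3": 0}),
}


def is_snp(vid):
    """Fixed per-variant property, like a SNP-only check."""
    ref, alt, _ = VARIANTS[vid]
    return len(ref) == 1 and len(alt) == 1


# Filters that depend only on the variant itself.
STATIC_FILTERS = [
    is_snp,
    lambda vid: vid in {"rs1", "rs2", "rs4"},  # allowlist, like --extract
    lambda vid: vid not in {"rs4"},  # blocklist, like --exclude
]


def apply_sequentially(ids, filters):
    """Apply one filter at a time, each to the survivors of the previous one."""
    for keep in filters:
        ids = {vid for vid in ids if keep(vid)}
    return ids


def apply_together(ids, filters):
    """Keep only the variants that pass every filter at once."""
    return {vid for vid in ids if all(keep(vid) for keep in filters)}


def passes_call_rate(vid, samples, max_missing=0.05):
    """Missing-rate check computed over the given samples only."""
    calls = VARIANTS[vid][2]
    missing = sum(calls[s] is None for s in samples)
    return missing / len(samples) <= max_missing


if __name__ == "__main__":
    all_ids = set(VARIANTS)
    print(apply_sequentially(all_ids, STATIC_FILTERS))
    print(apply_together(all_ids, STATIC_FILTERS))
    # rs1 is missing in s3, so it fails with all samples but passes without s3.
    print(passes_call_rate("rs1", ["s1", "s2", "s3"]))
    print(passes_call_rate("rs1", ["s1", "s2"]))
```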

Thank you!

