Error in PLINK --mind option? ("All people removed due to missing genotype data")
3.8 years ago


I just started the QC of my data and noticed something peculiar about the --mind option. For simplicity, I first looked at only the very first sample and produced a VCF file from it. The file is fine, with entries in most of the rows. Then I added "--mind 0.01", and PLINK excluded that sample with the message:

Error: All people removed due to missing genotype data (--mind).

How can that be? How can genotype data be missing if it is clearly possible to produce a VCF file from it?

Here is the output of the first call:

>Options in effect:
>  --bed ukb_cal_chr1_v2.bed
>  --bim ukb_snp_chr1_v2.bim
>  --fam ukb49398_cal_ALL_v2_s488264_fam/ukb49398_cal_chr1_v2_s488264.fam
>  --keep dummy_keep
>  --maf 0.005
>  --recode vcf
>32091 MB RAM detected; reserving 16045 MB for main workspace.
>63487 variants loaded from .bim file.
>488377 people (223467 males, 264797 females, 113 ambiguous) loaded from .fam.
>Ambiguous sex IDs written to plink.nosex .
>--keep: 1 person remaining.
>Before main variant filters, 1 founder and 0 nonfounders present.
>Calculating allele frequencies... done.
>Total genotyping rate in remaining samples is 0.976326.
>52130 variants removed due to minor allele threshold(s)
>11357 variants and 1 person pass filters and QC.
>Note: No phenotypes present.
>--recode vcf to plink.vcf ... done.

And the output with --mind in effect:

>Options in effect:
>  --bed ukb_cal_chr1_v2.bed
>  --bim ukb_snp_chr1_v2.bim
>  --fam ukb49398_cal_ALL_v2_s488264_fam/ukb49398_cal_chr1_v2_s488264.fam
>  --keep dummy_keep
>  --maf 0.005
>  --mind 0.01
>  --recode vcf
>32091 MB RAM detected; reserving 16045 MB for main workspace.
>63487 variants loaded from .bim file.
>488377 people (223467 males, 264797 females, 113 ambiguous) loaded from .fam.
>Ambiguous sex IDs written to plink.nosex .
>--keep: 1 person remaining.
>Error: All people removed due to missing genotype data (--mind).
>IDs written to plink.irem .

The problem occurs with both PLINK 1.9 and PLINK 2, with slightly different output messages.

Can someone explain this to me?

Thanks in advance!

3.8 years ago
zx8754 11k

>Total genotyping rate in remaining samples is 0.976326.

The log says that the genotyping rate for that one individual is 0.976326, so the missingness is about 2.4% (1 - 0.976326 = 0.023674). With --mind 0.01 we are telling PLINK to remove every sample whose missingness is more than 1%, so that one sample gets dropped. Does that make sense?
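To make the arithmetic concrete, here is a minimal Python sketch of the comparison. It is not PLINK's actual code; the function names `sample_missingness` and `passes_mind` are made up for illustration, and it assumes a sample is kept when its missing rate does not exceed the --mind threshold.

```python
def sample_missingness(genotyping_rate: float) -> float:
    """Missing call rate is the complement of the genotyping rate."""
    return 1.0 - genotyping_rate


def passes_mind(genotyping_rate: float, mind_threshold: float) -> bool:
    """Hypothetical helper: True if the sample would survive --mind.

    Assumes the sample is removed only when its missingness
    exceeds the threshold.
    """
    return sample_missingness(genotyping_rate) <= mind_threshold


rate = 0.976326  # genotyping rate taken from the log in the question

print(f"missingness = {sample_missingness(rate):.6f}")
print("kept with --mind 0.01:", passes_mind(rate, 0.01))
print("kept with --mind 0.03:", passes_mind(rate, 0.03))
```

Under these assumptions the sample fails at a threshold of 0.01 but would be kept at a looser threshold such as 0.03.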


Ah, thanks for the clarification! I thought that the 0.01 meant the minimum genotyping rate, not the maximum missingness. But yeah, it makes much more sense that way!

