PLINK GWAS: how to solve heterozygous haploid warnings in data cleaning
6.4 years ago
United States

Hi all,

I am using PLINK for a GWAS. In the data cleaning step, when I filter on MAF (minor allele frequency), I get warnings such as:

"Plink is setting 111834 heterozygous haploid as missing"

From the following link, I found that this could be solved with the "--split-x" flag, but I am having difficulty using it. https://www.cog-genomics.org/plink2/data#split_x

It would be great if anyone could give me an example command so that I can get an exact idea of its usage.

Thanks a lot.

PC

17 months ago
United States

If all the heterozygous haploid warnings involve the X chromosome, and your data uses build 37 coordinates,

plink --bfile unclean_fileset --split-x b37 --make-bed --out clean_fileset

should work.

If there also are nonmissing female genotype calls on the Y chromosome, and you're sure there are no gender errors in your .fam file, you can then use

plink --bfile clean_fileset --make-bed --set-hh-missing --out cleaner_fileset

to erase those too.

2.4 years ago
HumeMarx • 20
United Kingdom

Hi

This post was quite useful, thank you. What happens if you have done all that, but a large number of het. haploid genotypes still remain?

I have quite a few heterozygous haploid genotypes. I have removed individuals who failed sex checks, assigned a new chromosome code to the SNPs in the pseudoautosomal region, and addressed the nonmissing nonmale Y genotypes (PLINK still gives me the warning about nonmissing female Y genotypes even though I tried to erase them). I still have over 14000 heterozygous haploid genotypes. What can I do? Should I just remove these SNPs? Is there anything else I can try?

P.S. I am trying to address this because my dataset seems to have an unusually high missingness rate. Setting --mind to 0.1, which is standard, allows only 2% of the samples to pass QC!

6.4 years ago
United States

Thanks a lot for your suggestion.

I have a few questions:

1) How can I tell whether the heterozygous haploid warnings involve the X chromosome, and what if I don't know the build of my data?

2) You use --bfile in the command, but I haven't converted my .ped and .map files into a binary fileset before data cleaning. Should I do that first?

Thanks once again.

PC


1. The .hh file has details on the heterozygous haploid warnings. Check if the SNP IDs are on the X chromosome, the Y chromosome, or both.

2. Replace --bfile with --file in the first command; nothing else needs to change.
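
For point 1, a minimal shell sketch (not part of PLINK) that tallies the calls in the .hh file by chromosome code. It assumes the .hh file has three whitespace-separated columns (family ID, within-family ID, variant ID) and that the .bim or .map file has the chromosome code in column 1 and the variant ID in column 2. The function name hh_by_chrom and the file names in the usage comment are placeholders.

```shell
# Tally heterozygous haploid calls per chromosome code.
#   $1: .hh file written by PLINK (family ID, within-family ID, variant ID)
#   $2: matching .bim or .map file (column 1 = chromosome, column 2 = variant ID)
# Prints one "chromosome count" line per chromosome code, sorted as text.
hh_by_chrom() {
    awk '
        # First file (the variant map): remember each variant ID -> chromosome.
        FILENAME == ARGV[1] { chrom[$2] = $1; next }
        # Second file (the .hh list): count each call under its chromosome.
        { c = ($3 in chrom) ? chrom[$3] : "unknown"; n[c]++ }
        END { for (c in n) print c, n[c] }
    ' "$2" "$1" | sort
}

# Example call (placeholder file names):
#   hh_by_chrom unclean_fileset.hh unclean_fileset.map
```

In PLINK's human chromosome numbering, 23 or X is the X chromosome, 24 or Y is the Y chromosome, and 25 or XY is the pseudo-autosomal region, so the counts show directly whether the warnings involve X, Y, or both.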


Thanks a lot for your reply.

When I run this command:

plink --file unclean_fileset --split-x b37 --make-bed --out clean_fileset

PLINK stops with the following error:

**Unused command line option: --split-x

**Unused command line option: b37

Do you have any idea what causes this error?

Thanks


--split-x is a new PLINK 1.9 flag; it will not work in 1.07.


Hi chrchang523,

I tried PLINK 1.9 to resolve the heterozygous haploid warning, using the following command:

plink --file unclean_fileset --split-x b37 --make-bed --out clean_fileset

Earlier I was getting 103317 heterozygous haploid genotypes; now the count has decreased to 103205.

However, I still get some warnings and an error, which I have pasted below:

Warning: 103205 het. haploid genotypes present (see HbF_hh_clean.hh).
Warning: Nonmissing nonmale Y chromosome genotype(s) present.
Total genotyping rate is 0.996871.
894327 variants and 254 people pass filters and QC.
Error: --split-x cannot be used when the dataset already contains an XY region.
(Did you mean --merge-x instead?)

Could you suggest how to solve this?

Thanks a lot
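
The last error indicates that the variant map already has an XY (pseudo-autosomal) region, which is why --split-x refuses to run. A minimal shell sketch for confirming this before deciding what to do about the --merge-x hint in the message: it assumes the .map or .bim file has the chromosome code in column 1 and that the XY region is coded as 25 or XY. The function name count_xy_variants and the file name in the usage comment are placeholders.

```shell
# Count variants already assigned to the XY (pseudo-autosomal) region.
#   $1: .map or .bim file (column 1 = chromosome code)
# Prints the number of variants coded as 25 or XY; 0 means no XY region.
count_xy_variants() {
    awk '$1 == "25" || $1 == "XY" { n++ } END { print n + 0 }' "$1"
}

# Example call (placeholder file name):
#   count_xy_variants unclean_fileset.map
```

A nonzero count is consistent with the error above; it does not by itself say whether the existing XY boundaries match build 37.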


Could you delete this answer and make it a comment on https://www.biostars.org/u/9575/ 's answer instead? (I can move it to a comment, but only on your original post.)


Hi, I was able to add it as a comment, but I am not able to delete my answer.

I don't see any delete button here.

Thanks.

