I need to use SHAPEIT for phasing only, since I will conduct a compound heterozygous (CH) analysis of rare recessive variants. I will not perform imputation.
I am running SHAPEIT, and I see in the log file it says:
- Seed : 1442251531
- Parallelisation: 12 threads
* Ref allele is NOT aligned on the reference genome
- MCMC: 35 iterations [7 B + 1 runs of 8 P + 20 M]
I am still able to get the *.haps file with the haplotypes for the CH analysis; however, I am not sure whether I am doing this correctly.
Is it OK to have the "Ref allele is NOT aligned on the reference genome" notice in my log file?
I have one more question.
My input is in PLINK PED/MAP format, and the SHAPEIT website (http://shapeit.fr/pages/m03_phasing/input.html) says that SHAPEIT considers "0" as missing data.
It suggests changing the missing-data character, for example to "N", with the --missing-code option as follows:
shapeit --input-ped chr20.unphased.ped chr20.unphased.map -M chr20.gmap.gz --output-max chr20.phased --missing-code N
However, --missing-code N gives me an error: "ERROR: Non biallelic site pos=24118582 a=0"
So I dropped --missing-code N and ran SHAPEIT as follows:
shapeit --input-ped chr20.unphased.ped chr20.unphased.map -M chr20.gmap.gz --output-max chr20.phased
Would that be ok?
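In case it helps with diagnosing this, here is a minimal Python sketch (not part of SHAPEIT) of how the allele codes at each marker of a PED file could be tallied. It assumes the standard PLINK layout of six leading columns followed by two allele columns per marker; the function names and the toy lines are made up for illustration. My guess, which I have not confirmed, is that once N is declared as the missing code, any "0" in the file is counted as a third allele, which would match the a=0 in the error.

```python
from collections import Counter


def allele_codes_per_site(ped_lines, n_leading=6):
    """Return one Counter of allele codes per marker.

    Assumes PLINK PED layout: n_leading metadata columns, then two
    whitespace-separated allele columns per marker.
    """
    sites = []
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        alleles = fields[n_leading:]
        if len(alleles) % 2:
            raise ValueError("odd number of allele columns")
        n_sites = len(alleles) // 2
        if not sites:
            sites = [Counter() for _ in range(n_sites)]
        elif n_sites != len(sites):
            raise ValueError("samples have different numbers of markers")
        for i in range(n_sites):
            sites[i].update(alleles[2 * i:2 * i + 2])
    return sites


def non_biallelic(sites, missing="0"):
    """Indices of markers with more than two distinct non-missing codes."""
    return [i for i, codes in enumerate(sites)
            if len(set(codes) - {missing}) > 2]


# Toy data (hypothetical): the first sample is missing ("0 0") at the 2nd marker.
toy = [
    "F1 I1 0 0 1 1 A G 0 0 C C",
    "F2 I2 0 0 2 1 A A T T C G",
    "F3 I3 0 0 1 2 G G A T C C",
]
sites = allele_codes_per_site(toy)
print(non_biallelic(sites, missing="0"))  # "0" treated as missing
print(non_biallelic(sites, missing="N"))  # "0" treated as an allele
```

On a real file, the lines could be read with open("chr20.unphased.ped"), and the returned indices matched against the rows of the MAP file to find the affected positions.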
Thank you so much,