4.3 years ago
I had SNPs for 39 WES samples, some of them were from related individuals.
I wanted to check the kinship to see if there any mislabelling during the sample processing.
Finally I've build a nice tree with the code below.
It was validated with independent observations (family diagrams, ancestry).
All the unrelated individuals were connected above the FC (first cousins) line,
all sibs, half-sibs, and other relatives were where they should be.
The crucial steps were using IBS function to calculate distances and taking LD into account.
Without these two I got just misleading trees.
The default LD threshold (0.2) removed too many SNPs, I increased it to 0.5 to achieve higher sensitivity.
LD filtation reduced 500K SNPs to 16K.
#install SNPRelate as described here:
#prepare multisample vcf with bcftools merge
setwd([your dir here])
#biallelic by default
genofile = snpgdsOpen("dataset1.gds")
#LD based SNP pruning
snpset = snpgdsLDpruning(genofile,ld.threshold = 0.5)
# distance matrix - use IBS
dissMatrix = snpgdsIBS(genofile , sample.id=NULL, snp.id=snp.id, autosome.only=TRUE,
remove.monosnp=TRUE, maf=NaN, missing.rate=NaN, num.thread=2, verbose=TRUE)
snpHCluster = snpgdsHCluster(dissMatrix, sample.id=NULL, need.mat=TRUE, hang=0.01)
cutTree = snpgdsCutTree(snpHCluster, z.threshold=15, outlier.n=5, n.perm = 5000, samp.group=NULL,
col.outlier="red", col.list=NULL, pch.outlier=4, pch.list=NULL,label.H=FALSE, label.Z=TRUE,
snpgdsDrawTree(cutTree, main = "Dataset 1",edgePar=list(col=rgb(0.5,0.5,0.5,0.75),t.col="black"),
I hope this will be helpful for somebody.