Hello,
I am trying to run a DAPC analysis on my genome-wide dataset, which includes 188736 genotypes for 188 individuals from 18 different geographic populations. I already know there is some genetic structure in the dataset: at least 2 groups can be defined. However, when I run the find.clusters() function to determine the most plausible number of groups for my dataset, I obtain strange plots of "Cumulative variance explained by PCA" and "Value of BIC vs number of clusters".
This is my R script:
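library("adegenet")
# Read the PLINK .raw genotypes and the .map file into a genlight object
snps <- read.PLINK(file = "file.raw", map.file = "file.map")
# Search for the best-supported number of clusters (up to 36)
grp <- find.clusters(snps, max.n.clust = 36, n.iter = 1000)
# Run DAPC on the inferred groups, then plot the result
dapc1 <- dapc(snps, grp$grp)
scatter(dapc1)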
When I run the analysis on a small subset of my data, say 1000 SNPs, it seems to work and I obtain normal plots. However, a subset would not be representative of my dataset for this kind of analysis, which is why I wonder whether this could be an issue with the size of the dataset.
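A minimal sketch of such a subset run, assuming the 1000 SNPs are drawn at random from the full set; the seed and the names keep, snps_sub, and grp_sub are illustrative only:

```r
library("adegenet")

# Same input files as in the script above
snps <- read.PLINK(file = "file.raw", map.file = "file.map")

# Assumption: a random draw of 1000 loci; the seed is arbitrary
set.seed(1)
keep <- sort(sample(nLoc(snps), 1000))

# genlight objects can be subset by column (locus) index
snps_sub <- snps[, keep]

# Same clustering call as for the full dataset
grp_sub <- find.clusters(snps_sub, max.n.clust = 36, n.iter = 1000)
```

Repeating this with progressively larger subsets would be one way to see at what size the plots start to look unusual.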
Do you have any idea why I obtain these results and what they actually mean? Could it be that the number of genotypes and samples is so high that the function cannot handle them?
Thanks a lot in advance.
André
Hi André,
I am having the same problem with my data. Did you figure out whether something was wrong with your data or with the script?
Thank you.
Sandara Brasil.