Is neighbor joining the best approach to look at clustering pattern with population genetic data?
3
0
Entering edit mode
8.0 years ago
su7880 • 0

Hello,

I am working on a large SNP data set from GBS with over 300 individuals from 34 populations, which together represent three closely related species. I tried various assignment tests to examine population structure, but I would still like to look at the clustering pattern with a different approach. Unfortunately, I am not an expert in tree building. For a start, I am not sure whether neighbor joining will give me informative inferences about the relationships among species and populations. Also, many individuals are heterozygous at many loci, since I am using a SNP data set. Which software takes ambiguity codes into account if I run a neighbor-joining analysis on my SNP data?

Any kind of answers will be very much appreciated.

Thanks in advance.

SNP • 2.5k views

Hi, it might be worth running PCA on your data first. Random forests are robust classifiers, and once you have reduced the number of features (SNPs) to a manageable size you can build a nice model.

8.0 years ago
su7880 • 0

Thanks for the kind answer.

I already tried PCA and various assignment tests such as fastStructure and DAPC. However, I would still like to see what NJ does with the data that I have. Any more suggestions?

8.0 years ago
Brice Sarver ★ 3.8k

With that kind of data, you'll want to use a distance-based approach like NJ, UPGMA, etc. More sophisticated phylogenetic methods are unlikely to get you an answer in a reasonable amount of time, especially if you're just looking at your data. You can also use a non-phylogenetic method, like hierarchical clustering, if you just want to explore your dataset.
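To make the distance-based idea concrete, here is a minimal neighbor-joining sketch in plain Python. It assumes you already have a symmetric matrix of pairwise genetic distances between individuals or populations; the function name `neighbor_joining` and the toy five-taxon matrix are made up for illustration. For a real data set an established implementation is probably the safer choice.

```python
def neighbor_joining(names, dist):
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    names: list of taxon labels (hypothetical input format)
    dist:  symmetric matrix (list of lists) of pairwise distances
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    labels = list(names)
    d = [row[:] for row in dist]  # work on a copy

    while len(labels) > 3:
        n = len(labels)
        totals = [sum(row) for row in d]
        # Pick the pair (i, j) that minimises the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and j.
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        new_label = f"({labels[i]}:{li:.4f},{labels[j]}:{lj:.4f})"
        # Distances from the new node to every remaining taxon.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        # Drop rows/columns i and j, then append the new node.
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        labels = [labels[k] for k in keep] + [new_label]

    # Resolve the last three nodes around a central trifurcation.
    a, b, c = labels
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


# Toy distance matrix for five taxa (illustrative values only).
toy_names = ["a", "b", "c", "d", "e"]
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(toy_names, toy_dist))
```

The Newick string can be opened in any tree viewer. In this toy example `a` and `b` end up as sister taxa, as do `d` and `e`. The double loop over pairs is cubic in the number of taxa overall, which should still be manageable for a few hundred individuals.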


Hi Brice, out of interest, I am wondering why you are not recommending random forests. Do they not outperform other models?


RF might be useful here. I just homed in on NJ, species/population data, and tree. On a second read, perhaps you weren't explicitly talking about using phylogenetic approaches, and a dendrogram, as opposed to a phylogeny, will suffice for what you want.

8.0 years ago
su7880 • 0

Thank you all. I am sorry, but what is RF? Can you give me some more details on that? Also, do you know of software that actually takes IUPAC ambiguity codes into account for phylogenetic inference? I just saw an argument that MrBayes might or might not use the IUPAC ambiguity codes. Is RAxML a good option? What do you all use for this kind of question?

Many thanks in advance.
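
On the ambiguity-code question, one option that does not depend on how a given tree program treats IUPAC codes is to compute a genotype-based distance yourself and pass the resulting matrix to a distance method. The sketch below assumes diploid, SNP-only data in which each ambiguity code (R, Y, S, W, K, M) stands for a heterozygous call and anything else (N, gaps) is missing data. The function name `allele_sharing_distance` and the lookup table are illustrative, not part of any package.

```python
from collections import Counter

# Assumption: each IUPAC character encodes one diploid genotype at a SNP.
IUPAC_DIPLOID = {
    "A": "AA", "C": "CC", "G": "GG", "T": "TT",   # homozygotes
    "R": "AG", "Y": "CT", "S": "CG",              # heterozygotes
    "W": "AT", "K": "GT", "M": "AC",
}


def allele_sharing_distance(seq1, seq2):
    """Mean per-locus distance: 0 (same genotype), 0.5 (one allele shared), 1 (none)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must cover the same loci")
    total = 0.0
    compared = 0
    for c1, c2 in zip(seq1.upper(), seq2.upper()):
        g1, g2 = IUPAC_DIPLOID.get(c1), IUPAC_DIPLOID.get(c2)
        if g1 is None or g2 is None:
            continue  # skip loci with missing or unsupported codes
        # Multiset intersection counts the alleles the two genotypes share.
        shared = sum((Counter(g1) & Counter(g2)).values())
        total += 1.0 - shared / 2.0
        compared += 1
    if compared == 0:
        raise ValueError("no loci with data in both individuals")
    return total / compared


# Second locus is A/G heterozygous in the first individual, A/A in the second.
print(allele_sharing_distance("ARGT", "AAGT"))  # 0.125
```

Applied to every pair of individuals, this gives a distance matrix in which heterozygotes sit halfway between the two homozygotes instead of being dropped or treated as unknown.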
