Hi, I am an undergraduate who has just started working in bioinformatics, so I am still a newbie. Our group has soil metagenomes sequenced by JGI, and we want to find the genes most associated with each treatment by clustering on gene abundance, perhaps grouping the genes by COG function. I know this is more commonly done with gene expression data, but I would like to know whether it is possible with DNA sequencing data, using gene abundance in the metagenomes. Does anyone know a pipeline for this, or an R package that does for genes something similar to what indicspecies does for species? Thank you for your help.
You might want to look at hierarchical clustering (heatmap in base R, or heatmap.2 from gplots), or do a PCA.
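As a rough illustration of the hierarchical clustering suggestion, here is a minimal sketch in base R. The COG-by-sample table is simulated, and the sample names, the pseudocount, and the choice of two groups are all assumptions made for the example, not values from the original question.

```r
# Simulated COG-by-sample count table (hypothetical; replace with your own data)
set.seed(1)
samples <- paste0(rep(c("control", "treated"), each = 3), "_", 1:3)
cogs <- sprintf("COG%04d", 1:20)
counts <- matrix(rpois(20 * 6, lambda = 50), nrow = 20,
                 dimnames = list(cogs, samples))
# Make the first five COGs more abundant in the treated samples
counts[1:5, 4:6] <- counts[1:5, 4:6] + 100

# Relative abundance per sample, then log-transform (pseudocount is an assumption)
rel <- sweep(counts, 2, colSums(counts), "/")
logrel <- log10(rel + 1e-6)

# Cluster the samples on their COG profiles and cut the tree into two groups
hc <- hclust(dist(t(logrel)), method = "average")
groups <- cutree(hc, k = 2)
print(groups)

# Heatmap with row (COG) and column (sample) dendrograms
heatmap(logrel, scale = "row", margins = c(8, 6))
```

COGs that sit in the same row cluster and show a clear contrast between treatments in the heatmap are candidates for a closer look. The clustering itself is not a statistical test.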
Thank you for your answer. I tried a PCA using MG-RAST, but the only output is the PCA plot, and I want to know which genes are most related to each treatment.
Try something like this in R:
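A minimal sketch of what a PCA in base R could look like (this is not the original poster's code; the abundance table is simulated and all object names are hypothetical):

```r
# Simulated COG-by-sample count table (hypothetical; replace with your own data)
set.seed(42)
samples <- paste0(rep(c("control", "treated"), each = 3), "_", 1:3)
cogs <- sprintf("COG%04d", 1:20)
counts <- matrix(rpois(20 * 6, lambda = 50), nrow = 20,
                 dimnames = list(cogs, samples))
counts[1:5, 4:6] <- counts[1:5, 4:6] + 100  # COGs enriched in "treated"

# Relative abundance per sample, log-transformed (pseudocount is an assumption)
rel <- sweep(counts, 2, colSums(counts), "/")
x <- t(log10(rel + 1e-6))  # prcomp expects samples in rows, COGs in columns

# PCA on centered, scaled variables
pca <- prcomp(x, center = TRUE, scale. = TRUE)
print(summary(pca))

# Biplot: samples as points, COGs as arrows
biplot(pca, cex = 0.6)
```

In the biplot, COG arrows that point towards a group of samples tend to be more abundant in those samples.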
I ran the PCA in R, as you suggested, using the most abundant COG functions, but the plot turned out unintelligible. The COGs were too close to each other, so I can't read them.
You can adjust cex (e.g., cex=0.5), but I am afraid that you won't be able to read all the samples or COGs on this plot (or you would need to buy a very big screen).
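One way around the crowded labels, sketched here under assumptions, is to plot only the samples and list separately the few COGs with the largest loadings on the first component. The data are simulated and the cutoff of five COGs is arbitrary. Loadings show which COGs drive an axis, not whether the difference between treatments is significant.

```r
# Simulated COG-by-sample count table (hypothetical; replace with your own data)
set.seed(7)
samples <- paste0(rep(c("control", "treated"), each = 3), "_", 1:3)
cogs <- sprintf("COG%04d", 1:20)
counts <- matrix(rpois(20 * 6, lambda = 50), nrow = 20,
                 dimnames = list(cogs, samples))
counts[1:5, 4:6] <- counts[1:5, 4:6] + 100  # COGs enriched in "treated"

rel <- sweep(counts, 2, colSums(counts), "/")
pca <- prcomp(t(log10(rel + 1e-6)), center = TRUE, scale. = TRUE)

# Rank COGs by absolute loading on PC1 and keep the top five (arbitrary cutoff)
load1 <- pca$rotation[, "PC1"]
top_cogs <- names(head(sort(abs(load1), decreasing = TRUE), 5))
print(load1[top_cogs])

# Plot the samples only, colored by treatment, with small labels
cols <- ifelse(grepl("^treated", rownames(pca$x)), "red", "blue")
plot(pca$x[, 1:2], pch = 19, col = cols)
text(pca$x[, 1:2], labels = rownames(pca$x), pos = 3, cex = 0.6)
```

The sign of each loading, read together with where the treatments fall along PC1, indicates which treatment a COG is associated with.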
I understand. Thank you very much for the help.
You could try the 'phyloseq' package: http://joey711.github.io/phyloseq/plot_ordination-examples.html
Hi, thank you for your answer. I downloaded phyloseq and will test it with my dataset.