Method to cluster genes based on gene abundance in metagenomic data

8.1 years ago

nayara • 0

Hi, I am an undergraduate who has just started working with bioinformatics, so I am still a newbie. Our group has soil metagenomes sequenced by JGI, and we want to find the genes most associated with each treatment by clustering them on gene abundance, perhaps grouped by COG function. I know this is more commonly done with gene expression data, but I would like to know whether it is possible with DNA sequencing data, using gene abundance in the genomes. Does anyone know a pipeline for this, or an R package that does for genes something similar to what indicspecies does for species? Thank you for your help.

cluster genome annotation R metagenome • 2.4k views

updated 8.1 years ago by Devon Ryan 104k • written 8.1 years ago by nayara • 0

You might want to look at hierarchical clustering (heatmap from base R, or heatmap.2 from gplots), or do a PCA.
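A minimal sketch of the clustering idea, using only base R. The matrix `abund`, the sample names and the simulated counts are made up for illustration; a real table would have COGs in rows and samples in columns, and the normalization shown (relative abundance, then log) is only one possible choice.

```r
set.seed(1)
# Hypothetical data: 20 COGs x 6 samples (3 control, 3 treated), simulated counts
abund <- matrix(rpois(120, lambda = 50), nrow = 20,
                dimnames = list(paste0("COG", sprintf("%04d", 1:20)),
                                c(paste0("ctrl", 1:3), paste0("trt", 1:3))))
# Make the first 5 COGs more abundant in the treated samples
abund[1:5, 4:6] <- abund[1:5, 4:6] + 100

# Relative abundance per sample, then log-transform to dampen large values
rel <- sweep(abund, 2, colSums(abund), "/")
logrel <- log10(rel + 1e-6)

# heatmap() clusters rows and columns with hclust();
# scale = "row" standardizes each COG for the colour scale only
hm <- heatmap(logrel, scale = "row", margins = c(6, 8))

# Cluster the row-standardized COG profiles and cut the tree into two groups
row_z <- t(scale(t(logrel)))
gene_clusters <- cutree(hclust(dist(row_z)), k = 2)
split(names(gene_clusters), gene_clusters)
```

The `split()` call lists which COGs fall in each cluster, which gives the gene lists in text form instead of only a picture.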

8.1 years ago by Benn 8.3k

Thank you for the answer. I tried a PCA using MG-RAST, but the only output is the PCA plot, and I want to know which genes are most related to each treatment.

8.1 years ago by nayara • 0

Try something like this in R:

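# exprs_data: numeric abundance matrix, assumed here to have genes/COGs in rows and samples in columns
pca <- princomp(exprs_data)
plot(pca)    # scree plot: variance explained by each component
biplot(pca)  # rows (genes) as labels, columns (samples) as arrows, on the first two components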
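To get numbers rather than only a plot, the fitted object can be inspected directly. The sketch below assumes the same orientation (COGs in rows, samples in columns) and uses a simulated stand-in for `exprs_data`; the sample names and the choice to rank by the first component are illustrative assumptions.

```r
set.seed(1)
# Hypothetical stand-in for exprs_data: 30 COGs (rows) x 6 samples (columns)
exprs_data <- matrix(rnorm(180), nrow = 30,
                     dimnames = list(paste0("COG", 1:30),
                                     c(paste0("ctrl", 1:3), paste0("trt", 1:3))))
# Simulate COG1-COG5 being enriched in the "trt" samples
exprs_data[1:5, 4:6] <- exprs_data[1:5, 4:6] + 6

pca <- princomp(exprs_data)
summary(pca)   # proportion of variance per component

# Sample coordinates (the arrows in the biplot)
sample_loadings <- unclass(loadings(pca))[, 1:2]
# COG coordinates (the labels in the biplot)
gene_scores <- pca$scores[, 1:2]

# COGs with the most extreme scores on the first component
top <- names(sort(abs(gene_scores[, 1]), decreasing = TRUE))[1:5]
top
```

Comparing the sign of a COG's score with the sign of the sample loadings on the same component indicates which group of samples that COG is pulled towards.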

8.1 years ago by Benn 8.3k

I ran the PCA in R as you suggested, using the most abundant COG functions, but the plot turned out unintelligible. The COGs were too close to each other, so I can't read them.

8.1 years ago by nayara • 0

You can adjust cex (e.g., cex=0.5), but I am afraid that you won't be able to read all samples or COGs in this plot (unless you buy a very big screen).
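One way around the clutter, sketched here with simulated data, is to draw every COG as a small point and label only the most extreme ones. The cutoff of 15 labels and the matrix `exprs_data` below are arbitrary assumptions for illustration.

```r
set.seed(2)
# Hypothetical data: 200 COGs (rows) x 6 samples (columns)
exprs_data <- matrix(rnorm(1200), nrow = 200,
                     dimnames = list(paste0("COG", 1:200), paste0("S", 1:6)))
pca <- princomp(exprs_data)
sc <- pca$scores[, 1:2]

# Distance of each COG from the origin in the PC1/PC2 plane
d <- sqrt(rowSums(sc^2))
keep <- order(d, decreasing = TRUE)[1:15]   # the 15 most extreme COGs

# All COGs as small grey points, labels only for the extreme ones
plot(sc, pch = 16, cex = 0.4, col = "grey60", xlab = "PC1", ylab = "PC2")
text(sc[keep, , drop = FALSE], labels = rownames(sc)[keep], cex = 0.6, pos = 3)
```

Writing the figure to a large file, for example by wrapping the plotting calls in `pdf("pca.pdf", width = 20, height = 20)` and `dev.off()`, also leaves more room for labels than the screen does.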

8.1 years ago by Benn 8.3k

I understand. Thank you very much for the help.

8.1 years ago by nayara • 0

You could try the 'phyloseq' package: http://joey711.github.io/phyloseq/plot_ordination-examples.html