Question

Community detection in R with the igraph package

4.9 years ago

pablo ▴ 300

I have a data.frame of correlations between OTUs and genes, which I want to use to reconstruct genomes. This data.frame has 1105854 rows.

      var1                var2  corr
1  OTU3978 UniRef90_A0A010P3Z8 0.846
2  OTU4011 UniRef90_A0A010P3Z8 0.855
3  OTU4929 UniRef90_A0A010P3Z8 0.829
4  OTU4317 UniRef90_A0A011P550 0.850
5  OTU4816 UniRef90_A0A011P550 0.807
6  OTU3902 UniRef90_A0A011QPQ2 0.836
7  OTU3339 UniRef90_A0A011RKI6 0.835
8  OTU1359 UniRef90_A0A011RLA7 0.801
9  OTU2085 UniRef90_A0A011RLA7 0.843
10 OTU3542 UniRef90_A0A011RLA7 0.866
11 OTU0473 UniRef90_A0A011TDE1 0.807

I use the igraph library to build a graph object.

g <- graph_from_data_frame(df, directed = FALSE)  # current name for graph.data.frame(); correlations have no direction

Then I want to extract the connected components of this graph in order to reconstruct genomes, with each component corresponding to one genome.

I tried this command: genomes <- split(names(V(g)), components(g)$membership)

It returns several components, for example:

> genomes[[4]]
[1] "OTU2417"             "UniRef90_A0A076H0Q4" "UniRef90_A0A2E8T3F8"
[4] "UniRef90_G5ZY43"
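
For reference, the steps above can be reproduced on a small scale. The following is a minimal sketch that assumes only the 11 rows shown in the table (not the full 1105854-row data.frame) and builds the graph as undirected, which is an assumption that does not change the connected components:

```r
library(igraph)

# The 11 example rows from the table above
df <- data.frame(
  var1 = c("OTU3978", "OTU4011", "OTU4929", "OTU4317", "OTU4816", "OTU3902",
           "OTU3339", "OTU1359", "OTU2085", "OTU3542", "OTU0473"),
  var2 = c(rep("UniRef90_A0A010P3Z8", 3), rep("UniRef90_A0A011P550", 2),
           "UniRef90_A0A011QPQ2", "UniRef90_A0A011RKI6",
           rep("UniRef90_A0A011RLA7", 3), "UniRef90_A0A011TDE1"),
  corr = c(0.846, 0.855, 0.829, 0.850, 0.807, 0.836,
           0.835, 0.801, 0.843, 0.866, 0.807)
)

# The first two columns become the edge list; corr is stored as an edge attribute
g <- graph_from_data_frame(df, directed = FALSE)

# One vector of vertex names per connected component
genomes <- split(V(g)$name, components(g)$membership)

length(genomes)   # 6 for these rows: no OTU is shared between genes
lengths(genomes)  # number of vertices in each component
```

In this subset every gene forms its own component with its correlated OTUs, so the components are simply the rows grouped by var2. On the full data, shared OTUs would join several genes into one component.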

I check the OTU and the genes of each component against my OTU table and, for the genes, the EMBL-EBI database. This lets me determine whether each reconstructed genome is meaningful.

I also checked the documentation and found many other community detection methods: edge-betweenness, louvain, multi-level ... What is the main difference between the command I used (which returns fairly meaningful components) and these algorithms (which also return groups of vertices)?

Thanks

r network community igraph • 1.7k views

score 2 · Accepted Answer · 2019-06-12

4.9 years ago

Jean-Karim Heriche 27k

The igraph components() function returns the connected components of the graph, i.e. the maximal subgraphs that have no edges between them. The other methods are clustering algorithms: they partition the graph by removing edges according to a criterion specific to each algorithm. They are normally better applied to each connected component separately (applied to the whole graph, many will just output the connected components).
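
To illustrate the distinction, here is a minimal sketch on a made-up toy graph (the vertex names and edges are hypothetical, not taken from the question's data). It contains two triangles joined by a single bridge edge, plus a separate pair of vertices. cluster_louvain() stands in for any of the clustering methods mentioned:

```r
library(igraph)

# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the bridge c-d,
# plus a separate pair x-y
edges <- data.frame(
  from = c("a", "a", "b", "d", "d", "e", "c", "x"),
  to   = c("b", "c", "c", "e", "f", "f", "d", "y")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Connected components: vertices are grouped together whenever a path joins them,
# so the two triangles end up in the same component because of the bridge
comp <- components(g)
comp$no       # 2 components
comp$csize    # sizes of the components

# A clustering algorithm looks at edge density inside a component and may
# split it further (here, it is expected to separate the two triangles)
cl <- cluster_louvain(g)
length(cl)       # number of communities found
membership(cl)   # community id of each vertex
```

A community never spans two connected components, so the clustering result can only be as coarse as, or finer than, the output of components().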

Could the connected components (subgraphs) of the graph correspond to reconstructed genomes? Do you suggest that I first extract each subgraph and then apply clustering algorithms to each of them in order to "improve" the accuracy of the reconstructed genomes?

4.9 years ago by pablo ▴ 300

In your context, connected components represent groups whose members have no correlation with members of any other group. Whether that meets your requirements for calling a group a genome is for you to decide. However, if you want to partition the connected components further (for example, because you think they represent more than one genome), you can apply a clustering algorithm to try to reveal further structure. My point was that applying almost any clustering algorithm to the whole graph is pointless because it will return the connected components.
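
One possible way to do this is sketched below. The helper name cluster_by_component, its min_size argument, and the choice of cluster_louvain() with the corr edge attribute as weights are all assumptions made for illustration. Any other clustering function could be substituted, and the toy data is made up:

```r
library(igraph)

# Hypothetical helper: run a clustering algorithm inside each connected
# component and return, per component, a list of vertex-name vectors
cluster_by_component <- function(g, min_size = 3) {
  # One igraph object per connected component; smaller components are skipped
  parts <- decompose(g, min.vertices = min_size)
  lapply(parts, function(sub) {
    # Assumption: correlations are positive and used as edge weights
    cl <- cluster_louvain(sub, weights = E(sub)$corr)
    split(V(sub)$name, as.integer(membership(cl)))
  })
}

# Made-up example: two triangles joined by a weaker bridge edge, plus a pair
edges <- data.frame(
  from = c("a", "a", "b", "d", "d", "e", "c", "x"),
  to   = c("b", "c", "c", "e", "f", "f", "d", "y"),
  corr = c(0.90, 0.90, 0.90, 0.90, 0.90, 0.90, 0.80, 0.85)
)
g <- graph_from_data_frame(edges, directed = FALSE)

# The pair x-y is below min_size and is left out; the 6-vertex component
# is passed to the clustering algorithm
sub_genomes <- cluster_by_component(g, min_size = 3)
sub_genomes
```

Whether the resulting subgroups are better genome candidates than the whole component still has to be checked against the OTU table and gene annotations, as described in the question.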

4.9 years ago by Jean-Karim Heriche 27k

Thanks for your reply. The biggest component I get is always the first one (whatever dataset I import), so I am going to apply clustering to this one.

I use genomes <- split(names(V(g)), components(g)$membership) and extract the first component with big_one <- genomes[[1]].

Is there a way to get back an igraph object only for this component?

4.9 years ago by pablo ▴ 300

Check the decompose() function.
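
A minimal sketch of how this might look, on a made-up toy graph. It selects the largest component by size rather than by position, since nothing guarantees that the largest component comes first:

```r
library(igraph)

# Hypothetical toy graph: a 4-vertex path a-b-c-d and a separate pair x-y
g <- graph_from_data_frame(
  data.frame(from = c("a", "b", "c", "x"), to = c("b", "c", "d", "y")),
  directed = FALSE
)

# decompose() returns a list of igraph objects, one per connected component
parts <- decompose(g)

# Select the largest component by vertex count
sizes <- sapply(parts, vcount)
big_one <- parts[[which.max(sizes)]]

# Alternative without decomposing everything: keep only the vertices whose
# membership matches the largest component
comp <- components(g)
big_two <- induced_subgraph(g, which(comp$membership == which.max(comp$csize)))

V(big_one)$name
```

Both big_one and big_two are ordinary igraph objects, so they can be passed directly to a clustering function.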

4.9 years ago by Jean-Karim Heriche 27k