Question

Semantic similarity selection in REVIGO: which is better, many clusters or few?

0

6.4 years ago

Farbod ★ 3.4k

Hi Biostars,

REVIGO offers four semantic similarity measures (Resnik, Lin, SimRel, Jiang & Conrath) and, as far as I have checked, they are broadly similar (all are node-based).

I am running GO enrichment on the DEGs between the brain transcriptomes of two phenotypes and want to show the enriched terms in a REVIGO TreeMap.

Questions:

1- Given my research design, which REVIGO semantic similarity measure is better?

2- I have already tried all four of them to see how their results differ visually.

Resnik, Lin and SimRel each showed about 10 groups of coloured squares for different biological processes,

but Jiang & Conrath showed only 4.

In this situation, which is better: a crowded TreeMap with more colours and more squares, or one with fewer squares and fewer colours?

Thanks

go revigo gene ontology • 3.6k views

ADD COMMENT • link 6.4 years ago by Farbod ★ 3.4k

1

I doubt there is an easy answer to this. Even if the algorithms are similar, there is obviously a difference because they yield different results. I think it's hard to simply say that 10 is better than 4 or the other way around. So, even though this comment might not be very useful, I think you have to study the clustering more closely and find out what the differences are (e.g. are the 4 clusters a subset of the 10 clusters? Are they just higher-level groupings of the 10? Or are the clusters completely different, which would only lead to more questions?).
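
One way to check this is sketched below in Python, assuming you have copied the cluster memberships from two REVIGO runs into dictionaries of sets. The cluster names and term labels are placeholders, not real REVIGO output, and the check assumes both runs were given the same input terms.

```python
def is_refinement(fine, coarse):
    """True if every cluster in `fine` sits entirely inside one cluster of `coarse`."""
    return all(
        any(terms <= coarse_terms for coarse_terms in coarse.values())
        for terms in fine.values()
    )


def overlaps(fine, coarse):
    """Map each fine cluster to the coarse clusters it shares terms with."""
    return {
        name: sorted(c for c, coarse_terms in coarse.items() if terms & coarse_terms)
        for name, terms in fine.items()
    }


# Hypothetical cluster memberships (placeholder labels)
resnik_clusters = {
    "r1": {"t1", "t2"},
    "r2": {"t3"},
    "r3": {"t4", "t5"},
}
jc_clusters = {
    "j1": {"t1", "t2", "t3"},
    "j2": {"t4", "t5"},
}

print(is_refinement(resnik_clusters, jc_clusters))
print(overlaps(resnik_clusters, jc_clusters))
```

If `is_refinement` is true, the smaller set of clusters is just a coarser grouping of the larger one; if not, `overlaps` shows which clusters were split or merged differently.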

ADD REPLY • link 6.4 years ago by LLTommy ★ 1.2k

0

Dear @LLTommy, hi and thank you. I have looked at them a little but, as I said, I could not find any basis for choosing one of them, and I can say that all results (even those with an equal number of squares) differ somewhat. I also checked many papers that used REVIGO, but they usually used the default parameters (why?) or did not mention the parameters at all. It is not even clear whether it is a good idea to show case and control in one TreeMap or whether we should show them in two separate TreeMaps.

ADD REPLY • link 6.4 years ago by Farbod ★ 3.4k

2

Hi. Well, I have not worked with REVIGO before, so I cannot really help you with the results. What I do know, though, is that semantic similarity is a tricky topic in itself, which is why I said it is going to be hard to say that one algorithm is better (in all cases) than another. Honestly, if all the papers you read simply use the default algorithm and parameters, then I could imagine that people just did not care or did not want to go down the road of trying to understand the differences between these algorithms. That is a wild guess on my part, but if that is the case, maybe you are already ahead of these other people by actually going down the rabbit hole and asking these questions. And again, this is speculation, but if you can show that the results further down your analysis pipeline are very different depending on which semantic similarity algorithm you choose, you could have some nice results on the side besides your actual research, namely flagging that it is not good enough to just use 'default' parameters. (This is, in my opinion, a general problem, especially in bioinformatics. For example, everybody is doing machine learning these days, but how many people really know the ins and outs of all the algorithms? It is great that we have all these tools that make certain analyses easy, but in my opinion there is also an (underestimated?) danger.)

Back to the topic. Let's hope somebody with experience with REVIGO joins the discussion, I would love to hear more about it!

ADD REPLY • link 6.4 years ago by LLTommy ★ 1.2k

0

This comment of yours:

is it a good Idea to put and show the case and control in one TreeMap or we should show them in two separate TreeMaps

makes me want to know what your actual input to REVIGO is. I was under the impression that the typical use case would be: DE analysis --> enriched GO terms of those genes, e.g., using goseq --> some list of GO terms plus p-values --> REVIGO
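
As a rough sketch of the last step of that pipeline, the Python snippet below writes enrichment results as a two-column list (GO ID, p-value) that could be pasted into REVIGO. The `enriched` list and its p-values are invented for illustration, `to_revigo_input` is a hypothetical helper, and the "GO ID, whitespace, value" layout is my understanding of what REVIGO accepts, so please check it against the REVIGO help page.

```python
def to_revigo_input(results, alpha=0.05):
    """Format (go_id, p_value) pairs as 'GO:xxxxxxx<TAB>p' lines, keeping p <= alpha."""
    kept = [(go, p) for go, p in results if p <= alpha]
    kept.sort(key=lambda pair: pair[1])  # most significant first
    return "\n".join(f"{go}\t{p:.3g}" for go, p in kept)


# Hypothetical enrichment output, e.g. exported from an enrichment tool
enriched = [
    ("GO:0007268", 1.2e-8),
    ("GO:0006955", 0.2),
    ("GO:0007399", 3.4e-5),
]

print(to_revigo_input(enriched))
```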

ADD REPLY • link 6.4 years ago by Friederike 8.9k

0

@Friederike, hi.

I have tried conditionA_DEG_GO-terms, conditionB_DEG_GO-terms and conditionA&B_DEG_GO-terms as input to REVIGO (3 runs).

I have seen the last one in some papers, which show the GO terms of both conditions' DEGs in one REVIGO plot.

What do you think? Do you have any references or suggestions on whether it is a good idea to show A and B in one TreeMap or whether we should show them in two separate TreeMaps?

ADD REPLY • link 6.4 years ago by Farbod ★ 3.4k

2

To be honest, I have used it for a few tasks and, for any GO category, it is advisable to keep the GO terms separate for the up- and down-regulated genes of the comparison. That way you can comment on which biological processes or molecular functions are associated with up- or down-regulation, in line with your biological hypothesis, especially if the brain transcriptomes come from different cell types or from, say, diseased and control samples. REVIGO is just a tool that takes a list of GO terms, usually ranked by p-value, and projects them into semantic space. I would advise you to use the default parameters and submit the GO terms separately, i.e. two REVIGO plots, one for terms enriched among up-regulated genes and one for terms enriched among down-regulated genes, unless you have common terms that you want to project onto something else. All four measures should give you more or less similar results apart from the definition of the space; how well the result agrees with your biological hypothesis is something you need to read up on, and a bit of digging in PubMed might tell you more. It would also help if you could post links to the images so that we can give better comments.
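
A minimal Python sketch of that split, assuming the DE results are available as a mapping from gene to log2 fold change. The gene names, values and the `split_by_direction` helper are hypothetical; each resulting list would then go through GO enrichment and REVIGO on its own.

```python
def split_by_direction(degs, lfc_cutoff=0.0):
    """Split DEGs into up- and down-regulated gene lists by log2 fold change."""
    up = sorted(g for g, lfc in degs.items() if lfc > lfc_cutoff)
    down = sorted(g for g, lfc in degs.items() if lfc < -lfc_cutoff)
    return up, down


# Hypothetical DEGs: gene -> log2 fold change (condition A vs condition B)
degs = {"geneA": 2.1, "geneB": -1.4, "geneC": 0.8, "geneD": -3.0}

up, down = split_by_direction(degs)
print("up:", up)
print("down:", down)
```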

ADD REPLY • link 6.4 years ago by ivivek_ngs ★ 5.2k

0

As others have noted above, the semantic similarity measure in REVIGO is just a neat way to summarize and highlight a long list of significantly enriched GO terms. It gives a better overview of the whole GO analysis than listing a few top enriched GO terms. The choice of similarity metric is just an aid and doesn't alter your underlying list of enriched GO terms, so people tend to go with the default and, as you have noted, there is little difference. It seems that you want to show the similarities and differences in GO enrichment between pheno 1 and pheno 2. For this you can summarize the result as a simple table with (common or different) GO terms as rows and pheno1 and pheno2 as columns. You can use the results in the table that comes with REVIGO's scatter plot. You can also use biological knowledge to remove redundant terms if your goal is just a comparison of the two phenotypes.
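
Such a table could be put together along these lines. This is only a Python sketch: the term IDs are arbitrary placeholders, the p-values are invented, and `comparison_table` is a hypothetical helper.

```python
def comparison_table(pheno1, pheno2):
    """Rows of (term, p-value in pheno1, p-value in pheno2); None means not enriched."""
    terms = sorted(set(pheno1) | set(pheno2))
    return [(t, pheno1.get(t), pheno2.get(t)) for t in terms]


# Hypothetical enrichment results: term -> p-value
pheno1 = {"GO:0000001": 1e-6, "GO:0000002": 3e-4}
pheno2 = {"GO:0000002": 2e-5, "GO:0000003": 1e-3}

print(f"{'term':<12}{'pheno1':>10}{'pheno2':>10}")
for term, p1, p2 in comparison_table(pheno1, pheno2):
    cells = ["-" if p is None else f"{p:.1e}" for p in (p1, p2)]
    print(f"{term:<12}{cells[0]:>10}{cells[1]:>10}")
```

Terms with a value in both columns are shared; a "-" marks a term enriched in only one phenotype.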

ADD REPLY • link 6.4 years ago by Diwan ▴ 650

0

I would say it depends on the question you want to address.

ADD REPLY • link 6.4 years ago by Friederike 8.9k

0

I want to show which GO terms and/or related pathways stand out (are enriched) in my DEGs.

In this case, which semantic similarity measure would you suggest?

ADD REPLY • link 6.4 years ago by Farbod ★ 3.4k

3

You don't need a semantic similarity measure for enrichment analysis. However, if you want to cluster the genes based on how similar their GO annotations are then you need a semantic similarity measure. As already mentioned, the choice of similarity measure and the choice of a clustering algorithm can only be decided in an ad hoc way. One chooses the similarity measure that best captures the notion of similarity one is interested in and the algorithm that seems most suitable for the data at hand. In my experience, Resnik's measure has always been reliable when dealing with gene function annotations. Although the other measures are supposed to fix some of the shortcomings of Resnik's measure, I didn't find them useful. My collaborators and I have compared them for the CMPO ontology in this paper and Resnik's has also been found by others to be among the best when dealing with functional information (see this paper). For an alternative way of using GO, see the paper by Glass and Girvan.
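
To make the differences between the four measures concrete, here is a small Python sketch on a toy ontology. Everything in it is an assumption for illustration: the term names, the annotation frequencies `P`, and the simplification of evaluating SimRel only at the most informative common ancestor (the published definition takes a maximum over all common ancestors). Jiang & Conrath is given as a distance, as in its usual formulation. REVIGO's own implementation may differ in detail.

```python
import math

# Toy DAG: term -> list of parents (hypothetical term names)
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}

# Hypothetical p(t): fraction of genes annotated to t or any of its descendants
P = {"root": 1.0, "A": 0.4, "B": 0.6, "A1": 0.1, "A2": 0.2, "B1": 0.3}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: rarer terms are more informative."""
    return math.log(1.0 / P[term])


def mica(a, b):
    """Most informative common ancestor of two terms."""
    return max(ancestors(a) & ancestors(b), key=ic)


def resnik(a, b):
    return ic(mica(a, b))


def lin(a, b):
    denom = ic(a) + ic(b)
    return 2 * resnik(a, b) / denom if denom > 0 else 1.0


def jiang_conrath_distance(a, b):
    return ic(a) + ic(b) - 2 * resnik(a, b)


def sim_rel(a, b):
    # Simplification: evaluated at the MICA only
    return lin(a, b) * (1 - P[mica(a, b)])


for pair in [("A1", "A2"), ("A1", "B1")]:
    print(pair, resnik(*pair), lin(*pair), jiang_conrath_distance(*pair), sim_rel(*pair))
```

All four are built from the same information-content values, which is why they tend to agree broadly. They differ mainly in whether and how the score is normalised by the information content of the two terms themselves, and that can shift where cluster boundaries fall.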

ADD REPLY • link 6.4 years ago by Jean-Karim Heriche 27k

0

In that case, you will not need semantic similarity. Other plotting techniques can reveal that too. I would rather follow what Jean is proposing. There is also clusterProfiler for showing enrichment patterns of GO terms. As far as I understand, this is now a matter of how you represent the results, rather than of using REVIGO and getting confused about which of its algorithms to use.

ADD REPLY • link 6.4 years ago by ivivek_ngs ★ 5.2k