For multiple testing correction in GO analysis, is it ok to remove GO terms with only 1 gene hit?
5.5 years ago
chnyale • 0

I am doing a GO analysis for my gene sets and plan to use the Benjamini-Hochberg (BH) method to adjust the resulting p-values for multiple testing. Since the BH adjustment depends on the total number of tests (that is, the number of p-values calculated), I wonder whether it is OK to remove all GO terms with only 1 gene hit (or those with 1 or 2 gene hits) before calculating the p-values. That way, the total number of p-values would be reduced, which may produce more significant adjusted p-values. The logic is that GO terms with just 1 or 2 gene hits are unlikely to be significant anyway.

So my plan is like this:

  1. Find out how many GO terms are included in my gene sets
  2. Remove the GO terms with just 1 or 2 gene hits
  3. Calculate enrichment p-values for the remaining GO terms, so that the total number of tests equals the number of GO terms with >2 gene hits
  4. Use the BH method to adjust the p-values
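To make the effect of step 2 concrete, here is a minimal Python sketch of the BH adjustment applied once to all terms and once to only the terms with >2 hits. The term names, hit counts and raw p-values are made up for illustration, and `bh_adjust` is a hand-written helper, not a function from any GO tool.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [1.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical (term, gene hits, raw enrichment p-value) triples.
terms = [
    ("GO:A", 12, 0.001),
    ("GO:B", 8, 0.012),
    ("GO:C", 5, 0.03),
    ("GO:D", 2, 0.2),
    ("GO:E", 1, 0.45),
    ("GO:F", 1, 0.6),
]

# BH over every tested term (m = 6).
adj_all = dict(zip((t[0] for t in terms), bh_adjust([t[2] for t in terms])))

# BH after dropping terms with 1 or 2 hits (m = 3), as in the plan above.
kept = [t for t in terms if t[1] > 2]
adj_filtered = dict(zip((t[0] for t in kept), bh_adjust([t[2] for t in kept])))

for name, _, p in kept:
    print(f"{name}: raw={p:.3f} all={adj_all[name]:.3f} filtered={adj_filtered[name]:.3f}")
```

With these toy numbers, the adjusted p-value of the hypothetical term GO:C is about 0.06 when all six terms are counted and about 0.03 when only three are, so the pre-filter alone moves it across a 0.05 cutoff.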

Is this procedure ok or not? Are there any published papers with similar procedures? Any comments or references will be appreciated. Thank you!

gene multiple tesing GO analysis • 3.3k views
5.5 years ago
EagleEye 7.5k

Hi,

It is not ideal to remove those terms before performing the enrichment analysis. Later, when you filter the GO terms, you can use the FDR together with the number of genes (hits) in the term as filter criteria. But when you calculate the FDR, the input must contain all the hits (terms) and their p-values; otherwise you introduce a bias into your analysis. You can have a look at these articles, where I used a p-value cutoff along with a minimum number of genes in each term to filter the terms.
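As a rough sketch of that order of operations (test every term, adjust over all of them, and only then filter), assuming a one-sided hypergeometric test for enrichment: the background size, study-set size, term annotations and the `MIN_HITS` threshold below are hypothetical values chosen for illustration, not the settings used in the articles.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [1.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


N_BACKGROUND = 1000  # hypothetical number of annotated background genes
N_STUDY = 50         # hypothetical size of the gene set of interest

# Hypothetical (term, genes annotated in background, hits in study set).
terms = [
    ("GO:A", 40, 10),
    ("GO:B", 100, 8),
    ("GO:C", 5, 1),
    ("GO:D", 10, 2),
    ("GO:E", 200, 12),
]

# 1) Test every term, including those with only 1 or 2 hits.
pvals = [hypergeom_upper_tail(k, K, N_STUDY, N_BACKGROUND) for _, K, k in terms]

# 2) Adjust over the full set of tests.
qvals = bh_adjust(pvals)

# 3) Only now apply the FDR cutoff together with a minimum number of hits.
FDR_CUTOFF = 0.05
MIN_HITS = 3
reported = [
    name
    for (name, _, k), q in zip(terms, qvals)
    if q < FDR_CUTOFF and k >= MIN_HITS
]
print(reported)
```

The minimum-hits filter here only decides which terms are reported; it does not change the number of tests that enters the BH adjustment.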

Articles:

https://www.nature.com/articles/s41467-018-03265-1#Sec15

https://academic.oup.com/nar/article/46/18/9384/5053167#122402618

https://clinicalepigeneticsjournal.biomedcentral.com/articles/10.1186/s13148-016-0274-6#Sec2


It looks like you used a threshold of at least 5 genes or 5% of a pathway. How did you decide on those thresholds? Do you have a reference for that minimum size?

Thanks!


Hi amandastahlke,

I used the p-value cutoff and, just to make the filtering more stringent, added the minimum number of genes as one more layer of cutoff. No specific rule was applied in choosing it.
