I have RNA-seq data (n=15) paired with antibody response data (n=15) at day 10. The question I want to ask is which gene sets/pathways are most perturbed at day 10 in response to vaccination. I compute Spearman correlations between each gene and the antibody response at day 10. I wanted to run preranked GSEA on the correlation values, but many genes end up with the same correlation value. I could expect that, because many genes should be co-expressed and generally do not change their expression, and we expect only, say, a couple of hundred genes to change at day 10. So what are my options, given that preranked GSEA does not resolve ties? Additionally, I have both responders and non-responders in my data.
Why do you think it makes sense to do GSEA with correlation values?
Two reasons: 1: I want to identify the genes that could predict the antibody response later in the analysis.
2: I have seen people approach this problem this way in Cell and Nature papers.
It would be a privilege to get an answer from you on this if I am not thinking it through correctly, as I see you are a computational immunologist as well. I also have cases where I have only baseline RNA-seq data, with antibody data at baseline and day 28, and I want to see which gene sets are enriched if they correlate well with the antibody data. Thanks
I am surprised you have that many values that are the same. Are you removing genes with zero counts and zero for whatever your antibody response metric is?
So here's the thing:
1. I did not prefilter my data to remove low-expressed genes.
2. I have two types of vaccine response data: one for pneumococcal vaccines, where I have 11 serotypes, and one for influenza vaccines, where I have at least 3 serotypes. For the pneumococcal vaccines I mostly have good response data, with good numbers. For the influenza vaccines, one serotype would show a good response and the rest would show 0s.
We usually use the sum of log2 expression or the sum of log2 fold changes (FC) for the antibody data. Do you think both of these are a problem: not filtering low-expressed genes, and using summed antibody values that contain 0s?
Yes, I imagine high frequencies of zeros in either dataset will confound the ranking.
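To illustrate why the zeros matter, here is a minimal sketch in plain Python (no external libraries). It is not the poster's pipeline: the toy counts, the gene names, and the filtering thresholds (`min_count=10`, `min_samples=3`) are all assumptions made up for the example. It shows that genes with almost all zero counts collapse to the same Spearman value, and that filtering low-count genes before ranking removes those ties.

```python
import random
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho as the Pearson correlation of ranks; None if a vector is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return sxy / (sxx * syy) ** 0.5


def filter_low_counts(counts, min_count=10, min_samples=3):
    """Keep genes with >= min_count reads in >= min_samples samples (assumed thresholds)."""
    return {
        gene: values
        for gene, values in counts.items()
        if sum(c >= min_count for c in values) >= min_samples
    }


def score_genes(counts, response):
    """Spearman rho per gene against the response, skipping undefined correlations."""
    scores = {}
    for gene, values in counts.items():
        rho = spearman(values, response)
        if rho is not None:
            scores[gene] = rho
    return scores


def tied_fraction(scores, decimals=6):
    """Fraction of genes whose (rounded) score is shared with another gene."""
    rounded = [round(s, decimals) for s in scores.values()]
    if not rounded:
        return 0.0
    freq = Counter(rounded)
    return sum(1 for r in rounded if freq[r] > 1) / len(rounded)


# Toy data, invented for illustration only (15 samples).
random.seed(0)
antibody = [random.uniform(0, 8) for _ in range(15)]
counts = {
    "geneA": [random.randint(50, 500) for _ in range(15)],  # well expressed
    "geneB": [0] * 15,                                      # never detected
    "geneC": [0] * 4 + [3] + [0] * 10,                      # one stray read count
    "geneD": [0] * 4 + [7] + [0] * 10,                      # same pattern as geneC
}

raw_scores = score_genes(counts, antibody)
kept_scores = score_genes(filter_low_counts(counts), antibody)
print("tied fraction, unfiltered:", tied_fraction(raw_scores))
print("tied fraction, filtered:  ", tied_fraction(kept_scores))
```

The same check applies to the antibody side: if the summed response is 0 for many samples, those samples all share one rank, which also pushes many genes toward identical correlation values.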