Dear community,
We are analyzing RNA-seq data from a small time-series experiment with three groups, each sampled at a different timepoint (timepoint-0, timepoint-1, timepoint-2); hence, no measurements were repeated on the same individuals. The design is unbalanced, with few samples in each group.
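For concreteness, the encoding described above can be sketched as follows. This is only an illustration of the design matrix, not the actual fitting code; the sample sizes per timepoint are hypothetical, since the post does not state them.

```python
import numpy as np

# Hypothetical unbalanced group sizes (assumption, not from the post):
# 3 samples at timepoint-0, 2 at timepoint-1, 4 at timepoint-2.
time = np.array(["t0"] * 3 + ["t1"] * 2 + ["t2"] * 4)

# Two dummy variables with timepoint-0 as the reference level.
d1 = (time == "t1").astype(int)
d2 = (time == "t2").astype(int)

# Full model: intercept + two dummies.
# The likelihood-ratio test compares this against the
# intercept-only reduced model (i.e., "no time effect").
design_full = np.column_stack([np.ones(len(time), dtype=int), d1, d2])
design_reduced = np.ones((len(time), 1), dtype=int)

print(design_full)
```

The two fitted dummy coefficients are then the log2 fold-changes of timepoint-1 vs. timepoint-0 and timepoint-2 vs. timepoint-0.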
To identify genes that vary over time, we encoded the time variable using two dummy variables and applied the likelihood-ratio test (with no coefficient shrinkage). However, for the purpose of visualization, we wanted to show the estimated log2 fold-changes (LFCs) along with their standard errors (SEs) for a model fit using coefficient shrinkage. As the SEs of the LFC estimates of the most significant genes from our test looked quite correlated, we plotted the SEs of the two estimates against each other for all genes and got (red line: intercept 0, slope 1):
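One plausible source of the correlation is that, in a GLM with this dummy coding, the standard error of each LFC scales (approximately) with the square root of the gene-wise dispersion times a factor that depends only on the group sizes. Since both coefficients share the same gene's dispersion estimate, their SEs move together across genes. The toy model below is a deliberate simplification (it ignores size factors, per-sample weights, and shrinkage), and the sample sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n1, n2 = 3, 2, 4          # hypothetical samples per timepoint
n_genes = 5000

# Per-gene dispersion-like scale, varying across genes.
phi = rng.lognormal(mean=-1.0, sigma=0.8, size=n_genes)

# Approximate SEs of the two dummy LFCs (t1 vs t0, t2 vs t0):
# both are sqrt(dispersion) times a design constant, so in this
# simplified model they are exactly proportional across genes.
se1 = np.sqrt(phi * (1 / n0 + 1 / n1))
se2 = np.sqrt(phi * (1 / n0 + 1 / n2))

r = np.corrcoef(se1, se2)[0, 1]
ratio = np.sqrt((1 / n0 + 1 / n1) / (1 / n0 + 1 / n2))
print(f"correlation = {r:.3f}, SE ratio (slope) = {ratio:.3f}")
```

In real data the proportionality is blurred by per-sample normalization and dispersion-estimation noise, but the shared gene-wise dispersion factor could still explain a strong SE-vs-SE trend whose slope differs from 1 when the groups are unbalanced.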
The corresponding LFCs (for completeness):
We couldn't come up with an explanation for the strong (and weird-looking) SE correlation, so we would like to know:
- Is this how it should be? (And out of curiosity, why is this so?)
- Does it have any implications for the validity of our analysis?
Thanks!
How many samples are in each timepoint? When you say "few samples" in each group, is that one or five?