I have a large set of RNA-seq expression data in a DGEList object, and I want to plot the expression of specific genes over time, comparing two factors.
I started out by subsetting the data into a smaller matrix, then realised that was silly and that I should be able to plot it straight from the DGEList object the data is stored in. Each timepoint has three replicates, so I would also like to take the mean of those replicates before plotting. Is subsetting the data first still the best option, or am I missing a far quicker and easier approach?
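A minimal sketch of averaging replicates without first copying the data into a separate matrix. It assumes the count matrix is `y$counts` of a DGEList `y` and that the day labels are available as a column such as `y$samples$Day` (that column name is an assumption). A small matrix built from the example values in this question stands in for `y$counts` so the snippet runs on its own:

```r
# Stand-in for y$counts, using the example values shown in this question
counts <- matrix(
  c(54, 55, 53, 78, 79, 74, 81, 82,
    23, 21, 22, 45, 44, 47, 61, 62,
    74, 75, 73, 81, 82, 80, 83, 88,
     2,  3,  1, 10,  9,  8, 12, 11),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:4), paste0("Sample", 1:8))
)
# Stand-in for y$samples$Day (assumed column name)
day <- factor(c("D0", "D0", "D0", "D3", "D3", "D3", "D7", "D7"))

genes <- c("Gene2", "Gene3")

# Split the column indices by day, then average the selected genes across
# each day's replicate columns; rows are genes, columns are days.
day_means <- sapply(split(seq_len(ncol(counts)), day), function(idx) {
  rowMeans(counts[genes, idx, drop = FALSE])
})
print(day_means)
```

The genes are picked by row name at the point of use, so no intermediate subset object is needed. For real data you may prefer normalised values (for example edgeR's `cpm(y)`) over raw counts.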
DGEList counts:
Gene Symbol  Sample1  Sample2  Sample3  Sample4  Sample5  Sample6  Sample7  Sample8  ...
Gene1             54       55       53       78       79       74       81       82
Gene2             23       21       22       45       44       47       61       62
Gene3             74       75       73       81       82       80       83       88
Gene4              2        3        1       10        9        8       12       11
...
Metadata:
      Sample Name  ...  Day
[1,]  Sample1           D0
[2,]  Sample2           D0
[3,]  Sample3           D0
[4,]  Sample4           D3
[5,]  Sample5           D3
[6,]  Sample6           D3
[7,]  Sample7           D7
[8,]  Sample8           D7
...
Using the example above, I am trying to draw a line plot of expression for Gene2 and Gene3, averaging the replicate expression levels on each day. The samples also come from two factors, so each factor would need to be shown separately.
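A minimal sketch of the whole pipeline: reshape the genes of interest to long format, average replicates per gene, factor and day, then draw one line per factor level. Assumptions to adapt: the counts come from `y$counts`, the sample table from `y$samples`, and the sample table has columns named `Sample`, `Group` (a hypothetical name for the second factor, with hypothetical levels `GroupA`/`GroupB`) and `Day`. The counts below are simulated Poisson values, not real data, so the snippet runs on its own:

```r
# Simulated stand-in for y$samples: 3 days x 2 groups x 3 replicates
set.seed(42)
samples <- expand.grid(
  Rep = 1:3,
  Group = c("GroupA", "GroupB"),  # hypothetical second factor
  Day = c("D0", "D3", "D7")
)
samples$Sample <- paste0("Sample", seq_len(nrow(samples)))

# Simulated stand-in for y$counts (genes x samples)
counts <- matrix(
  rpois(4 * nrow(samples), lambda = 50),
  nrow = 4,
  dimnames = list(paste0("Gene", 1:4), samples$Sample)
)

genes <- c("Gene2", "Gene3")

# Long format: one row per gene x sample. as.vector() reads the matrix
# column by column, so genes vary fastest within each sample.
long <- data.frame(
  Gene   = rep(genes, times = ncol(counts)),
  Sample = rep(colnames(counts), each = length(genes)),
  Expr   = as.vector(counts[genes, , drop = FALSE])
)
long <- merge(long, samples[, c("Sample", "Group", "Day")], by = "Sample")

# Mean of the replicates for each gene, group and day
means <- aggregate(Expr ~ Gene + Group + Day, data = long, FUN = mean)

# One line per group, one panel per gene; group = Group is needed so
# ggplot2 joins points across the discrete Day axis.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(means, aes(x = Day, y = Expr, colour = Group, group = Group)) +
    geom_line() +
    geom_point() +
    facet_wrap(~ Gene, scales = "free_y") +
    labs(y = "Mean expression across replicates")
  print(p)
}
```

If `Day` is stored as character in your sample table, convert it to a factor with the levels in time order (for example `factor(Day, levels = c("D0", "D3", "D7"))`) so the x axis is not sorted alphabetically.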
You may take inspiration from this previous question: Boxplot in ggplot2
Alternatively, if you have your data in this format:
... then, you can plot these with:
Thanks for the answer, but it's not quite what I'm after. I'll update the question above with example data.