Graphing specific gene expression data from DGEList
6.0 years ago

I have a large set of RNA-seq expression data in a DGEList object, and I want to plot the expression of specific genes over time, comparing two factors.

I started out by subsetting the data into a smaller matrix, and then realised that was silly and that I should be able to plot it from the DGEList object the data is stored in. Each timepoint has three replicates, so I would also want to take the mean of those replicates before plotting. Would subsetting the data first still be the best option, or am I missing a far quicker and easier one?

DGEList Count:

Gene Symbol   Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8 ...
Gene1           54     55       53      78      79      74      81      82
Gene2           23     21       22      45      44      47      61      62     
Gene3           74     75       73      81      82      80      83      88
Gene4            2      3        1      10       9       8      12      11
...

MetaData:

       Sample Name ...    Day
[1,]    Sample1            D0
[2,]    Sample2            D0
[3,]    Sample3            D0
[4,]    Sample4            D3
[5,]    Sample5            D3
[6,]    Sample6            D3
[7,]    Sample7            D7
[8,]    Sample8            D7
...

Using the examples above, I am trying to draw an expression line plot for Gene2 and Gene3, averaging the replicate expression levels on each day. However, the samples come from two factors, so each factor would need its own separate line.
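A minimal base-R sketch of the subset-then-average step, assuming the data live in a DGEList called y (a hypothetical name). y$counts holds the raw counts; in practice you may prefer normalised values from edgeR::cpm(y). So that it runs without edgeR, the sketch rebuilds a plain matrix from the example counts above (Sample1 to Sample8 only) and uses the Day column of the metadata as the grouping factor.

```r
# Example counts from the question (Sample1..Sample8). With a real DGEList `y`
# (hypothetical object) you would start from y$counts or edgeR::cpm(y) instead.
counts <- matrix(
  c(54, 55, 53, 78, 79, 74, 81, 82,
    23, 21, 22, 45, 44, 47, 61, 62,
    74, 75, 73, 81, 82, 80, 83, 88,
     2,  3,  1, 10,  9,  8, 12, 11),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:4), paste0("Sample", 1:8))
)
# Day of each sample, in the same order as the columns (from the metadata table)
day <- factor(c("D0", "D0", "D0", "D3", "D3", "D3", "D7", "D7"),
              levels = c("D0", "D3", "D7"))

# Keep only the genes of interest
genes <- c("Gene2", "Gene3")
sub <- counts[genes, , drop = FALSE]

# Average the replicate columns within each day: rows = genes, columns = days
day_means <- sapply(split(seq_len(ncol(sub)), day),
                    function(idx) rowMeans(sub[, idx, drop = FALSE]))
day_means

# One line per gene across the days, with the day labels on the x axis
matplot(t(day_means), type = "l", lty = 1, lwd = 2, col = 1:2, xaxt = "n",
        xlab = "Day", ylab = "Mean count")
axis(1, at = seq_len(ncol(day_means)), labels = colnames(day_means))
legend("topleft", legend = rownames(day_means), col = 1:2, lty = 1, lwd = 2)
```

For the two factors, one option is to split by interaction(day, condition) instead of day alone, where condition would be an extra (hypothetical) column of the sample metadata, e.g. y$samples$condition; each resulting column is then one day-by-condition mean.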

edgeR RNA-Seq DGEList ggplot2

You may take inspiration from this previous question: Boxplot in ggplot2

Alternatively, if you have your data in this format:

MyData
             Group   BRCA1   TP53    ATM   CCND1
    Sample1  FactorX -       -       -     -
    Sample2  FactorX -       -       -     -
    Sample3  FactorY -       -       -     -
    Sample4  FactorX -       -       -     -
    Sample5  FactorY -       -       -     -
    ...      ...     ...     ...     ...   ...

... then, you can plot these with:

boxplot(BRCA1 ~ Group, data=MyData)

Thanks for the answer, but it's not quite what I'm after. I'll update the question above with example data.

6.0 years ago

Okay, I get the feeling that this does not have to be anything special for now (in terms of a 'polished' plot). So, you could try this:

df
        time gene1 gene2 gene3
sample1 day1     1     2     3
sample2 day1     4    10     3
sample3 day2     1     2     3
sample4 day2     1     2     3
sample5 day3     1     2     3
sample6 day3     1     2     3

Summarise by mean:

df <- aggregate(df[,2:ncol(df)], df[1], mean)
df
  time gene1 gene2 gene3
1 day1   2.5     6     3
2 day2   1.0     2     3
3 day3   1.0     2     3
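The same summary can also be written with the formula interface of aggregate, which names the grouping column instead of indexing it by position. A sketch that rebuilds the toy df shown above:

```r
# Toy data from the answer: two replicate samples per time point
df <- data.frame(
  time  = c("day1", "day1", "day2", "day2", "day3", "day3"),
  gene1 = c(1, 4, 1, 1, 1, 1),
  gene2 = c(2, 10, 2, 2, 2, 2),
  gene3 = c(3, 3, 3, 3, 3, 3),
  row.names = paste0("sample", 1:6)
)
# '.' on the left-hand side means "every column except time";
# mean is applied to each gene column separately within each time point
means <- aggregate(. ~ time, data = df, FUN = mean)
means
```

One difference worth knowing: the formula method drops rows containing NA by default (na.action = na.omit), whereas the data-frame call above passes NAs through to mean, which then returns NA for that group.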

Plot

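# Use positions 1..n on the x axis and label them with the time points;
# lines(gene1 ~ time) may draw nothing if time is a character column (the default since R 4.0)
x <- seq_len(nrow(df))
plot(1, type="n", ylab="Expression", xlab="Day", xlim=range(x), ylim=c(0,10), xaxt="n")
axis(1, at=x, labels=df$time)
lines(x, df$gene1, lwd=2, col="royalblue")
lines(x, df$gene2, lwd=2, col="red4")
lines(x, df$gene3, lwd=2, col="forestgreen")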


Ah, awesome! That's exactly what I have been trying to get. I did not know about the aggregate function. Thanks so much! The idea is to get a simple plot of expression with nothing too complicated.


Okay, great!

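The second argument that I've used for aggregate is a bit unusual, though: df[1] means 'aggregate based on the first column'. It uses list-style indexing, so it returns a one-column data frame (which aggregate accepts as its by list), rather than the plain vector that the typical df[, 1] data-frame subsetting would return.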
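A quick way to see the difference between the two kinds of indexing, using a small hypothetical data frame in the same layout as df:

```r
df <- data.frame(time = c("day1", "day1", "day2"), gene1 = c(1, 4, 1))

str(df[1])     # list-style indexing: a one-column data frame (i.e. a list)
str(df[, 1])   # matrix-style indexing: drops to a plain character vector

# aggregate() needs `by` to be a list, so a vector has to be wrapped explicitly
res <- aggregate(df["gene1"], by = list(time = df[, 1]), FUN = mean)
res
```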


Looks like GROUP BY:

SELECT time, AVG(gene1), AVG(gene2), AVG(gene3) FROM df GROUP BY time;

So, apply a specified aggregate function over a vector of vectors, partitioning/grouping each vector based on the unique values of a different, equal-length vector.
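The GROUP BY analogy maps onto R's split-apply-combine pattern. This sketch spells out the partition, summarise and combine steps by hand on the toy df from the answer; it is an illustration of the idea, not how aggregate is implemented internally.

```r
df <- data.frame(
  time  = c("day1", "day1", "day2", "day2", "day3", "day3"),
  gene1 = c(1, 4, 1, 1, 1, 1),
  gene2 = c(2, 10, 2, 2, 2, 2),
  gene3 = c(3, 3, 3, 3, 3, 3)
)
# Partition the gene columns by the values of `time` (the GROUP BY part)
groups <- split(df[-1], df$time)
# Apply the aggregate, here column means, to each partition (the AVG part)
per_group <- lapply(groups, colMeans)
# Combine into one table: rows = time points, columns = genes
means <- do.call(rbind, per_group)
means
```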


I managed to figure that out. Thanks again!

Any chance you could explain how ggplot2 could be used to draw a similar plot from two data frames, using the strings D0, D2, D4, D6 in column one as a discrete x variable?

I managed to get one line in, but ggplot2 is also putting the x values in a different order from that of the data frame.
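A sketch of one way to do this, assuming each data frame holds one condition's day means with a Day column and an expression column (the names condA, condB, expr and all numbers below are placeholders, not real data). The two tables are stacked into long format with a column recording where each row came from, and the order of Day is fixed through its factor levels: ggplot2 sorts a character x axis alphabetically, so labels such as D10 would otherwise land before D2.

```r
# Hypothetical per-condition tables of day means; the numbers are placeholders
condA <- data.frame(Day = c("D0", "D2", "D4", "D6"), expr = c(1, 2, 3, 4))
condB <- data.frame(Day = c("D0", "D2", "D4", "D6"), expr = c(2, 2, 1, 1))

# Stack both tables, adding a column that says which condition each row is from
long <- rbind(cbind(condA, condition = "A"), cbind(condB, condition = "B"))

# Keep the x axis in the order the days appear in the data, not alphabetical order
long$Day <- factor(long$Day, levels = unique(long$Day))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # group = condition joins the points of each condition into one line
  p <- ggplot(long, aes(x = Day, y = expr, colour = condition, group = condition)) +
    geom_line() +
    geom_point() +
    labs(y = "Expression")
  print(p)
}
```

Mapping colour alone is not enough here: with a discrete x, ggplot2 groups by every discrete variable, including Day, so each group holds a single point and no line is drawn. Setting group = condition avoids that.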


@Ram, thanks! With both contributions I have managed to figure out the aggregation of the data.
