melt the data

Question

Network/Pathway Analysis from Mass Spec data

1

Entering edit mode

5.6 years ago

Mozart ▴ 330

Hello there, a colleague asked me to analyse some mass spec data for him. the idea is to do the analysis of pathways/networks to get some results. I have a really tiny experience on transcriptomics and thus I am not sure if the term "score" and "emPAI" can be used as a "p value" and a "fold-change", respectively, as reported below:

Family  - Member - Database - Accession - Score - Mass - Num. of matches - Num. of significant matches - Num. of sequences - Num. of significant sequences - emPAI - Description

we have samples from 2 different tissues at different time points of infection.

Any help?

Kind regards, Mozart

proteomics • 2.9k views

ADD COMMENT • link 5.5 years ago by Mozart ▴ 330

1

Entering edit mode

Hey Mozart, still playing good music with that violin?

What is the source of your mass spec data, do you know?

ADD REPLY • link 5.6 years ago by Kevin Blighe 87k

0

Entering edit mode

Hey Kevin, glad to see you there...yeah I am keep doing amazing stuff with this piece of wood...!!!

so I am not entirely sure I got it right but probably you mean some extra information about the machine and like...so:

Mass spectrometer is a Thermo Velos Pro

Sample time run was 1hr

reaction volume: 10µl (desalted ziptip) buffer used: 5% acetonitrile; 0.1% formic acid

does it help??

thanks!

ADD REPLY • link 5.6 years ago by Mozart ▴ 330

1

Entering edit mode

Okay, thank you! Is it not proteomics data that you have? Can you paste a few rows of the data? You could also plot as histogram of the data to see how it looks. I do not know which fields / columns contain the expression values.

ADD REPLY • link 5.6 years ago by Kevin Blighe 87k

0

Entering edit mode

thanks to you, Kevin!

so what we got is:

Family (149) Member (1) Database (SwissProt) Accession (KDM4A_MOUSE) Score (18) Mass (3933362) Num.of matches (534) Num.of significant matches (1) Num of sequences (544) Num of significant sequences (1) emPAI (100) Description (Lysine-specific demethylase 4A OS=Mus musculus GN=Kdm4a PE=2 SV=6)

hope it helps...I have no idea of the information included in the Description section, I am afraid. Neither the guy here, who provides the data, seems to be very convincing to me.

ADD REPLY • link 5.6 years ago by Mozart ▴ 330

1

Entering edit mode

Thanks. It is making more sense, now. The emPAI is a relative measure of protein expression. Please see here: Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein.

I presume that you can use the emPAI values for analysis. Usually, it would be, for example, a Mann-Whitney test (non-parametric). Can you plot the distribution of the emPAI values?

ADD REPLY • link 5.6 years ago by Kevin Blighe 87k

0

Entering edit mode

thanks a lot for your (very much appreciated) help, Kevin. You are the best...

enter image description here

hope that make any sense

ADD REPLY • link 5.6 years ago by Mozart ▴ 330

0

Entering edit mode

Is there only one column of values for emPAI? - you mentioned 2 tissue at different time-points (?)

ADD REPLY • link 5.6 years ago by Kevin Blighe 87k

0

Entering edit mode

Sorry, I didn't get it!

I actually have just one tissue at different time points at the moment...but, I mean, I just want to know the time effects in these two systems. so I guess it is the same identical thing. time 0

1 day

2 days

so time 0 1 day 2 days, respectively

ADD REPLY • link 5.6 years ago by Mozart ▴ 330

0

Entering edit mode

I posted an answer. Forse aiutará! / Maybe it will help!

ADD REPLY • link 5.6 years ago by Kevin Blighe 87k

0

Entering edit mode

I don't really know how big was your help...as usual you solved my problem in a matter of secs; you are incredible Kevin!!

Thanks a lot, I guess that's all I need..

I will dedicate one of my composition to you! ahahah

Thanks again.

ADD REPLY • link 5.5 years ago by Mozart ▴ 330

0

Entering edit mode

Anything for the magnificent Mozart!

ADD REPLY • link 5.5 years ago by Kevin Blighe 87k

score 9 · Accepted Answer · 2018-10-02

I see... aspetta... if you can get your data like this, then you can do different tests:

data
      Day Protein1 Protein2 Protein3 Protein4 Protein5 Protein6
     Day1     0.01     0.04     0.05     0.06     0.34     0.04
     Day2     0.05     0.45     0.06     0.04     0.03     0.04
     Day3     0.56     0.07     0.34     0.56     0.05     0.04

melt the data

require(reshape2)
data.melt <- melt(data)
colnames(data.melt) <- c("Day", "Protein", "Expression")
data.melt$Day <- factor(data.melt$Day, levels = c("Day1","Day2","Day3"))

stat test

Then, perform a non-parametric Kruskal-Wallis and Dunn's Test to compare the time-points. This test determines if there exists a global difference across the time-points.

require(FSA)
dunnTest(Expression ~ Day, data = data.melt, method = 'bh')

Dunn (1964) Kruskal-Wallis multiple comparison
  p-values adjusted with the Benjamini-Hochberg method.

   Comparison          Z    P.unadj     P.adj
  Day1 - Day2 -0.1371082 0.89094528 0.8909453
  Day1 - Day3 -1.6727199 0.09438245 0.2831473
  Day2 - Day3 -1.5356117 0.12463364 0.1869505

plot data per protein

    require(ggplot2)

ggplot(data.melt, aes(x=Day, y=Expression)) +

    stat_summary(geom = "bar", fill="forestgreen", fun.y = mean, position = "dodge") +

    facet_wrap(~ Protein, ncol = 3) +

    stat_summary(geom = "errorbar", fun.data = mean_se, position = "dodge") +

    xlab("") +

    ylab("emPAI") +

    scale_y_continuous(breaks = round(seq(0, max(data.melt$Expression), by = 0.1), 1)) +

    #geom_text(size=4, y=-20) +

    #Set the size of the plotting window
    theme_bw(base_size=24) +

    #Modify various aspects of the plot text and legend
    theme(
        legend.position='none',
        legend.background=element_rect(),
        plot.title=element_text(angle=0, size=18, face="bold", vjust=1),

        axis.text.x=element_text(angle=90, size=18, face="bold", hjust=1.10),
        axis.text.y=element_text(angle=0, size=18, face="bold", vjust=0.5),
        axis.title=element_text(size=18, face="bold"),

        #Legend
        legend.key=element_blank(),     #removes the border
        legend.key.size=unit(1, "cm"),      #Sets overall area/size of the legend
        legend.text=element_text(size=18),  #Text size
        title=element_text(size=18),

        #facet wrap
        strip.text.x = element_text(size=18),
        strip.text.y = element_text(size=18)) +

        #Change the size of the icons/symbols in the legend
        guides(colour=guide_legend(override.aes=list(size=2.5)))

calculate ratios across time-points

tdata <- t(data[,2:ncol(data)])
tdata
         Day1 Day2 Day3
Protein1 0.01 0.05 0.56
Protein2 0.04 0.45 0.07
Protein3 0.05 0.06 0.34
Protein4 0.06 0.04 0.56
Protein5 0.34 0.03 0.05
Protein6 0.04 0.04 0.04

ratios <- t(apply(tdata, 1, function(x) c(
  x[1] / x[2],
  x[2] / x[3],
  x[1] / x[3])))
colnames(ratios) <- c("Day1vs2","Day2vs3","Day1vs3")
ratios
             Day1vs2    Day2vs3    Day1vs3
Protein1  0.20000000 0.08928571 0.01785714
Protein2  0.08888889 6.42857143 0.57142857
Protein3  0.83333333 0.17647059 0.14705882
Protein4  1.50000000 0.07142857 0.10714286
Protein5 11.33333333 0.60000000 6.80000000
Protein6  1.00000000 1.00000000 1.00000000

----------------------------------------

It would be difficult to get a single p-value for each protein because there are only 3 values. Ratios make more sense, in this case.

Kevin