How to plot correlation graphs with R^2?
5.8 years ago
WUSCHEL ▴ 750

I have a proteomics data matrix in which a different number of peptides was detected for each protein (the number of detectable peptides depends on the protein).

Q1. How can I plot correlation graphs for each protein (gene) to compare how its peptides behave? For example, for protein A I have peptides a1-a3, and I want to compare a1 vs a2, a1 vs a3, and a2 vs a3.

Sample data

structure(list(Protein = c("A", "A", "A", "A", "B", "C", "C", "D", "D", "D"), Peptide = c("a1", "a2", "a3", "a4", "b1", "c1", "c2", "d1", "d2", "d3"), Sample1 = c(0.275755732, 0.683048798, 1.244604878, 0.850270313, 0.492175199, 0.269651338, 0.393004954, 0.157966662, 1.681672581, 0.298308801), Sample2 = c(0.408992244, 0.172488244, 1.749247694, 0.358172308, 0.142129982, 0.158636283, 0.243500648, 0.095019037, 0.667928805, 0.572162278), Sample3 = c(0.112265765, 0.377174168, 2.430040623, 0.497873323, 0.141136584, 0.250330266, 0.249783164, 0.107188279, 0.173623439, 0.242298602), Sample4 = c(0.87688073, 0.841826338, 0.831376575, 0.985900966, 0.891632525, 1.016533723, 0.292048735, 0.776351689, 0.800070173, 1.161882923), Sample5 = c(1.034093889, 0.304305772, 0.616445765, 1.000820463, 1.03124071, 0.995897846, 0.289542364, 0.578721727, 0.672592766, 1.168944588), Sample6 = c(1.063124715, 0.623917522, 0.613196611, 0.990921045, 1.014340981, 0.965631141, 0.316793011, 1.02220535, 1.182063616, 1.41196421), Sample7 = c(1.335677026, 0.628621656, 0.411171453, 1.050563412, 1.290233552, 1.1603839, 0.445372411, 1.077192698, 0.726669337, 1.09453338), Sample8 = c(1.139360562, 0.404024829, 0.263714711, 0.899959209, 1.356913804, 1.246338203, 0.426568548, 1.104988267, 0.964924824, 1.083654341), Sample9 = c(1.38146599, 0.582817437, 0.783698738, 1.118948066, 1.010795866, 1.277086848, 0.434025911, 1.238871048, 1.201184368, 1.476478831), Sample10 = c(1.111486801, 0.60513273, 0.460680037, 1.385702246, 1.448873253, 1.364329784, 0.375032044, 1.382750002, 0.741842319, 1.035657705)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list( cols = list(Protein = structure(list(), class = c("collector_character", "collector")), Peptide = structure(list(), class = c("collector_character", "collector")), Sample1 = structure(list(), class = c("collector_double", "collector")), Sample2 = structure(list(), class = c("collector_double", "collector")), Sample3 = 
structure(list(), class = c("collector_double", "collector")), Sample4 = structure(list(), class = c("collector_double", "collector")), Sample5 = structure(list(), class = c("collector_double", "collector")), Sample6 = structure(list(), class = c("collector_double", "collector")), Sample7 = structure(list(), class = c("collector_double", "collector")), Sample8 = structure(list(), class = c("collector_double", "collector")), Sample9 = structure(list(), class = c("collector_double", "collector")), Sample10 = structure(list(), class = c("collector_double", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))

Expected kind of graph: [image 776bE]

Since the number of peptides varies for each protein, how can I compare every pair of peptides and save each panel of the faceted graph as a separate plot, so that I can select only the graphs I need?

Q2. What is another possible way to present this correlation?

R genome gene • 2.0k views
5.8 years ago
pbpanigrahi ▴ 420

I will try to answer Q2.

A simple way is to generate a correlation matrix protein-wise:

# Load libraries
library(ggplot2)

# Assume the data frame from the question is stored in `data`
# Collect one data frame of pairwise correlations per protein
corlist <- list()
for (x in unique(data$Protein)) {
    tempind <- which(data$Protein == x)
    # A protein with a single peptide has no pairs to correlate
    if (length(tempind) > 1) {
        # Rows are peptides, so transpose to correlate peptides across samples
        cm <- cor(t(data[tempind, c(-1, -2)]))
        # Keep each unique peptide pair once (upper triangle of the matrix)
        corlist[[x]] <- data.frame(protein = x, corval = cm[upper.tri(cm)])
    }
}
# Building data frames keeps corval numeric (cbind() would turn it into text)
cormat <- do.call(rbind, corlist)
ggplot(cormat, aes(x = protein, y = corval, col = protein)) + geom_point()

You need to beautify the ggplot.

What the code does: since the number of peptides varies, the code calculates the pairwise correlation of all peptides within each protein and stores the values in one table. Protein A has 4 peptides, which gives 6 unique pairs and therefore 6 rows. B has a single peptide, so it is skipped. C has 2 peptides, so it gives one correlation value, and so on. All values are then drawn on a single plot.
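The pairing logic can also be written so that each correlation keeps the names of the two peptides it belongs to, together with its R^2. This is only a minimal sketch: `pair_cor`, `cor_table` and the column names `pep1`, `pep2`, `r`, `r2` are made up for illustration, and the data is a small subset of the question's table (peptide a4 and protein D dropped, first four samples only, values rounded).

```r
# Subset of the question's data, rounded to 3 decimals (illustration only)
data <- data.frame(
  Protein = c("A", "A", "A", "B", "C", "C"),
  Peptide = c("a1", "a2", "a3", "b1", "c1", "c2"),
  Sample1 = c(0.276, 0.683, 1.245, 0.492, 0.270, 0.393),
  Sample2 = c(0.409, 0.172, 1.749, 0.142, 0.159, 0.244),
  Sample3 = c(0.112, 0.377, 2.430, 0.141, 0.250, 0.250),
  Sample4 = c(0.877, 0.842, 0.831, 0.892, 1.017, 0.292)
)

# Hypothetical helper: all peptide pairs of one protein with Pearson r and r^2
pair_cor <- function(d) {
  if (nrow(d) < 2) return(NULL)            # single peptide: nothing to compare
  m <- as.matrix(d[, -(1:2)])              # peptides in rows, samples in columns
  rownames(m) <- d$Peptide
  pairs <- combn(d$Peptide, 2)             # one column per unique peptide pair
  r <- apply(pairs, 2, function(p) cor(m[p[1], ], m[p[2], ]))
  data.frame(protein = d$Protein[1],
             pep1 = pairs[1, ], pep2 = pairs[2, ],
             r = r, r2 = r^2)
}

# Apply the helper protein by protein and stack the results
cor_table <- do.call(rbind, lapply(split(data, data$Protein), pair_cor))
rownames(cor_table) <- NULL
print(cor_table)
```

With the pair names kept, rows can be filtered or labelled by peptide pair instead of being anonymous points within a protein.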

Output https://ibb.co/mSErEo

I hope there are alternative ways to do this.
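One alternative, closer to Q1, is to draw one scatter plot per peptide pair and write each to its own file, so that only the required graphs need to be picked later. The sketch below is a rough illustration, not a tested pipeline: it uses base graphics and `pdf()` rather than ggplot2 so that it needs no extra packages. The helper `plot_pair`, the output folder and the file naming are assumptions, and the data is the same small rounded subset of the question's table. For a straight-line fit of one peptide against another, the R^2 reported by `lm()` equals the squared Pearson correlation.

```r
# Subset of the question's data, rounded to 3 decimals (illustration only)
data <- data.frame(
  Protein = c("A", "A", "A", "B", "C", "C"),
  Peptide = c("a1", "a2", "a3", "b1", "c1", "c2"),
  Sample1 = c(0.276, 0.683, 1.245, 0.492, 0.270, 0.393),
  Sample2 = c(0.409, 0.172, 1.749, 0.142, 0.159, 0.244),
  Sample3 = c(0.112, 0.377, 2.430, 0.141, 0.250, 0.250),
  Sample4 = c(0.877, 0.842, 0.831, 0.892, 1.017, 0.292)
)

# Hypothetical output folder; change to wherever the plots should go
out_dir <- file.path(tempdir(), "peptide_pairs")
dir.create(out_dir, showWarnings = FALSE)

# Hypothetical helper: scatter plot of peptide p1 vs p2 with fitted line and R^2
plot_pair <- function(d, p1, p2, out_dir) {
  x <- unlist(d[d$Peptide == p1, -(1:2)])   # intensities of p1 across samples
  y <- unlist(d[d$Peptide == p2, -(1:2)])   # intensities of p2 across samples
  fit <- lm(y ~ x)
  r2 <- summary(fit)$r.squared
  file <- file.path(out_dir, sprintf("%s_%s_vs_%s.pdf", d$Protein[1], p1, p2))
  pdf(file, width = 4, height = 4)
  plot(x, y, xlab = p1, ylab = p2, pch = 19,
       main = sprintf("Protein %s: R^2 = %.3f", d$Protein[1], r2))
  abline(fit)                               # add the regression line
  dev.off()
  file
}

# One file per unique peptide pair; proteins with a single peptide are skipped
files <- unlist(lapply(split(data, data$Protein), function(d) {
  if (nrow(d) < 2) return(NULL)
  pairs <- combn(d$Peptide, 2)
  apply(pairs, 2, function(p) plot_pair(d, p[1], p[2], out_dir))
}))
print(files)
```

Each file name carries the protein and both peptide names, so the wanted comparisons can be selected by name afterwards.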

Thanks

Priyabrata
