Principal component analysis in R on expression data
8.9 years ago
Ron ★ 1.2k

I am doing PCA in R on a data frame (df_f), which is pasted below. Rows are samples; columns are genes.

pc_gtex <- prcomp(df_f)

as.fumeric <- function(x,levels=unique(x)) as.numeric(factor(x,levels=levels))

cols=as.fumeric(gtex_pm$tissue)

plot(pc_gtex$x[,1], pc_gtex$x[,2], col=cols, main = "PCA", xlab = "PC1", ylab = "PC2")
legend("topleft", col=1:17, legend = paste(unique(gtex_pm$tissue), 1:17), pch = 20, bty='n', cex=1.5)

head(gtex_pm)

      sample   tissue
1 SRR1069514 Prostate
2 SRR1071717  Bladder
3 SRR1073069 Prostate
4 SRR1074410 Prostate

Based on the above, the gtex_group object holds the group levels:

head(gtex_group)
[1] 1 2 1 1 1

The head of the main table used for PCA is shown below. The row names are the samples.

             SRR1069514    0    0.0009995    5.773065971    1.644998088    0.142367241    0.176471143    0.195566784    0.0009995    0.025667747    3.380994674    1.762502288    0    0.077886539    0    0.002995509    0.01093994    2.110576771    1.38829236    2.26186726    0.431132855    3.108480433    3.96347629    0    0    0.41012092    3.48452699    1.68565794    0    1.425034189    1.87456758    2.590542128    0    0    0    1.941471742    0.961646434    0    1.17711535    0.058268908    0    0.260824618    3.08534443    1.10426296    0.242946179    0.0009995    0    0    0    0.0009995    1.560247668    1.517541898    0.016857117    0.767326579    0.0009995    3.0191069    0    2.607050533    1.446683661    2.288384744    2.62082062    0.19309663    0    0    0.234281296    0    1.415610416    2.328837464    0.008959741    0.911479175    0.375005901    0.660107327    3.184739763    1.16064768    0.001998003    0.138891999    2.219855445    3.1011278    1.81872592    2.98229236    2.4114395    3.24528404    0    1.54734972    0.406131553    0.029558802    0.003992021    0.693647056    2.07581    2.8357982    0.0009995    0.082501222    1.09661029    2.75829962    0.635518068    3.11484775    0.01291623    3.40837159    0    
              SRR1071717    0    0    0.0009995    4.99519673    1.626491667    0.100749903    0.327863862    0.09531018    0    0.056380333    3.328196489    1.541373182    0    0.091667189    0.044973366    0    0.033434776    1.953311265    1.56444055    1.79142608    0.993622075    3.206236281    3.82609468    0    0    2.565487674    3.2202349    1.1304339    0    1.092258815    1.80203978    2.645394351    0    0    0.0009995    1.681200279    2.047434746    0    0.948176921    0.006975614    0.014888613    0.298622013    2.49667052    1.01884732    0.38662202    0    0    0    0    0.0009995    0.941958479    1.752845376    0.017839918    0.216722984    0.051643233    3.0505518    0    2.034444176    0.988053098    2.235804059    1.89686995    0.090754363    0    0    0.198850859    0    1.585554972    2.274905524    0    0.04305949    0.056380333    0.044016885    0.771496147    1.195436473    0    0.368801124    1.974636427    2.7700856    2.00120969    2.88875935    2.2651947    2.66242502    0    0.429181635    0.04018179    0.034401427    0    0.242161557    1.9907469    2.1384177    0.0009995    0.008959741    0.99916021    2.3892214    0.086177696    3.16821391    0    3.2038434    0
             SRR1073069    2.19544522    1.32866525    0.0009995    4.50198508    1.159707388    0.141499562    0.265436464    0.026641931    2.3330173    0.028587457    3.140698044    1.537297235    0.012916225    0.023716527    0    0.002995509    0.049742092    2.071157322    1.02460688    2.11818137    0.359072069    2.419656765    3.5065479    0.137149838    2.121902193    0.305276381    2.95958683    1.49939981    3.14397985    1.001366904    1.450911    1.39475844    1.930071085    1.140074079    0.037295785    1.609437912    0.412109651    0.870456196    0.943516718    0.013902905    0    0.152721087    2.88836976    1.482967248    0.272314595    2.061532121    0.552159487    2.394890764    1.391033116    0.443402947    1.593714952    1.285921387    0.00796817    0.371563556    0.020782539    3.1946651    1.26327891    2.212003715    1.46672161    2.140183804    2.71997877    0.294161039    0.018821754    0.0009995    0.179818427    1.893714192    1.731478538    2.502255288    0.013902905    0.752830183    0.347129531    0.407463111    2.467082065    0.558472277    1.563812734    0.022739487    1.608837732    2.8176816    1.30670988    2.44495233    1.81107178    3.03254625    0.569283193    0.948176921    0.101653654    0.036331929    0    0.786182047    1.9867779    3.5039946    2.463427618    0.008959741    0.76360564    2.20640453    0.514618422    2.87964779    1.11021142    3.18750899    1.22436349
              SRR1074410    2.69022562    1.70055751    0.013902905    3.314622273    0.503196597    0.4940863    0.044016885    0.023716527    1.753884517    0.03246719    2.767324893    1.666385193    0.009950331    0.05259245    0    0    0.017839918    1.575260461    0.76779072    2.22202559    0.83377831    2.198113071    3.57953881    0.051643233    2.207284913    0.072320662    3.04414141    1.39177929    2.851746423    0.982452934    1.33210213    1.888583654    1.871340532    1.238664044    0.03246719    1.734659877    0.486737828    0.412109651    1.126551657    0.035367144    0    0.213497174    2.76032635    1.131402111    0.572108852    2.102425378    0.291175962    1.85159947    0.943516718    0.283674051    1.232560261    0.982078472    0    0.223943232    0.035367144    2.9064091    1.583299255    2.376671636    1.185095749    2.07681309    2.20794469    0.877549904    0.151002874    0    0.107059072    3.038312721    1.486365915    2.633829402    0    0.403463105    0.195566784    0.285930539    1.296643139    0.48796633    1.664115474    0.054488185    1.884034745    2.3757426    1.71036863    2.61732284    1.9348492    3.1138708    1.220239777    0.322807874    0.12398598    0.004987542    0.002995509    0.446607051    1.939317    3.8484227    2.78346684    0.025667747    0.78253074    2.03352848    0.181487876    2.7091163    1.00430161    3.1429015    1.24875495

Once I have the plot with 17 levels, the legend displays all 17 levels, but the colors repeat after the first 8, so the 9th label has the same color as the first. Also, is there a better way to add the group labels to the PCA plot? I have 17 unique groups. Either two groups are being assigned the same color because of the "cols" variable, or the problem is in how "legend" is plotted. The cols variable has 17 levels.

I am just following this post from a genomics class.

http://genomicsclass.github.io/book/pages/pca_svd.html

PCA R • 3.5k views

Your question isn't clear, but it seems to be about how to make a scatterplot with a lot of colors? The PCA itself went fine?


Yes, the PCA is fine. I get all 17 labels too, but some labels share a color and are hard to tell apart: there are only 8 colors, and then they repeat. I tried the following code too, but I only get 6 or 7 colors.

plot(pc_gtex$x[,1:2], pch=16, col=as.numeric(gtex_pm_tissue[,"tissue"]), main="Color annotation") # 2 first principal components
legend("topleft", pch=16, col=unique(as.numeric(gtex_pm_tissue[,"tissue"])), legend=unique(gtex_pm_tissue[,"tissue"]))

Or, if the number of colors cannot be increased, can I just add the unique group labels on the plot itself?

image: plot

The group labels are the same as in the legend command


Use letters instead of colors to distinguish the groups.


I don't have time to write you a full example, but you can use letters instead of colors, as in this post: http://is-r.tumblr.com/post/35050025650/plotting-letters-as-shapes-in-ggplot2
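
For what it's worth, here is a rough base R sketch of the letters idea (the linked post uses ggplot2; this sticks to base graphics to match the code in the question). The data are made up: `tissue` and `pcs` are hypothetical stand-ins for gtex_pm$tissue and pc_gtex$x[,1:2], so swap in the real objects.

```r
# Toy stand-ins (assumptions, not the real GTEx data):
# 17 hypothetical tissue names with 3 samples each, and random "PC" scores.
set.seed(1)
tissue <- factor(rep(paste0("tissue", 1:17), each = 3))
pcs <- matrix(rnorm(length(tissue) * 2), ncol = 2)

# One letter per factor level: level 1 -> "A", level 2 -> "B", ...
sym <- LETTERS[as.integer(tissue)]

# Draw an empty plot, then place the letters at the sample coordinates.
plot(pcs[, 1], pcs[, 2], type = "n", main = "PCA", xlab = "PC1", ylab = "PC2")
text(pcs[, 1], pcs[, 2], labels = sym, cex = 0.8)

# The legend pairs each letter with its tissue name, in level order.
legend("topleft",
       legend = paste(LETTERS[seq_len(nlevels(tissue))], levels(tissue)),
       bty = "n", cex = 0.7)
```

Letters and colors can also be combined by passing a `col` vector to text(), which may help when clusters overlap.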


If you are trying to distinguish prostate from bladder, use two colors. Or different shades of one color for bladder, and different shades of a second color for prostate. Then you can more easily visualize spatial differences between bladder and prostate tissues, as well as relative "inter-tissue" differences for a tissue type within a PCA blob or cluster.

8.9 years ago
beegrackle ▴ 90

Your problem might be that the default R palette only has 8 colors (see here), so R is just recycling those colors. You could try entering the colors manually, e.g. col=c("red", "green", etc. up to 17 colors)[cols], or use RColorBrewer (the link has some examples of that) to generate a palette with 17 different colors. Though a PCA plot with 17 different colors is going to look like a hot mess.
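
Something along these lines should do it. This is only a sketch on made-up data: `tissue` and `pcs` are hypothetical stand-ins for gtex_pm$tissue and pc_gtex$x[,1:2], and rainbow() is just one base R way to get 17 colors. If you have RColorBrewer installed, colorRampPalette(RColorBrewer::brewer.pal(8, "Dark2"))(17) is an alternative.

```r
# Toy stand-ins (assumptions, not the real GTEx data):
# 17 hypothetical tissue names with 3 samples each, and random "PC" scores.
set.seed(1)
tissue <- factor(rep(paste0("tissue", 1:17), each = 3))
pcs <- matrix(rnorm(length(tissue) * 2), ncol = 2)

# Build one color per factor level instead of relying on the
# 8-color default palette, which recycles for indices above 8.
pal <- rainbow(nlevels(tissue))
cols <- pal[as.integer(tissue)]

plot(pcs[, 1], pcs[, 2], col = cols, pch = 20,
     main = "PCA", xlab = "PC1", ylab = "PC2")

# Take labels and colors from the same level order so that
# each legend entry matches the points drawn for that tissue.
legend("topleft", legend = levels(tissue), col = pal,
       pch = 20, bty = "n", cex = 0.7)
```

The key point is indexing the palette by the factor codes and reusing the same palette vector in legend(), rather than passing col=1:17.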


This works great!


I wanted to plot different tissue types from the GTEX data together with some tissue types from experiments at our workplace, to see how they cluster in the PCA and how close one tissue is to another. The GTEX paper has a PCA image, but I wanted to plot our samples along with theirs, so I took 17 tissues from GTEX.
