Question: PCA plot from read count matrix from RNA-Seq
4
Entering edit mode
3 months ago
rachel.kubik12 • 80

Hello!

I have RNA-Seq data with 32 samples. I used Salmon to generate a read count matrix. I want to generate a PCA plot to look at the relationship between my samples. How do I use the read count matrix to perform the PCA analysis?

Thanks!

ADD COMMENTlink 22 months ago rachel.kubik12 • 80 • updated 3 months ago mayankkaashyap • 0
Entering edit mode
0

Hi Kevin, Thanks! Can you please tell me how can I color two groups, one in green and other in red?

ADD REPLYlink 3 months ago
mayankkaashyap
• 0
Entering edit mode
0

Hey, this is now a Bioconductor package. So, you should be able to colour the groups easily by using that: https://github.com/kevinblighe/PCAtools ?

ADD REPLYlink 3 months ago
Kevin Blighe
43k
42
Entering edit mode
6 months ago
Kevin Blighe 43k
Republic of Ireland

NB - this is now a Bioconductor R package: https://github.com/kevinblighe/PCAtools

-------------------------

You should normalise your data prior to performing PCA. In the code below, you'll have to add plot legends yourself, and also colour vectors (passed to the 'col' parameter).

Then, assuming that you have transcripts as rows and samples as columns:

NB - in this code, the plots I've shown don't necessarily match the exact code, but the plot type is the same

[Edit: also take a look at my definition of PCA here: PCA in a RNA seq analysis]

Perform PCA / single value decomposition

project.pca <- prcomp(t(MyReadCountMatrix))
summary(project.pca)

#Determine the proportion of variance of each component
#Proportion of variance equals (PC stdev^2) / (sum all PCs stdev^2)
project.pca.proportionvariances <- ((project.pca$sdev^2) / (sum(project.pca$sdev^2)))*100

Scree plot

barplot(project.pca.proportionvariances, cex.names=1, xlab=paste("Principal component (PC), 1-", length(project.pca$sdev)), ylab="Proportion of variation (%)", main="Scree plot", ylim=c(0,100))

scree

Pairs plots

par(cex=1.0, cex.axis=0.8, cex.main=0.8)
pairs(project.pca$x[,1:5], col="black", main="Principal components analysis bi-plot\nPCs 1-5", pch=16)
pairs(project.pca$x[,6:10], col="black", main="Principal components analysis bi-plot\nPCs 6-10", pch=16)

pairs

Bi-plots

par(mar=c(4,4,4,4), mfrow=c(1,3), cex=1.0, cex.main=0.8, cex.axis=0.8)

#Plots scatter plot for PC 1 and 2
plot(project.pca$x, type="n", main="Principal components analysis bi-plot", xlab=paste("PC1, ", round(project.pca.proportionvariances[1], 2), "%"), ylab=paste("PC2, ", round(project.pca.proportionvariances[2], 2), "%"))
points(project.pca$x, col="black", pch=16, cex=1)

#Plots scatter plot for PC 1 and 3
plot(project.pca$x[,1], project.pca$x[,3], type="n", main="Principal components analysis bi-plot", xlab=paste("PC1, ", round(project.pca.proportionvariances[1], 2), "%"), ylab=paste("PC3, ", round(project.pca.proportionvariances[3], 2), "%"))
points(project.pca$x[,1], project.pca$x[,3], col="black", pch=16, cex=1)

#Plots scatter plot for PC 2 and 3
plot(project.pca$x[,2], project.pca$x[,3], type="n", main="Principal components analysis bi-plot", xlab=paste("PC2, ", round(project.pca.proportionvariances[2], 2), "%"), ylab=paste("PC3, ", round(project.pca.proportionvariances[3], 2), "%"))
points(project.pca$x[,2], project.pca$x[,3], col="black", pch=16, cex=1)

biplots

Tri-plot

require(scatterplot3d)
par(mar=c(4,4,4,4), cex=1.0, cex.main=0.8, cex.axis=0.8)

scatterplot3d(project.pca$x[,1:3], angle=-40, main="", color="black", pch=17, xlab=paste("PC1, ", round(project.pca.proportionvariances[1], 2), "%"), ylab=paste("PC2, ", round(project.pca.proportionvariances[2], 2), "%"), zlab=paste("PC3, ", round(project.pca.proportionvariances[3], 2), "%"), grid=FALSE, box=FALSE)
source('http://www.sthda.com/sthda/RDoc/functions/addgrids3d.r')
addgrids3d(project.pca$x[,1:3], grid = c("xy", "xz", "yz"))

source('http://www.sthda.com/sthda/RDoc/functions/addgrids3d.r')
addgrids3d(project.pca$x[,1:3], grid = c("xy", "xz", "yz"))

1kg

ADD COMMENTlink 6 months ago Kevin Blighe 43k
Entering edit mode
1

The third one is very fancy :)

ADD REPLYlink 22 months ago
ATpoint
17k
Entering edit mode
0

Hi Kevin,

I also want to perform a PCA plot. I have a read-count matrix data. You mentioned that we need to normalize data prior to PCA plot. Could you please help me how to normalize the data?

ADD REPLYlink 22 months ago
Mehmet
• 460
Entering edit mode
0

Hi Mehmet, from where did you obtain the counts or how did you produce them?

ADD REPLYlink 22 months ago
Kevin Blighe
43k
Entering edit mode
0

Hi Kevin,

I guess we obtain reads counts from HtSeq, and performed DEG analyses using edgeR. I have 23 samples. Previously, I generated a PCA plot using read.count.matrix data following your codes above. I also would like to ask you; what is difference between generating a PCA plot from PC1 to PC5, and PC1 to PC10?

ADD REPLYlink 22 months ago
Mehmet
• 460
Entering edit mode
1

Hi Mehmet, I have not used EdgeR (I prefer DESeq2), but PCA should be performed on normally-distributed counts, so, your normalised logged counts.

There is no difference between the generation of the pairs plots for PC1-5 and PC6-10. It is just a matter of fitting all of the plots on the same page. Technically, you can generate a pairs plot for PC1-PC100, but you would require a very large plotting window.

ADD REPLYlink 22 months ago
Kevin Blighe
43k
Entering edit mode
0

Hi Kevin, Thank you very much.

Should I do anything in my FPKM data prior to PCA plot?

I have run analyses to generate PCA plot (pairs plots). I would like to ask you a few things: 1. What are the dots in boxes? 2.What are the numbers around each boxes?

I have 17 samples in my data, but I have 15 dots in boxes in the plot.

could you please help me about how to interpret results ?

Thank you

ADD REPLYlink 22 months ago
Mehmet
• 460
Entering edit mode
0

If you have used HTSeq followed by EdgeR, then you should have logged counts(?). How does the distribution appear when you run: hist(MyData) ?

Can you post an image of the pairs plot and provide the code that you are using to execute the PCA?

You can upload figures/images here, and then copy/psate the URL here in your response.

Kevin

ADD REPLYlink 22 months ago
Kevin Blighe
43k
Entering edit mode
0

Hi Kevin,

I have uploaded two figures.

my commands:

project.pca <- prcomp(t(MyReadCountMatrix))
project.pca.proportionvariances <- ((project.pca$sdev^2) / (sum(project.pca$sdev^2)))*100

barplot(project.pca.proportionvariances, cex.names=1, xlab=paste("Principal component (PC), 1-", length(project.pca$sdev)), ylab="Proportion of variation (%)", main="Scree plot", ylim=c(0,100))

par(cex=1.0, cex.axis=0.8, cex.main=0.8)    
pairs(project.pca$x[,1:5], col="black", main="Principal components analysis bi-plot\nPCs 1-5", pch=16)
ADD REPLYlink 22 months ago
Mehmet
• 460 • updated 13 months ago
RamRS
21k
Entering edit mode
1

You need to post the URL's for the figures in your post.

ADD REPLYlink 22 months ago
genomax
68k
Entering edit mode
0

Sorry,

Please have a look

plot1


plot2

ADD REPLYlink 22 months ago
Mehmet
• 460 • updated 13 months ago
RamRS
21k
Entering edit mode
0

Looks okay to me. In the second one, the title is incorrect though, as it's PC1-10. You could also make the dots smaller here, but it's not that important as these plots are mainly just for QC.

What's the variance explained on PC1 and 2?

Looks fine.

Edit: as you go further down along the PCs, you always find 'odd' plots like the one for PC10. Nothing to worry about, though. I'd worry if I saw that on PC1-3

ADD REPLYlink 22 months ago
Kevin Blighe
43k
Entering edit mode
0

Thank you for help.

I do not know variance between PC1 and PC2. When I run commands, I did not get variance value. I have 17 PCs when I type below:

    Standard deviations:
 [1] 3.49034964 1.22304723 1.05526472 0.99765777 0.76043084 0.56623044 0.37994839
 [8] 0.24585524 0.23052947 0.17215343 0.08918422 0.07063718 0.06521970 0.05890600
[15] 0.04826593 0.04446027 0.03575958

How to interpret the graphs? and what is the meaning of the dots in the boxes? As I have 17 PCs, should I show 17 PCs in the plot?

In some tutorials I saw that people write column names instead of PC1, PC2 .....

Should I use column names also?

There are several type of PCA plots. Which should I use in my paper?

Like this one:

https://www.bioconductor.org/help/workflows/rnaseqGene/#pca-plot

Sorry for asking too many questions.

Thank you

ADD REPLYlink 22 months ago
Mehmet
• 460
Entering edit mode
1

The PCA bi-plot at https://www.bioconductor.org/help/workflows/rnaseqGene/#pca-plot is generated using the plotPCA function in DESeq2. It is usually used when processing RNA-seq data. However, this function is biased because it removes a large proportion of genes of low variance from the dataset prior to performing PCA, which defeats the purpose of conducting the PCA in the first place. My methods perform a completely unbiased PCA.

  • If you follow my code, the percentage of overall variance explained by each principal component will be held in project.pca.proportionvariances
  • Each dot is a sample.
  • You do not have to plot all PCs from 1-17. Most people only plot PC1 versus PC2. Look at my code at the top of this page under the heading 'Bi-plots'
  • PCA is primarily used for QC purposes to show that there are no outliers in the dataset. Looking at your PCA plots, there are no outliers. That is all you need to say. It is possible to go a lot more advanced into PCA, but, I would not recommend it unless you fully understand the PCA process
ADD REPLYlink 22 months ago
Kevin Blighe
43k
Entering edit mode
0

Hi Kevin sorry for the random question, but would a PCA plot be useful to see the expression of different genotypes to a treatment. I have 5 genotypes and would like to see how the expression of each pair interact in a plot. Thanks

ADD REPLYlink 19 months ago
catagui
• 30
Entering edit mode
0

Hello, yes, it would be interesting to see that. You have RNA-seq data for these samples?

ADD REPLYlink 19 months ago
Kevin Blighe
43k
Entering edit mode
0

Yes I do have 5 genotypes (3 samples per geno) and a control and treatment.

ADD REPLYlink 19 months ago
catagui
• 30
Entering edit mode
0

Just a quick additional question, I have normalized my counts using DESeq, and then log transformed (log2) my data, but do you do something about the zero's in the datasets when doing the log transformation? Since they will give a infinite negative value. Do you add something to those zero's? Also, do you do any removal of lowly expressed genes, nor not for this kind of analysis? Thanks!!

ADD REPLYlink 12 months ago
m.laarman
• 10
Entering edit mode
0

Oh, to manage those, you should use the regularised log transformation, the function for which is supplied with DESeq2: https://rdrr.io/bioc/DESeq2/man/rlog.html

To then access the r-logged counts, you use assay():

x <- rlog(dds)
assay(x)

PCA is then performed on these r-logged counts.

ADD REPLYlink 12 months ago
Kevin Blighe
43k
Entering edit mode
0

Hi Kevin, I'm sorry to be asking more questions, I'm totally new to this. I'm trying to use the code you sent me, but it's giving me errors:

"rlog() may take a long time with 50 or more samples, vst() is a much faster transformation Error in DESeqDataSet(se, design = design, ignoreRank) : some values in assay are not integers"

I'm wondering if this is based on the way I normalized; I used a script from a colleague from DESeq (not DESeq2), using the following code:

conds <- factor(c(colnames(counts)))
cds <- newCountDataSet(counts, conds)
cds <- estimateSizeFactors(cds)

normalized_counts <- counts(cds, normalized=TRUE)

Do you have an example of code that could get me from raw counts to logged normalized counts to use in PCA?

If you need more info on my samples please let me know. Thanks!

ADD REPLYlink 12 months ago
m.laarman
• 10
Entering edit mode
0

Oh, that error is common. When the dataset is large, it is indeed better to run vst(), and then you access these again via the assay() function. These are perfectly fine in place of logged counts.

If you're starting from your own data matrix of raw counts (cts), then you just need:

library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)
dds <- DESeq(dds)

normalized_counts <- counts(dds, normalized=TRUE)

Are you actually testing anything as in differential expression? Here, condition would be, like, the CaseControl variable.

Then, for vst():

vsd <- vst(dds, blind=TRUE)
assay(vsd)

Note that DESeq2 already has its own PCA implementation: https://bioconductor.org/packages/3.7/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#principal-component-plot-of-the-samples

If you have issues running DESeq2 itself, you may consider opening a new question

ADD REPLYlink 12 months ago
Kevin Blighe
43k
Entering edit mode
0

Thanks! I'm going to try around with DESeq2 and if I need more help I will open a new question.

Just one more question for now: do you know the MDS plot? I've used it during my analysis in edgeR, and do you know how this compares to a PCA plot? I've understood that also here the distance between the dots in the plot is a measure of distance between the datasets (Euclidean distance) (explanation in R: This function is a variation on the usual multdimensional scaling (or principle coordinate) plot, in that a distance measure particularly appropriate for the microarray context is used. The distance between each pair of samples (columns) is the root-mean-square deviation (Euclidean distance) for the top top genes. Distances on the plot can be interpreted as leading log2-fold-change, meaning the typical (root-mean-square) log2-fold-change between the samples for the genes that distinguish those samples.)

Would you know how this would compare to PCA?

ADD REPLYlink 12 months ago
m.laarman
• 10
Entering edit mode
0

Sorry for another reply. I've run the code you gave me and it worked, but I still have the same 'problem' now, in that there is no normal distribution in my data, since the large number of genes that had 0 counts before, now just shifted to around 4 (after normalization and log transformation). Therefore, a histogram of all genes of one of my samples looks like this:

histogram distribution of normalized logged counts from sample 1

I think the data except for the large bar of unexpressed genes might have a distribution more similar to a normal distribution, but how to deal with this?

ADD REPLYlink 12 months ago
m.laarman
• 10
Entering edit mode
0

Hey, that does not look like logged or variance-stabilised data. Can you show the code that you used? Regarding MDS and PCA, the mathematical formula behind each is different, but they both ultimately show relationships between samples / variables. Note that MDS can be performed using the PCA transformation.

You may consider a new question for that, as my time is extremely limited today.

ADD REPLYlink 12 months ago
Kevin Blighe
43k
Entering edit mode
0

Hi Kevin,

Thanks! That doesn't sound good.. I've started a new post here: How to normalize count data for PCA in R - something goes wrong in case you would like to comment which I would greatly appreciate. I've posted the code there as well.

Thanks for your help so far!

ADD REPLYlink 12 months ago
m.laarman
• 10
Entering edit mode
0

I want to explore RNAseq data from Expression Atlas (EMBL-EBI database) to get an impression on the similarity/differences of RNA-sequencing samples, i.e. to identify subgroups or outliers. The dataset downloaded from the database was in TPM, so I changed NAs into '0's and multiplied the values by 1E+6 (to get integer values). I want to log transform the data (rlog) but got stuck at:

log.data4pca = rlog(data4pca)

Which results an error: Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘sizeFactors’ for signature ‘"data.frame"’

Can anyone help me with that?

Data:

Gene ID sample1 sample2 sample3 sample4 sample5
ENSG00000000003 0.6 65.0    30.0    61.0    62.0
ENSG00000000005     1.0 0.4 0.8 
ENSG00000000419 87.0    83.0    66.0    98.0    46.0
ENSG00000000457 3.0 11.0    9.0 10.0    5.0
ENSG00000000460 6.0 3.0 3.0 3.0 2.0
ENSG00000000938 308.0   6.0 6.0 12.0    6.0
ENSG00000000971 1.0 61.0    31.0    87.0    738.0
ENSG00000001036 38.0    85.0    74.0    28.0    37.0
ENSG00000001084 25.0    24.0    36.0    41.0    78.0
ENSG00000001167 45.0    25.0    19.0    24.0    13.0
ENSG00000001460 1.0 6.0 3.0 3.0 0.7
ENSG00000001461 5.0 20.0    13.0    17.0    1.0
ENSG00000001497 28.0    15.0    12.0    20.0    8.0
ENSG00000001561 71.0    34.0    27.0    16.0    11.0
ENSG00000001617 2.0 10.0    16.0    80.0    2.0
ENSG00000001626     86.0    62.0    1.0 1.0
ENSG00000001629 12.0    31.0    25.0    32.0    11.0
ENSG00000001630 2.0 4.0 9.0 5.0 4.0
ENSG00000001631 9.0 18.0    19.0    21.0    11.0
ENSG00000002016 4.0 9.0 10.0    6.0 2.0
ENSG00000002079                 
ENSG00000002330 18.0    30.0    20.0    10.0    9.0
ENSG00000002549 55.0    80.0    90.0    66.0    114.0
ENSG00000002586 46.0    117.0   99.0    160.0   69.0
ENSG00000002587 1.0 3.0 2.0 7.0 0.3
ENSG00000002726 1.0 257.0   248.0   1.0 0.1
ENSG00000002745 0.1 0.2 0.2 0.5 
ENSG00000002746     0.1 0.1 0.2 
ENSG00000002822 9.0 4.0 4.0 4.0 3.0
ENSG00000002834 123.0   121.0   148.0   51.0    45.0

Script:

library(PCAtools)
library(DESeq2)

#copy data form Excel
data4pca = read.table(pipe("pbpaste"), sep="\t", header=T)
#extract rownames from first column
rownames(data4pca) = data4pca[,1]
#remove first column with geneIDs
data4pca = data4pca[,-1]
# change NAs into '0's
data4pca[is.na(data4pca)] <- 0
#to get integer values (TPM)
data4pca = data4pca * 1000000
# rlog transform data
log.data4pca = rlog(data4pca)
#principal components analysis (PCA) on log-transformed data
pcadata <- pca(log.data4pca)
ADD REPLYlink 3 months ago
mschmidt
• 0
Entering edit mode
0

You are using my package, PCAtools? - DESeq2 has its own PCA function, plotPCA().

To use the DESeq2 implementation:

DESeq2::plotPCA(log.data4pca)

To use my PCAtools implementation:

pca(assay(log.data4pca))
ADD REPLYlink 3 months ago
Kevin Blighe
43k
Entering edit mode
0

Hi Kevin, I am using your package (PCAtools). My problem is that I can't get the data rlog-transformed. Simple log transformation works, but rlog not.

ADD REPLYlink 3 months ago
mschmidt
• 0
Entering edit mode
0

Oh, I see, have you tried rlog(data.matrix(data4pca)) ?

ADD REPLYlink 3 months ago
Kevin Blighe
43k
Entering edit mode
0

Your package (PCAtools) has no rlog

> log.data4pca = rlog(data.matrix(data4pca))
Error in rlog(data.matrix(data4pca)) : could not find function "rlog"

with loaded DESeq2

> log.data4pca = rlog(data.matrix(data4pca))
converting counts to integer mode
Error in validObject(.Object) : 
  invalid class “DESeqDataSet” object: NA values are not allowed in the count matrix
In addition: Warning message:
In mde(x) : NAs introduced by coercion to integer range

its strange as I have all NAs converted to '0's, even

> log.data4pca = rlog(data.matrix(data4pca + 1))

returns the same error

ADD REPLYlink 3 months ago
mschmidt
• 0
Entering edit mode
0

Yes, rlog is part of DESeq2. You should check the encoding of your object via str(). rlog should be able to accept a data-matrix, which contains only numerical values.

ADD REPLYlink 3 months ago
Kevin Blighe
43k
Entering edit mode
0
> str(data4pca)
'data.frame':   45946 obs. of  13 variables:
 $ bone.marrow           : num  6.00e+05 0.00 8.70e+07 3.00e+06 6.00e+06 3.08e+08 1.00e+06 3.80e+07 2.50e+07 4.50e+07 ...
 $ colon                 : num  6.5e+07 1.0e+06 8.3e+07 1.1e+07 3.0e+06 6.0e+06 6.1e+07 8.5e+07 2.4e+07 2.5e+07 ...
 $ duodenum              : num  3.0e+07 4.0e+05 6.6e+07 9.0e+06 3.0e+06 6.0e+06 3.1e+07 7.4e+07 3.6e+07 1.9e+07 ...
 $ esophagus             : num  6.1e+07 8.0e+05 9.8e+07 1.0e+07 3.0e+06 1.2e+07 8.7e+07 2.8e+07 4.1e+07 2.4e+07 ...
 $ liver                 : num  6.20e+07 0.00 4.60e+07 5.00e+06 2.00e+06 6.00e+06 7.38e+08 3.70e+07 7.80e+07 1.30e+07 ...
 $ lymph.node            : num  6.00e+06 1.00e+05 1.32e+08 1.70e+07 7.00e+06 6.30e+07 3.40e+07 3.60e+07 2.60e+07 5.80e+07 ...
 $ rectum                : num  6.70e+07 1.00e+06 8.30e+07 1.30e+07 5.00e+06 7.00e+06 9.80e+07 1.13e+08 2.70e+07 3.30e+07 ...
 $ saliva.secreting.gland: num  5.3e+07 4.0e+05 3.5e+07 6.0e+06 1.0e+06 3.0e+06 1.8e+07 1.7e+07 7.0e+06 1.0e+07 ...
 $ small.intestine       : num  2.2e+07 4.0e+05 7.8e+07 9.0e+06 3.0e+06 1.0e+07 5.3e+07 7.6e+07 2.9e+07 2.3e+07 ...
 $ spleen                : num  1.7e+07 5.0e+05 9.7e+07 1.4e+07 5.0e+06 1.8e+08 1.8e+07 6.3e+07 5.0e+07 3.9e+07 ...
 $ stomach               : num  2.2e+07 0.0 6.3e+07 1.0e+07 2.0e+06 6.0e+06 6.0e+07 7.1e+07 2.7e+07 2.5e+07 ...
 $ tonsil                : num  1.40e+07 3.00e+05 1.14e+08 1.40e+07 1.00e+07 3.60e+07 2.10e+07 4.10e+07 2.30e+07 4.90e+07 ...
 $ vermiform.appendix    : num  1.10e+07 2.00e+06 1.00e+08 1.30e+07 8.00e+06 1.72e+08 6.00e+07 7.00e+07 2.10e+07 4.20e+07 ...
ADD REPLYlink 3 months ago
mschmidt
• 0
Entering edit mode
0

Just looks like there is still something wrong with your formatting. Look at this example:

Generate random data and impute a NA

mat <- round(matrix(rexp(200, rate=.1), ncol=20), 0)
mat[7,9] <- NA

Try with NA

require(DESeq2)
rlog(mat)
Error in DESeqDataSet(se, design = design, ignoreRank) : 
  NA values are not allowed in the count matrix

Convert any NA to 0 and re-try

mat[is.na(mat)] <- 0
rlog(mat)
converting counts to integer mode
              1        2         3         4        5         6         7         8         9        10        11        12       13         14       15        16        17        18       19       20
 [1,] 1.6303296 2.101263 1.6303296 4.4426980 3.916323 2.5563570 1.6870700 4.2057665 1.4127739 2.7271864 2.5850314 2.5552601 2.286887  2.5540934 2.557778 2.5569548 3.2821198 2.9502635 3.622561 1.703656
 [2,] 2.9635154 2.963067 2.9635154 2.4257258 2.943951 3.9320118 4.1476477 2.0566027 2.9753406 2.0970838 2.3144479 3.6813475 2.965651  3.3690668 4.196746 3.4155491 2.9628123 2.0111253 2.481567 2.962925
 [3,] 2.2500342 3.188301 3.6284485 2.7554526 3.193896 0.4068467 4.4410790 0.6108451 3.7866725 0.9955659 2.8728750 1.1451716 3.620244  1.1661840 1.239904 3.7693159 1.7485206 0.5350876 2.592244 2.284964
 [4,] 4.4256292 3.108021 0.8907872 1.8005166 3.825927 2.8045730 4.6617094 0.7861423 3.1211707 1.7357820 2.7223201 3.1454056 4.921423  1.7119954 1.809768 3.3314205 1.5885089 2.6071480 2.105057 2.414051
 [5,] 4.2780751 4.160892 5.4845108 3.3502959 3.350899 3.2308789 3.3498840 3.3493575 3.3491791 3.3481371 3.3482641 1.8285660 4.849273  1.4041449 2.977995 3.2891094 2.7175965 3.3482017 3.349179 3.489831
 [6,] 1.3250958 2.979303 3.1284451 4.5200420 3.937701 3.9664888 4.0853001 4.0207723 3.4823841 3.1203092 1.3941709 3.5961874 4.379111  3.1715398 3.914777 4.2228263 3.4001319 4.0778840 5.350684 1.630279
 [7,] 1.4466070 1.224425 3.6512460 3.5304046 2.286591 1.9162281 1.5021949 3.6600331 0.5854978 2.3790455 0.6514361 2.5760491 2.643559  1.6948982 2.201923 2.0457115 2.8874726 1.1159874 3.051115 3.568114
 [8,] 1.9664247 4.327562 0.7690776 3.9564714 4.720774 2.1731031 3.6682082 1.3603538 3.2485642 2.5102086 2.0997103 2.6595941 3.671308  1.7433200 2.481331 0.5302784 0.4837551 2.9761034 2.263407 1.177239
 [9,] 2.6268004 3.554450 3.7418866 0.9981525 1.113274 3.4400201 3.6312819 1.8713298 1.3540877 2.4580801 0.2464204 3.5573662 5.202193  0.8450194 2.100423 3.6015767 2.9490219 1.9511093 3.711696 3.622012
[10,] 0.5877168 2.518681 2.1797479 1.5528766 3.777592 2.8244428 0.6276905 3.2713734 2.6839627 0.9035905 1.6319140 0.6812501 4.749080 -0.1144293 2.163285 3.9095113 1.6715744 1.2102973 1.896780 3.886975
attr(,"betaPriorVar")
[1] 4.796553
attr(,"intercept")
          [,1]
 [1,] 2.648234
 [2,] 2.991484
 [3,] 2.311582
 [4,] 2.675867
 [5,] 3.392713
 [6,] 3.485171
 [7,] 2.230926
 [8,] 2.439339
 [9,] 2.628809
[10,] 2.130695

To check if there is a NA anywhere in your data, do:

anyis.na(mat) == TRUE)

It returns TRUE if there is a NA present.

ADD REPLYlink 3 months ago
Kevin Blighe
43k
Entering edit mode
0

Hi Kevin, I am a new one. Would you please explain how to set legends with color in biplot? My data: each column is a sample, each row is a gene. Thank you very much.

ADD REPLYlink 13 months ago
linjc.xmu
• 10
Entering edit mode
0

Hello, you can add a custom legend to any plo using something like:

legend("bottomright", bty="n", cex=0.8, title="MyFirstLegend", c("Chinese", "Iraqis", "Iranians", "Syrians", "Russians"), fill=c("red1", "black", "forestgreen", "red2", "darkred"))

The first parameter can be any of "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right", or "center"

Kevin

ADD REPLYlink 13 months ago
Kevin Blighe
43k
Entering edit mode
1

Thank you very much!

ADD REPLYlink 13 months ago
linjc.xmu
• 10
Entering edit mode
0

How can I put the names of the samples in this case? And different colors for it group? Because in the pipeline there are just the basic information to plot a simple graph.

Tks

ADD REPLYlink 13 months ago
silas008
• 100

Login before adding your answer.

Powered by the version 1.5