Hi
I am attempting to understand which genes contribute the most to PC2. As you can see from the PCA plot below, produced with the DESeq2 plotPCA() function, the triangle samples appear to be separated on PC2. (They all have the same disease.)
My main question is: what do I need to start working with the PCAtools pca() function?
Is using the rlog data from DESeq2 appropriate, as below?
library(DESeq2)
library(PCAtools)
# Build the DESeq2 object and apply the rlog transformation
dds.sm <- DESeqDataSet(gse, design = ~ batch + diagnosis)
dds.sm <- estimateSizeFactors(dds.sm)
rld.sm <- rlog(dds.sm, blind = FALSE)
# Matrix of rlog values: genes in rows, samples in columns
rld.sm.output <- assay(rld.sm)
# Run the PCA and plot the variables with the strongest loadings
pca.project <- pca(rld.sm.output)
plotloadings(pca.project,
  rangeRetain = 0.01,
  labSize = 3.0,
  shapeSizeRange = c(3, 3),
  title = 'Loadings plot',
  subtitle = 'PC1, PC2',
  caption = 'Top 1% variables',
  shape = 24,
  col = c('limegreen', 'black', 'red3'),
  drawConnectors = TRUE)
And how can I add the gene symbols?
Thanks for any help!
Nathan
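For reference, a minimal sketch of how the genes driving PC2 could be ranked by their loadings. It uses base R's prcomp() on a simulated matrix as a stand-in, so the object names (mat, pca.toy, top.genes) are hypothetical and the values mean nothing biologically. As far as I know, PCAtools' pca() also takes genes in rows and samples in columns, and keeps the equivalent table in pca.project$loadings (genes as row names, one column per PC).

```r
# Simulated stand-in for assay(rld.sm): genes in rows, samples in columns.
set.seed(1)
mat <- matrix(rnorm(200 * 6), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("sample", 1:6)))

# prcomp() expects samples in rows, hence the transpose.
pca.toy <- prcomp(t(mat), center = TRUE, scale. = FALSE)

# Loadings of every gene on PC2 (named by gene).
loadings.pc2 <- pca.toy$rotation[, "PC2"]

# Rank genes by absolute loading and keep the ten largest.
top.pc2 <- sort(abs(loadings.pc2), decreasing = TRUE)[1:10]
top.genes <- names(top.pc2)

# Show the signed loadings, since the sign indicates direction along PC2.
print(data.frame(gene = top.genes, loading = loadings.pc2[top.genes]))
```

The sign of a loading only says in which direction along PC2 a gene pulls; the absolute value is what ranks its contribution.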
Thank you, Kevin.
When I run:
I get
Error in `.rowNamesDF<-`(x, value = value) : missing values in 'row.names' are not allowed
Perhaps this is a case of mis-mapping? Is there an easy way to remove these rows, if that would be the appropriate thing to do?
Best wishes,
Nathan
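If dropping the unmapped rows is the route taken, it could look roughly like the sketch below. It assumes mapping is a named character vector (Ensembl ID as name, gene symbol as value, NA where no symbol was found) in the same order as the matrix rows; the IDs and symbols here are made-up placeholders.

```r
# Toy expression matrix with placeholder Ensembl-style row names.
mat <- matrix(1:12, nrow = 4,
              dimnames = list(c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
                              c("s1", "s2", "s3")))

# Hypothetical ID-to-symbol mapping; two IDs have no symbol.
mapping <- c(ENSG_A = "SYM1", ENSG_B = NA, ENSG_C = "SYM3", ENSG_D = NA)

# Count how many IDs failed to map.
print(table(is.na(mapping)))

# Keep only the rows that have a symbol, then rename them.
keep <- !is.na(mapping)
mat.mapped <- mat[keep, , drop = FALSE]
rownames(mat.mapped) <- unname(mapping[keep])

print(rownames(mat.mapped))
```

Note that this discards the unmapped genes entirely, so any of them with a large loading would no longer appear in the PCA.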
Hmm, if you print
mapping
to the terminal, what does it actually show? How about table(is.na(mapping))?
It seems there are NA values.
...'not good'! Maybe do:
, and assign this to the rownames.
This means that, if NA, the original ENSG ID will be used; otherwise, the gene symbol will be used.
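A minimal sketch of that fallback, not necessarily the exact code suggested in the thread. It assumes mapping is a named character vector whose names are the original Ensembl IDs; the IDs and symbols below are made-up placeholders.

```r
# Toy expression matrix with placeholder Ensembl-style row names.
mat <- matrix(1:12, nrow = 4,
              dimnames = list(c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
                              c("s1", "s2", "s3")))

# Hypothetical ID-to-symbol mapping; two IDs have no symbol.
mapping <- c(ENSG_A = "SYM1", ENSG_B = NA, ENSG_C = "SYM3", ENSG_D = NA)

# Use the symbol where one exists, otherwise fall back to the original ID.
new.names <- unname(ifelse(is.na(mapping), names(mapping), mapping))

# Assign the result to the row names; no NA values remain.
rownames(mat) <- new.names

print(rownames(mat))
# "SYM1" "ENSG_B" "SYM3" "ENSG_D"
```

If two IDs map to the same symbol, the row names would no longer be unique; wrapping new.names in make.unique() is one way to guard against that.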
Thank you Kevin, this worked.
returns