RNAseq and PAM50 prediction
4.0 years ago
graeme.thorn ▴ 100

I've a set of RNAseq data from breast cancer tissue samples (counts and post-cqn-normalised log(RPKM) values) and wish to use the PAM50 classifier to classify them.

I've seen the questions "genefu for PAM50 prediction" and "RNAseq data and PAM50 method", and neither is particularly helpful in terms of what I need to input into the R/genefu predictor (using intrinsic.cluster.predict) to get consistent PAM50 classification. I only have 138 samples, so I'm not going to be able to train the classifier before running it on the remaining samples.

Is there a documented workflow from RNAseq counts to PAM50 subtypes, or can someone provide details on how to go about this?

RNA-Seq R genefu • 1.3k views
4.0 years ago

Hey Graeme,

I do not believe log(RPKMs) are ideal for this. If that is all that you have, then no problem, though.

I am convinced that a handful or more of the PAM50 genes add little information about the risk of metastasis in ER-positive, HER2-negative breast tumours. Nor do I believe there is an established workflow for you to follow here, so you should have a working knowledge of regression and classification models. I gave a previous answer here: How to exclude some of breast cancer subtypes just by looking at gene expression?

I would be interested in different approaches:

  • RandomForest®
  • Penalised regression (my previous answer)
  • Stepwise regression and / or just include all genes in the same regression model

To use any of these models to full effect, you would have to build it on known cases where metastasis occurred / did not occur, and then predict it on unknown cases.
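As a rough illustration of that build-then-predict pattern, here is a minimal sketch of a ridge-penalised logistic regression in plain Python. Everything in it is an assumption made for illustration: the data are synthetic (three hypothetical "genes", only the first of which is informative), the function names are made up, and the penalty and learning rate are arbitrary. In practice you would use an established package and real outcome labels.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, epochs=500):
    """Fit L2-penalised logistic regression by batch gradient descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        # Average gradient plus the ridge term lam * w (intercept is not penalised).
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    # Predicted probability of the positive class for each sample.
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def simulate(n, label):
    # Synthetic expression values: gene 0 is shifted by class, genes 1-2 are noise.
    shift = 1.5 if label == 1 else -1.5
    return [[random.gauss(shift, 1.0), random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
            for _ in range(n)]


random.seed(0)

# "Known" cases with outcome labels (1 = event occurred, 0 = did not).
X_known = simulate(40, 1) + simulate(40, 0)
y_known = [1] * 40 + [0] * 40
w, b = fit_ridge_logistic(X_known, y_known)

# "Unknown" cases: only the expression values are used for prediction.
X_new = simulate(10, 1) + simulate(10, 0)
probs = predict_proba(X_new, w, b)
print([round(p, 2) for p in probs])
```

The penalty shrinks the coefficients of the uninformative genes towards zero, which is the reason for preferring a penalised model when some of the input genes contribute little.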

Kevin


Thanks Kevin, but this work is a collaboration with a commercial company that will be running PAM50 on the non-deduplicated data (the sequencing included UMIs, which we are taking into account and they are not). I am therefore looking for the most robust way of running PAM50 on our data so that we can do a direct comparison.
