Can I combine FF and FFPE samples for RNA-seq data analysis?
5.6 years ago
qxiong1 • 0

I have RNA-seq data from both fresh-frozen (FF) and formalin-fixed paraffin-embedded (FFPE) prostate cancer samples. I would like to analyse the two types together, because the sample size would be very small with only one type. Does anybody know whether this is OK? Several studies report that expression profiles from FF and FFPE samples are highly correlated.

RNA-Seq design covariate

vst() works well, Kevin, thank you very much for your help!

5.6 years ago
ATpoint 81k

I suggest you include the preservation technique as a covariate in your design matrix, and then run some initial quality controls (PCA, sample correlations) to explore its impact on the expression data.
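The quality-control step suggested here can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the counts are simulated with DESeq2's makeExampleDESeqDataSet(), so they contain no real preservation effect, and the Tissue and Response labels are hypothetical stand-ins for the real sample table.

```r
library(DESeq2)

# Simulated stand-in for a real count matrix: 3000 genes x 12 samples.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 3000, m = 12)

# Hypothetical annotation: preservation method and phenotype of interest.
dds$Tissue   <- factor(rep(c("FF", "FFPE"), each = 6))
dds$Response <- factor(rep(c("R", "NR"), times = 6))

# blind = TRUE ignores the design, a common choice for exploratory QC.
vsd <- vst(dds, blind = TRUE)

# PCA coordinates per sample; in real data, samples separating by Tissue
# on PC1 or PC2 would point to a strong preservation effect.
pca_data <- plotPCA(vsd, intgroup = c("Tissue", "Response"), returnData = TRUE)
print(pca_data)

# Pairwise Pearson correlations between samples on the transformed scale.
sample_cor <- cor(assay(vsd))
print(round(sample_cor[1:4, 1:4], 3))
```

With real data, calling plotPCA() without returnData = TRUE draws the plot directly, coloured by the intgroup variables.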


I agree with ATpoint. The FFPE samples will be degraded and this will likely introduce bias. You can control for this by following what ATpoint recommends.

Edit: if the FFPE tissue has degraded unequally across samples, which FFPE tends to do, a single covariate may not model the degradation correctly and you may still have issues. Looking at PCA bi-plots, etc., as ATpoint recommends, will provide further information in this regard.


ATpoint and Kevin, thanks a lot. I have added a covariate (Tissue) to the design matrix, and it does seem to account for the difference between FF and FFPE. However, another issue has come up. I want to output the corrected counts using select <- counts(dds, normalized=TRUE), but DESeq2 still gives me the original normalized matrix, which is not corrected for tissue type. Does anybody know how to output the tissue-corrected normalized matrix?


Including Tissue in the design formula will just result in modified / 'adjusted' statistics when you perform the differential expression comparisons, i.e., it will 'absorb' the effect of Tissue when calculating P values. It does not directly modify the counts.

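However, you can output transformed counts with vst() or rlog(). Setting blind=FALSE makes the transformation use your design formula when estimating dispersions but, as far as I know, it does not by itself remove the Tissue effect from the values. To remove it for visualisation or clustering, apply limma's removeBatchEffect() to the transformed values.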

Other possibilities are discussed here: Batch effects : ComBat or removebatcheffects (limma package) ?
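As a rough sketch of the limma route, the snippet below removes a Tissue effect from variance-stabilised values while protecting the Response effect, following the pattern shown in the DESeq2 vignette. The data are simulated and the Tissue and Response labels are hypothetical, so treat it as an illustration rather than a recipe. The corrected matrix is meant for visualisation or clustering. For differential expression, keep Tissue in the design formula and test on the raw counts.

```r
library(DESeq2)
library(limma)

# Simulated stand-in for a real count matrix: 3000 genes x 12 samples.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 3000, m = 12)

# Hypothetical annotation: preservation method and phenotype of interest.
dds$Tissue   <- factor(rep(c("FF", "FFPE"), each = 6))
dds$Response <- factor(rep(c("R", "NR"), times = 6))
design(dds)  <- ~ Tissue + Response

# blind = FALSE lets the dispersion estimation use the design formula;
# the transformed values still contain any Tissue effect at this point.
vsd <- vst(dds, blind = FALSE)

# Model matrix for the effect to preserve (Response) while removing Tissue.
keep <- model.matrix(~ Response, data = as.data.frame(colData(vsd)))

# Subtract the fitted Tissue effect from the transformed expression values.
corrected <- removeBatchEffect(assay(vsd), batch = vsd$Tissue, design = keep)

# Store the corrected matrix back for plotting, e.g. with plotPCA(vsd, ...).
assay(vsd) <- corrected
```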

ATpoint may have other suggestions.


Many thanks for your help. One more question: do you know whether these rlog- or VST-transformed counts are normalized counts adjusted for the covariates, or non-normalized counts? My main concern is whether the transformed counts can be used directly for statistical tests of gene expression differences between two phenotype groups, just like TPM/FPKM/RPKM values.


FPKM / RPKM values are not actually amenable to differential expression comparisons. There is no cross-sample normalisation performed when deriving these numerical units.

You can use the rlog or vst counts for downstream applications, including statistical tests, machine learning, et cetera. Both are normalised for library size and are on a log2-like scale. In this case, set blind=FALSE.


Hi Kevin, could you please take a look at my code and check whether it is correct for the covariate analysis in DESeq2?

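library(DESeq2)

# Tissue (FF/FFPE) is the covariate; Response, listed last, is the variable of interest
design <- ~ Tissue + Response
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = design)
dds <- DESeq(dds)

# Log2 fold change of R over NR, adjusted for Tissue
res <- results(dds, contrast = c("Response", "R", "NR"))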

"Tissue" and "Response" are two columns in the sample table. The "Response" column takes two values (R and NR), which correspond to the phenotype of interest, while "Tissue" is the covariate I added to the design matrix. Thanks in advance!


Hello. That looks good!
