Question

Differential expression: replicates in one condition, no replicates in the other

2

6.7 years ago

IP ▴ 760

Hi Biostars:

I am facing a problem with differential expression analysis where, due to the intrinsic features of the samples, we can't have replicates in one condition.

Study design: We have a patient with a very rare translocation. No similar translocation has been described anywhere in the world, and we expect the expression of the genes surrounding the translocation to be altered. Hence, we have performed RNA-seq on the patient with the translocation and on 4 controls.

How should I proceed?:

Before you kill me for asking "Can I do differential expression without replicates?", I know that edgeR and DESeq2 provide ways to proceed without replicates, and that NOISeq could be used without replicates. However, in this case we have replicates for the controls, where the biological variance of each gene could be estimated, but not for the patient, as there is no other such individual in the world. So, my question is: Is there any way of estimating the variance from the control group and then comparing it to the expression in one single sample, the patient in this case? Or, better, of doing a dispersion estimation for the translocation patient?

My options (from the edgeR docs):

Use the genes and transcripts that are far away from the translocation, or on chromosomes other than the one carrying it, to estimate the dispersion for that sample.
Use a previously defined dispersion value.

Have any of you faced a similar problem? Furthermore, has anybody tested how the two options above, which edgeR provides for working without replicates, perform?

Thanks for reading :)

RNA-Seq sequencing EdgeR • 5.8k views

ADD COMMENT • link updated 5.2 years ago by Gordon Smyth ★ 7.0k • written 6.7 years ago by IP ▴ 760

0

Coming back to this: I came across the package OUTRIDER, which is able to find DEGs compared to controls in an n=1 situation. I haven't tried it myself, but it might be worth a look:

Paper: https://www.sciencedirect.com/science/article/pii/S0002929718304014

ADD REPLY • link 5.2 years ago by unawaz ▴ 60

1

5.3 years ago

Kristoffer Vitting-Seerup ★ 4.0k

I am sorry to tell you, but if you cannot get more patients with the translocation, you cannot make any generalizations to other patients. As you have no idea about the variation among patients with the translocation, you cannot do trustworthy statistics to test such a generalization. That said, you can still do the analysis as a case study, which is what a lot of medical doctors do. Alternatively, you can try to create the same translocation in a cell line and generalize from that.

ADD COMMENT • link 5.3 years ago by Kristoffer Vitting-Seerup ★ 4.0k

0

5.3 years ago

unawaz ▴ 60

I've actually had a similar issue to yours, and the way I resolved it was by downloading more controls from public databases. We were using LCLs, so we downloaded them from Geuvadis.

I also did an outlier detection analysis in which I calculated Z-scores and looked for the genes in my patient that did not look like the controls. More info: https://bioinformatics.stackexchange.com/questions/2180/rnaseq-z-score-intensity-and-resources
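For illustration, here is a minimal Python sketch of that kind of Z-score screen. It is only a sketch under stated assumptions, not the exact procedure from the linked thread: the gene names, expression values, pseudocount and the |Z| > 3 cutoff are all made up for the example, and the values are assumed to be already normalized for library size.

```python
import math
import statistics


def patient_z_scores(control_values, patient_values, pseudocount=1.0):
    """Return {gene: Z} for the patient's log2 expression relative to controls.

    control_values: {gene: [normalized expression in each control]}
    patient_values: {gene: normalized expression in the single patient}
    Genes with zero variance across the controls are skipped, because a
    Z-score is undefined for them.
    """
    z_scores = {}
    for gene, controls in control_values.items():
        logs = [math.log2(v + pseudocount) for v in controls]
        mean = statistics.mean(logs)
        sd = statistics.stdev(logs)  # sample SD, needs at least 2 controls
        if sd == 0:
            continue
        patient_log = math.log2(patient_values[gene] + pseudocount)
        z_scores[gene] = (patient_log - mean) / sd
    return z_scores


# Hypothetical, already normalized values (assumption: not real data).
controls = {
    "GENE_A": [100, 110, 95, 105],
    "GENE_B": [50, 55, 45, 52],
    "GENE_C": [20, 20, 20, 20],  # no variation in controls, so it is skipped
}
patient = {"GENE_A": 102, "GENE_B": 400, "GENE_C": 25}

z = patient_z_scores(controls, patient)
# Arbitrary illustrative cutoff for "does not look like the controls".
outliers = sorted(gene for gene, score in z.items() if abs(score) > 3)
print(outliers)
```

With only a handful of controls the per-gene standard deviation is poorly estimated, so the Z-scores are noisy; that is one reason adding more controls from a public resource helps.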

ADD COMMENT • link 5.3 years ago by unawaz ▴ 60

score 6 · Accepted Answer · 2019-01-24

The short answer is that you just proceed as usual. limma, edgeR and DESeq2 have no trouble with this scenario, although the edgeR quasi-likelihood pipeline would be better than the other options. The packages simply estimate variability from groups where you do have replication (controls in your case), and apply the same dispersion estimates to all the samples in all the groups.
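To make the idea concrete, here is a toy Python sketch of "estimate the variance from the replicated group and apply it to everything". This is not what edgeR does internally (edgeR fits negative binomial GLMs and moderates the gene-wise dispersions with empirical Bayes); it is just an ordinary equal-variance t statistic for a single gene on a log scale, and the function name and the log-expression values are hypothetical.

```python
import math
import statistics


def single_case_t(controls, patient):
    """Equal-variance t statistic for one case versus replicated controls.

    With n controls and 1 case, the pooled variance reduces to the control
    variance on n - 1 degrees of freedom, because the single case
    contributes no within-group information.
    Returns (t statistic, degrees of freedom).
    """
    n = len(controls)
    mean = statistics.mean(controls)
    var = statistics.variance(controls)  # sample variance of the controls
    # Standard error of (patient - control mean): var * (1/1 + 1/n).
    se = math.sqrt(var * (1 + 1 / n))
    return (patient - mean) / se, n - 1


# Hypothetical log-expression values for one gene (assumption: not real data).
controls_log_expr = [5.0, 5.2, 4.8, 5.0]
patient_log_expr = 7.0

t_stat, df = single_case_t(controls_log_expr, patient_log_expr)
print(round(t_stat, 2), df)
```

The point of the sketch is only that the unreplicated sample borrows its variance from the controls. With 3 degrees of freedom per gene such a naive test has little power, which is where sharing information across genes, as the packages above do, becomes important.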

The edgeR and DESeq2 pipelines for no replicates are for when none of the groups have any replicates. You, however, do have replicate controls.

edgeR can be used right down to a two-group comparison with n=2 in one group and n=1 in the other. I'm not saying that such small sample sizes are desirable, but the package will do the best it can with what it gets and will present scientifically defensible results even in that extreme scenario. You can see an example of an n=2 vs n=1 analysis in the discussion to this paper (i.e., my reply to Conrad Burden's first report): https://f1000research.com/articles/5-1438

BTW, the same question has been asked several times on the Bioconductor Support forum, for example: https://support.bioconductor.org/p/63585/ or https://support.bioconductor.org/p/61904/