DESeq2: can I correct for relatedness when using data from multiplex families?
9.2 years ago
tiphaine

Hi All.

My question is in the same vein as this post: DESeq2: can I correct for relatedness when using data from multiplex families?

In your solution, the model only accounts for whether samples belong to the same family. I would like to know how to add the degree of relatedness, such as MZ twins (100%) and DZ twins (50%). Currently, I have a variable zygosity with two levels (MZ and DZ).

Would you therefore suggest the following model?

~ Covariate + Technical.Confounder + familyID + zygosity + condition

And how would we do the same thing if we also have other family members?
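The relatedness levels mentioned above (MZ 100%, DZ 50%) can be written down as a sample-by-sample matrix. The following is a minimal Python sketch that assumes a hypothetical annotation table of (sample, family, zygosity). The sample names, and the value 0 between different families, are illustrative assumptions.

```python
import numpy as np

# Hypothetical sample annotation: (sample_id, family_id, zygosity).
# s5 is a single twin whose co-twin is not in the data.
SAMPLES = [
    ("s1", "fam1", "MZ"),
    ("s2", "fam1", "MZ"),
    ("s3", "fam2", "DZ"),
    ("s4", "fam2", "DZ"),
    ("s5", "fam3", "DZ"),
]

# Assumed expected relatedness between co-twins; samples from
# different families are treated as unrelated (0).
COTWIN_RELATEDNESS = {"MZ": 1.0, "DZ": 0.5}


def relatedness_matrix(samples):
    """Return a symmetric n x n matrix of expected pairwise relatedness."""
    n = len(samples)
    k = np.eye(n)  # each sample is fully related to itself
    for i in range(n):
        for j in range(i + 1, n):
            _, fam_i, zyg_i = samples[i]
            _, fam_j, zyg_j = samples[j]
            if fam_i == fam_j:
                if zyg_i != zyg_j:
                    raise ValueError(f"inconsistent zygosity in family {fam_i}")
                k[i, j] = k[j, i] = COTWIN_RELATEDNESS[zyg_i]
    return k


K = relatedness_matrix(SAMPLES)
print(K)
```

A DESeq2 design formula cannot take such a matrix directly. This matrix is the kind of input used by mixed models with a correlated random effect. To extend it to other relatives, you would need a relationship code for each pair (for example 0.5 for full siblings or for parent and child) instead of a single zygosity per family.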

Regards,
Tiphaine

relatedness DESeq2 RNA-Seq

Do you have multiple individuals per family? That is, do you only have the twins or do you have other individuals as well?

Currently, I have two cases.

In one model I have only twin pairs (DZ and MZ). In another I have a mixture of twin pairs and single twins without their co-twin, but I know whether each of them is MZ or DZ.

I do not yet have other family members in my data. I am asking mostly out of curiosity, to know how to deal with that case too.

9.2 years ago

The only real way to do this in a generalized linear model is to add a column to the model matrix denoting the twin type. If you have only a single set of twins per family, there's no gain from doing this. In fact, the model matrix will be rank deficient, because twin type is constant within a family and is therefore fully determined by the familyID columns. This would only work when you have more individuals per family than just the twins. If that isn't acceptable, the only alternative is a different kind of model, one where you can input a sample correlation or relatedness structure of some sort. The limma package provides some methods in that regard, though I don't know enough about them to say how much more useful they would be in this case.
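The following minimal NumPy sketch shows that rank deficiency. It assumes a hypothetical design of three twin pairs, with one pair per family, zygosity fixed per family, and condition differing between co-twins. The covariates are left out for brevity. The sketch builds a treatment-coded model matrix for ~ familyID + zygosity + condition, mimicking R's default contrasts, and compares its number of columns with its rank.

```python
import numpy as np

# Hypothetical design: one twin pair per family, zygosity constant
# within each family, condition differing between co-twins.
family = ["f1", "f1", "f2", "f2", "f3", "f3"]
zygosity = ["MZ", "MZ", "DZ", "DZ", "DZ", "DZ"]
condition = [0, 1, 0, 1, 0, 1]


def design_matrix(family, zygosity, condition):
    """Treatment-coded model matrix for ~ familyID + zygosity + condition."""
    levels = sorted(set(family))
    cols = [np.ones(len(family))]  # intercept (reference family f1)
    for lev in levels[1:]:  # one indicator per non-reference family
        cols.append(np.array([f == lev for f in family], dtype=float))
    cols.append(np.array([z == "MZ" for z in zygosity], dtype=float))  # zygosityMZ
    cols.append(np.array(condition, dtype=float))
    return np.column_stack(cols)


X = design_matrix(family, zygosity, condition)
print(X.shape[1], np.linalg.matrix_rank(X))
```

Here the zygosityMZ column equals the intercept minus the f2 and f3 indicators. As a result, the matrix has 5 columns but only rank 4, and dropping zygosity restores full rank. DESeq2 checks the design matrix for full rank and will refuse a design like this one.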

Thanks for this

Currently, I use lme4 with random effects for zygosity and family. So I can use DESeq to normalise my data, but not to find the differentially expressed genes.

Keep in mind that, unless you have a large number of samples, you'll have lower power with lme4.

Oh, I didn't know that. I am going to look into it, and I may come back to you if I am unsure what to do.

Do you have a link that explains this?

Thanks

The limma paper is probably the most appropriate place to start, since everything else (DESeq2, edgeR, etc.) really follows from it. Yes, it describes microarrays, but the underlying statistical argument still applies. In short, lme4 treats each gene individually, whereas DESeq2/edgeR/limma/etc. share information across genes to better estimate things like the variance. They also incorporate priors for the variance and for fold-change shrinkage, which have some of the same effect as using a mixed model.
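The following minimal sketch shows what "sharing information across genes" means. It uses the posterior (moderated) variance from limma's empirical Bayes step, (d0·s0² + d·s²)/(d0 + d). In limma, the prior degrees of freedom d0 and prior variance s0² are estimated from all genes. Here they are set by hand as an illustrative assumption, and the three per-gene variances are made up.

```python
import numpy as np


def moderated_variances(s2, d, s0_2, d0):
    """Shrink per-gene variances toward a common prior value.

    Posterior variance = (d0 * s0_2 + d * s2) / (d0 + d).
    s2: per-gene residual variances, d: their residual degrees of freedom,
    s0_2 / d0: prior variance and prior degrees of freedom (supplied by
    hand here; limma estimates them from all genes).
    """
    s2 = np.asarray(s2, dtype=float)
    return (d0 * s0_2 + d * s2) / (d0 + d)


# Hypothetical per-gene variances from a tiny experiment (2 residual df each).
s2_gene = np.array([0.01, 0.2, 2.0])
post = moderated_variances(s2_gene, d=2, s0_2=0.2, d0=4)
print(post)
```

With these values, the smallest variance (0.01) is pulled up to about 0.137 and the largest (2.0) is pulled down to 0.8. A model fitted gene by gene, as with lme4, has no such borrowing, which matters most when each gene has few residual degrees of freedom. DESeq2 and edgeR apply the analogous idea to negative binomial dispersions.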

OK, thanks. I am going to read the papers to be sure about my models. I was planning to use the same protocol for my other omics data, which also come as tables of per-feature counts for each sample, such as microbiome data.

I have a question about the filtering/normalisation step: do you use DESeq (the first version) to perform it, for instance as explained in this vignette? http://www.bioconductor.org/packages/release/bioc/vignettes/genefilter/inst/doc/independent_filtering.pdf

You could do it manually, but it's simpler to just use DESeq2, which will do that for you.
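For readers who do want to normalise manually, the following minimal NumPy sketch computes median-of-ratios size factors, the method DESeq and DESeq2 use by default (estimateSizeFactors() in DESeq2). The count matrix is made up, and the function skips the extra options DESeq2 offers. Treat it as an illustration of the method rather than a drop-in replacement.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix.

    Genes with a zero count in any sample are skipped, because their
    geometric mean (the pseudo-reference) would be zero.
    """
    counts = np.asarray(counts, dtype=float)
    keep = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[keep])
    log_geo_means = log_counts.mean(axis=1)  # per-gene log geometric mean
    log_ratios = log_counts - log_geo_means[:, None]  # sample vs pseudo-reference
    return np.exp(np.median(log_ratios, axis=0))  # one factor per sample


# Made-up counts: sample 2 is sequenced twice as deeply as sample 1.
counts = np.array([
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 5],
])
sf = size_factors(counts)
normalised = counts / sf  # divide each column by its size factor
```

Dividing each column by its size factor gives normalised counts. If the counts are then modelled with a count GLMM, the more usual route is to keep the raw counts and pass log(size factor) as an offset.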

Thank you for your answer, but I believe my question was not well posed; sorry about that.

Like Tiphaine, I am interested in using a GLMM (generalized linear mixed model) to find differentially expressed genes. However, I think that to get meaningful results I need to perform a filtering/normalisation step beforehand. A better question would have been: "How do I perform these steps on raw count data before applying a GLMM?"

To the best of my knowledge, DESeq2 performs the filtering step in the results() function. It chooses a threshold on the mean of normalised counts that maximises the number of adjusted p-values below a given value (alpha), and it sets the adjusted p-values of the genes that do not pass the filter to NA. How can these outcomes be used in my context?
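The filtering described above can be sketched in a few lines. This is a simplified Python re-implementation, not DESeq2's code. It scans quantiles of a filter statistic (here a made-up base_mean standing in for the mean of normalised counts), applies Benjamini-Hochberg adjustment to the genes that pass, and keeps the lowest cutoff with the most rejections at alpha. DESeq2's results() additionally smooths the rejection curve before choosing the cutoff, so its choice can differ.

```python
import numpy as np


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # running minimum from the largest p-value down keeps padj monotone
    adj = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adj, 1.0)
    return out


def independent_filter(pvalues, base_mean, alpha=0.1, n_quantiles=20, upper=0.95):
    """Choose the filter cutoff that maximises rejections at alpha.

    Genes below the chosen cutoff get padj = NaN (NA in R).
    """
    pvalues = np.asarray(pvalues, dtype=float)
    base_mean = np.asarray(base_mean, dtype=float)
    best_padj, best_n = None, -1
    for q in np.linspace(0.0, upper, n_quantiles):
        cutoff = np.quantile(base_mean, q)
        passed = base_mean >= cutoff
        padj = np.full(pvalues.size, np.nan)
        padj[passed] = bh_adjust(pvalues[passed])
        n_rej = int(np.sum(padj[passed] < alpha))
        if n_rej > best_n:  # strict '>' keeps the lowest cutoff reaching the max
            best_padj, best_n = padj, n_rej
    return best_padj


# Made-up example: three well-expressed genes with small p-values,
# seven low-count genes with uninformative p-values.
pvalues = np.array([0.02, 0.03, 0.04] + [0.5] * 7)
base_mean = np.array([100, 90, 80, 1, 2, 3, 4, 5, 6, 7], dtype=float)
padj = independent_filter(pvalues, base_mean)
```

Without filtering, none of the three genes reaches padj < 0.1. Dropping the lowest-count genes reduces the multiple-testing burden enough that all three pass. The same idea could in principle be applied to p-values from a GLMM, provided the filter statistic (such as the mean normalised count) does not depend on the condition labels.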

By the way, do you think it would be better to start a new thread? IMHO, my intervention is making this one messy.

Yeah, it might be a bit cleaner to start a new thread.
