Question: Data integration strategies for single cell RNA-seq

Hi, I am writing to ask whether anyone has experience combining single-cell RNA-seq data from different conditions and biological/technical replicates within one experiment. I have a dataset with two conditions (WT/Treatment), and each condition has two replicates (done in different isogenic mice). I would like to correct for batch effects within each condition independently, then combine the data from the two conditions and do a joint analysis to see how clusters/cell types differ between the conditions. I have generally been using Seurat, with the strategy described below.
For example, COND1 has Exp1.1 and Exp1.2, and COND2 has Exp2.1 and Exp2.2.

The process I followed is:

  • Merge COND1/Exp1.1 and COND1/Exp1.2.
  • After the usual pre-processing of the merged COND1 object, correct for batch in ScaleData using the experiment ID.
  • Do the same for COND2.
  • Then merge the two objects, COND1 and COND2, for a combined analysis.

The problem is that after merging COND1 and COND2 in the last step, I have to normalize and run ScaleData again, which would discard the batch-corrected expression values. If I instead merge all the conditions and experiments at the beginning, I don't think I can correct for batch across all datasets, since that would neutralize the difference between the conditions.
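
To make the concern concrete, here is a minimal Python sketch on made-up numbers for a single gene. Plain mean-centering stands in for batch correction here; that is an assumption for illustration only and is not what Seurat's ScaleData does internally. The helper names (`group_means`, `center_batches`) and all values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Toy data for one gene: (condition, experiment, expression) per cell.
# All values are invented for illustration.
cells = [
    ("COND1", "Exp1.1", 1.0), ("COND1", "Exp1.1", 2.0),
    ("COND1", "Exp1.2", 3.0), ("COND1", "Exp1.2", 4.0),
    ("COND2", "Exp2.1", 6.0), ("COND2", "Exp2.1", 7.0),
    ("COND2", "Exp2.2", 8.0), ("COND2", "Exp2.2", 9.0),
]


def group_means(rows, key):
    """Mean expression per group, where key(row) names the group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {name: mean(values) for name, values in groups.items()}


def center_batches(rows, target_key):
    """Shift each experiment so its mean equals the mean of its target group."""
    batch_mean = group_means(rows, key=lambda r: r[1])
    target_mean = group_means(rows, key=target_key)
    return [
        (cond, exp, x - batch_mean[exp] + target_mean[target_key((cond, exp, x))])
        for cond, exp, x in rows
    ]


# Correct batches within each condition: experiments are aligned to their
# own condition's mean, so the condition difference is kept.
within = center_batches(cells, target_key=lambda r: r[0])

# Treat all four experiments as batches of one pool: every experiment is
# aligned to the grand mean, so the condition difference is removed too.
pooled = center_batches(cells, target_key=lambda r: "all")

for label, rows in [("within-condition", within), ("pooled", pooled)]:
    by_condition = group_means(rows, key=lambda r: r[0])
    print(label, "COND2 - COND1 =", by_condition["COND2"] - by_condition["COND1"])
```

In this toy case the within-condition version preserves the gap between COND1 and COND2, while the pooled version flattens it, because each experiment belongs to exactly one condition and so batch and condition cannot be separated.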

Any thoughts/suggestions would be greatly appreciated. If someone can point me to any code that does this, even better!

Thanks,

Pankaj
10 months ago bioinformatics.cancer • 180 • updated 10 months ago genomax 68k

Did you try the alignment procedure that Butler et al. described? I believe this vignette is similar enough to your experimental setup for you to follow along.

In short, I would suggest that you first match all the samples to see whether there are large differences between the conditions. Depending on the specific questions you're addressing, though, you may find yourself processing the data differently each time.

10 months ago Friederike • 4.2k

Thanks for the suggestion. Yes, I have looked into the vignette, but I don't believe it matches the situation I described. In the alignment vignette, the starting cell population is the same, and the stimulation is expected to produce differences in gene expression rather than very different cell types. I may be wrong, but at least that is the way I understood it. In my case, there is a treatment, and the tumor is harvested many days later to understand the treatment effect. The cell type abundance is expected to be very different between the control and treatment groups.

10 months ago bioinformatics.cancer • 180

If you want to map cells from different conditions to the same clusters, the alignment step is certainly the most elegant option. Otherwise, you should keep the pre- and post-treatment samples separate, determine the clusters in each, and then work out which pre-treatment clusters correspond to which post-treatment clusters, at least if I understand your aim correctly.
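
As a rough sketch of that second route, one simple way to compare separately derived clusters is to correlate their mean expression profiles over a shared gene set. The Python below is only an illustration under that assumption: the cluster names, the four-gene profiles, and the helper `best_matches` are all hypothetical, and Pearson correlation is just one possible similarity measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical mean expression of the same four genes in each cluster,
# computed separately for the pre- and post-treatment samples.
pre_clusters = {
    "pre_0": [5.0, 0.2, 0.1, 1.0],
    "pre_1": [0.3, 4.0, 3.5, 0.9],
}
post_clusters = {
    "post_0": [0.1, 3.8, 4.1, 1.1],
    "post_1": [4.6, 0.4, 0.2, 0.8],
    "post_2": [0.2, 0.3, 0.1, 6.0],
}


def best_matches(query, reference):
    """For each query cluster, return the most correlated reference cluster."""
    result = {}
    for name, profile in query.items():
        scores = {ref: pearson(profile, ref_profile)
                  for ref, ref_profile in reference.items()}
        best = max(scores, key=scores.get)
        result[name] = (best, scores[best])
    return result


matches = best_matches(post_clusters, pre_clusters)
for post_name, (pre_name, r) in matches.items():
    # A weak or negative best score suggests a cluster with no counterpart,
    # for example a cell type that only appears after treatment.
    print(f"{post_name} -> {pre_name} (r = {r:.2f})")
```

With these invented profiles, two post-treatment clusters pair up with a pre-treatment cluster, while the third has only negative correlations and would need to be inspected as a possible treatment-specific population.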

10 months ago Friederike • 4.2k
