Question

Difference between the Seurat standard integration workflow and Seurat reference-based integration

3.8 years ago

mvis1231

Hi, I followed the Seurat standard integration workflow for my previous single-cell RNA-seq analysis (https://satijalab.org/seurat/v3.1/immune_alignment.html), and I am currently performing a reference-based integration, following the "Reference-based" tab in the Seurat vignette on integration and label transfer (https://satijalab.org/seurat/v3.0/integration.html).

However, while performing the reference-based integration, I am a bit confused about what it actually gains me. I know the vignette explains that designating one or more datasets as references reduces computation time, because anchors (identified in the CCA space) are no longer searched between every pair of datasets, which reduces the number of pairwise comparisons.

Are there any other benefits of reference-based integration besides computational efficiency?

As far as I understand, the standard integration workflow also lets us map the query dataset onto the reference datasets, because after standard integration we can still perform comparative analyses of cell types.

For example, suppose dataset A is my data of interest and datasets B.1, B.2, and B.3 are data from our lab. With the standard integration, I can still see where dataset A maps relative to the three B datasets, although it requires 6 pairwise comparisons (every pair among the four datasets). With a reference-based integration, I set the three B datasets as the reference set and dataset A as the query, so anchors are found only between A and each reference and among the references themselves. With a single query this is still 3 + 3 = 6 comparisons; the savings only appear when several query datasets would otherwise be compared with one another. Still, apart from calling the three B datasets the reference and dataset A the query, I don't see much difference in how cell types are compared.
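The comparison counts in this example can be checked with a short sketch. Python is used only for illustration (Seurat itself is an R package), and the sketch assumes the pairing rule described in the Seurat documentation for the `reference` argument of `FindIntegrationAnchors()`: anchors are found between each query and each reference, and the references are then integrated with each other pairwise. `standard_pairs` and `reference_pairs` are hypothetical helpers that only count dataset pairs; they do not call Seurat.

```python
from itertools import combinations


def standard_pairs(datasets):
    """All-pairwise design: every unordered pair of datasets is compared."""
    return list(combinations(datasets, 2))


def reference_pairs(queries, references):
    """Reference-based design (assumed rule): each query is compared with
    each reference, and the references are compared with one another so
    they can be integrated first. Query-query pairs are never compared."""
    query_vs_ref = [(q, r) for q in queries for r in references]
    ref_vs_ref = list(combinations(references, 2))
    return query_vs_ref + ref_vs_ref


if __name__ == "__main__":
    b_sets = ["B.1", "B.2", "B.3"]

    # Standard workflow on A, B.1, B.2, B.3: C(4, 2) = 6 pairs.
    print(len(standard_pairs(["A"] + b_sets)))  # 6

    # A as the only query, B.1-B.3 as references: 3 + 3 = 6 pairs.
    print(len(reference_pairs(["A"], b_sets)))  # 6

    # Hypothetical larger case: 8 queries and 2 references.
    queries = [f"Q{i}" for i in range(1, 9)]
    refs = ["R1", "R2"]
    print(len(standard_pairs(queries + refs)))  # 45
    print(len(reference_pairs(queries, refs)))  # 17
```

Under this assumption, one query against three references needs as many comparisons as the all-pairwise design; the reference-based design only saves work when there are many query datasets, because query-query pairs are skipped.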

In this case, where we can see how dataset A maps onto the B datasets either way, how does the interpretation of the mapping results differ between the two methods? The only differences I can see are that the three B datasets are labeled as the reference in the reference-based integration and that it can be more computationally efficient.

I understand the developers note in the vignette that reference-based integration gives results very similar to the standard integration. I just want to make sure whether reference-based integration has any significant benefits over standard integration besides computational efficiency, so that I can choose the right method.

Could anyone help me understand the key differences between the two integration methods? I really appreciate your time and help. Thank you.

Seurat scRNA-seq Integration reference-based
