Clustering cells using known marker genes (single-cell RNA-seq)
6.8 years ago
V ▴ 380

Hello,

I have a dataset of single cells that were sequenced, and I have generated the associated count files, etc.

Up until now I've been using the Seurat package in R, which is excellent at clustering cells on its own (unsupervised) and then gives you the genes differentially expressed between clusters (A vs. B, etc.).

One question I have, though, and can't find a solution for, is how to do "supervised" clustering (if that is the right term). Basically, I have some cells that are, for example, Pax3+/CD146+ and other cells that are Pax3+/CD146-. On the tSNE plot I can see that these cells fall into different clusters when I do conventional clustering (together with other, unrelated cells). Does anyone know of a way to group all of the cells I'm interested in into two clusters (Pax3+/CD146+ and Pax3+/CD146-) and then run differential expression testing between them (or even just get the gene lists)?

Thanks!

single cell rnaseq
6.8 years ago

Supervised "clustering" is usually called classification in machine learning while the term clustering is typically reserved for unsupervised approaches. Maybe this clarification of terms will help you find the relevant literature.
The way to do supervised learning is to use a training set, i.e. a data set for which you know the ground truth, and use it to "train" an algorithm to classify the data. How exactly to do this depends on the type of data you have and the algorithm you want to use. There are a few things to pay attention to. For example, if you only train with two classes, every sample will end up in one or the other. If the "unrelated cells" are the problem, you would need additional classes for them, with ground truth data for each.
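To make the training-set idea concrete, here is a minimal sketch of one very simple classifier, a nearest-centroid classifier, written in plain Python rather than R/Seurat. Everything in it is hypothetical: the three-gene expression values, the labels, and the extra "other" class standing in for unrelated cells. A real analysis would normalise the data, select informative genes and likely use a more robust classifier; this only illustrates training on labelled cells and the two-class pitfall described above.

```python
import math


def train_centroids(profiles, labels):
    """'Training': average the expression profiles of each labelled class."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], profile)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(profile, centroids):
    """Assign a cell to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))


# Hypothetical log-expression values for three genes: Pax3, CD146, an unrelated marker.
train_profiles = [
    [5.0, 4.0, 0.0], [6.0, 5.0, 1.0],  # ground truth: Pax3+/CD146+
    [5.0, 0.0, 0.0], [6.0, 1.0, 1.0],  # ground truth: Pax3+/CD146-
    [0.0, 0.0, 5.0], [1.0, 1.0, 6.0],  # ground truth: unrelated cells
]
train_labels = ["Pax3+/CD146+"] * 2 + ["Pax3+/CD146-"] * 2 + ["other"] * 2

centroids = train_centroids(train_profiles, train_labels)
print(classify([5.5, 4.0, 0.5], centroids))  # Pax3+/CD146+
print(classify([5.0, 0.5, 0.0], centroids))  # Pax3+/CD146-
print(classify([0.5, 0.0, 6.0], centroids))  # other

# With only two classes, the unrelated cell is forced into one of them.
two_class = {k: v for k, v in centroids.items() if k != "other"}
print(classify([0.5, 0.0, 6.0], two_class))  # Pax3+/CD146-
```

The last line shows why ground truth for the unrelated cells matters: a classifier can only answer with the classes it was trained on.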
If getting ground truth data is an issue, you could also try refining the clustering (maybe using other clustering methods) so that you get more clusters that are "purer".


Thank you for clarifying the terminology. Sometimes having the correct term can save hours of random searching!

6.2 years ago
PR ▴ 50

Not sure whether my answer will still be useful to you, as the post is fairly old; I only came across your question now. I'm guessing you are using the FindMarkers function in Seurat to call DE. You can make this function test for differential expression between any two subgroups of cells: first assign new subgroup identifiers to the cells with the AddMetaData function, then make those identifiers the default cell "idents" with the "SetAllIdent" function (SetAllIdent is from Seurat v2; in v3 and later, Idents() or SetIdent() does the same job). Then pass the new idents to the "ident.1" and "ident.2" parameters of FindMarkers. Hope this works! If you have found another way, please post that too as a reply. Good luck.
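The logic of that workflow (label the cells yourself, then test group 1 against group 2 gene by gene) can be sketched outside Seurat. Below is a minimal plain-Python sketch, not Seurat's implementation: it gates cells into Pax3+/CD146+ and Pax3+/CD146- with a hypothetical expression threshold of 0, then runs a two-sided Wilcoxon rank-sum (Mann-Whitney U) test per gene using a normal approximation without tie correction, and reports the difference in means. The cell values and the gene names GeneA and GeneB are invented for illustration.

```python
import math


def assign_group(cell, threshold=0.0):
    """Hypothetical gating rule based on Pax3/CD146 expression."""
    if cell["Pax3"] <= threshold:
        return None  # not Pax3+, so excluded from the comparison
    return "Pax3+/CD146+" if cell["CD146"] > threshold else "Pax3+/CD146-"


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test, normal approximation (no tie correction)."""
    pooled = sorted((v, i) for i, v in enumerate(x + y))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):  # tied values share their average 1-based rank
            ranks[pooled[k][1]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


def differential_expression(cells, genes, group1, group2):
    """Compare group1 vs group2 for each gene; sort results by p-value."""
    labels = [assign_group(c) for c in cells]
    results = []
    for gene in genes:
        a = [c[gene] for c, lab in zip(cells, labels) if lab == group1]
        b = [c[gene] for c, lab in zip(cells, labels) if lab == group2]
        _, p = rank_sum_test(a, b)
        results.append((gene, sum(a) / len(a) - sum(b) / len(b), p))
    return sorted(results, key=lambda r: r[2])


cells = [  # hypothetical expression values
    {"Pax3": 3, "CD146": 2, "GeneA": 5, "GeneB": 1},
    {"Pax3": 2, "CD146": 3, "GeneA": 6, "GeneB": 2},
    {"Pax3": 4, "CD146": 1, "GeneA": 7, "GeneB": 1},
    {"Pax3": 3, "CD146": 2, "GeneA": 8, "GeneB": 2},
    {"Pax3": 3, "CD146": 0, "GeneA": 1, "GeneB": 1},
    {"Pax3": 2, "CD146": 0, "GeneA": 2, "GeneB": 2},
    {"Pax3": 4, "CD146": 0, "GeneA": 0, "GeneB": 1},
    {"Pax3": 3, "CD146": 0, "GeneA": 3, "GeneB": 2},
    {"Pax3": 0, "CD146": 2, "GeneA": 9, "GeneB": 9},  # Pax3-, excluded
]

for gene, diff, p in differential_expression(
        cells, ["GeneA", "GeneB"], "Pax3+/CD146+", "Pax3+/CD146-"):
    print(gene, diff, round(p, 4))
```

FindMarkers uses a Wilcoxon rank-sum test by default and additionally reports fold changes and adjusted p-values; the sketch only mirrors the idea of comparing two user-defined groups instead of the automatically found clusters.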

