Question

Connecting clusters of cells in 2 adjacent time points.

5.5 years ago

zizigolu ★ 4.3k

Hi, I am doing single-cell RNA-seq. I have 9 time points with roughly 200 cells at each time point. There is no control/treatment design; instead, I am working with a developmental process. Cells of a growing unicellular mold (time point 0) are starved, and single-cell sequencing has been done on cells harvested every 2 hours from then on. Now I have a matrix whose columns are my cells and whose rows are genes (so far, these are the basics).

I have clustered the cells at each time point with the Seurat R package, which gave me roughly 2-3 clusters of cells per time point. I have done differential expression between the clusters within each time point to obtain marker genes specific to each cluster. Now I need to find the similarity between clusters across time points. For instance, if I have clusters a, b and c at hour 2 and clusters a', b' and c' at hour 4, what is the relationship between these clusters (similarity, parent-child)? I have tried algorithms like URD, which connect cells by ordering them in pseudotime and then building a tree of related cells (a lineage). However, they don't take into account the fine clustering within each time point (they only care about the start and end time points).

This MATLAB function (get_parent_child_map.m)

https://www.dropbox.com/sh/zn9b5xgssmkhnqa/AACJucOyiLcs-1WOmwerQyf3a/Subroutines?dl=0&preview=get_parent_child_map.m&subfolder_nav_tracking=1

tries to connect clusters of cells in 2 adjacent time points to each other in a parent-child way (earlier and later time points). As a control to check that I am running it properly, I took the 3 clusters of cells from one time point and tried to connect the clusters to each other. For example, with clusters a, b and c, I expect a to be most similar to a, b to b and c to c (since I am comparing a time point with itself). But what I obtain is not that clear-cut, as this picture shows:

[image]

As you can see, in the first column a is the most dissimilar to c, but in the third column c is no longer the most dissimilar to a. I think the number of similar cells in each cluster has been divided by the sum of the column, based on these lines of the source code:

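if column_normalize == 1 % Column normalize
    for i = 1:size(raw_vote,2)          % loop over child clusters (columns)
        a = raw_vote(:,i);              % column i: raw votes linking each parent cluster to child cluster i
        b = a/sum(a);                   % share of child i's votes per parent; the column sums to 1
        [sorted_b,sortingIndices] = sort(b,'descend');

        assignment_probabilities = [assignment_probabilities b];
        parent_assignments(i) = sortingIndices(1); % parent for child cluster i
    end
end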
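To see what the column normalisation quoted above does, here is a minimal R sketch of the same step: divide each column of raw_vote by its sum, then take the top-ranked row as the parent. I do not know how get_parent_child_map.m fills raw_vote, so the counts below are made up, and `column_normalize_votes` is a hypothetical helper name, not part of the MATLAB code.

```r
# Hypothetical helper mirroring the column_normalize branch of get_parent_child_map.m.
# raw_vote: rows = parent (earlier) clusters, columns = child (later) clusters.
column_normalize_votes <- function(raw_vote) {
  # Divide each column by its own sum: every column becomes a distribution over
  # parent clusters for one child cluster (columns sum to 1, rows need not).
  prob <- sweep(raw_vote, 2, colSums(raw_vote), "/")
  # Parent of child j = row with the largest share in column j
  # (like sortingIndices(1) after a descending sort; ties go to the first row).
  parent <- apply(prob, 2, which.max)
  list(prob = prob, parent = parent)
}

# Made-up vote counts for clusters a, b, c compared with themselves.
raw_vote <- matrix(c(50, 10, 15,
                      8, 40,  5,
                      1,  6, 30),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(parent = c("a", "b", "c"),
                                   child  = c("a", "b", "c")))
res <- column_normalize_votes(raw_vote)
print(round(res$prob, 3))
print(res$parent)
```

With these toy counts, column a ranks c last (1/59 ≈ 0.017) while column c ranks b last (5/50 = 0.1). Because each column is normalised on its own, a column only ranks candidate parents for one child cluster: entries are not comparable across columns, and the matrix need not be symmetric even when a time point is compared with itself.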

However much I read this code, I don't know how to interpret this picture. I asked the developer, and he sent me his sample input files to reproduce the results: https://www.dropbox.com/sh/8856ij1nlk6ehiq/AADS0CjwfTxmlBpmGMDSxtWRa?dl=0

but they did not help me get the point.

Now I thought about doing something in R. If I have marker genes for each cluster at each time point, then by counting the marker genes shared between clusters of 2 time points I can say which cluster is most likely similar to which. I have done that by mapping the marker genes of one time point onto another time point as a heatmap like this:

[image]

But this heatmap is not accurate.
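A minimal base-R sketch of the shared-marker comparison. Raw shared counts favour clusters with long marker lists, so the sketch also reports the Jaccard index (shared / union) and a one-sided hypergeometric p-value for each overlap; both are swapped-in choices, not something from the post. The marker lists and the gene-universe size `n_genes = 1000` are made up, and `marker_overlap` is a hypothetical helper.

```r
# Hypothetical helper: compare marker-gene sets of clusters at two adjacent time points.
# markers_t1 / markers_t2: named lists, one character vector of unique marker genes per cluster.
# n_genes: size of the gene universe (e.g. all genes tested for DE), an assumption.
marker_overlap <- function(markers_t1, markers_t2, n_genes) {
  # Fill a clusters(t1) x clusters(t2) matrix with f(markers_i, markers_j).
  pair_stat <- function(f) {
    m <- matrix(NA_real_, length(markers_t1), length(markers_t2),
                dimnames = list(names(markers_t1), names(markers_t2)))
    for (i in seq_along(markers_t1))
      for (j in seq_along(markers_t2))
        m[i, j] <- f(markers_t1[[i]], markers_t2[[j]])
    m
  }
  shared  <- pair_stat(function(x, y) length(intersect(x, y)))
  jaccard <- pair_stat(function(x, y) length(intersect(x, y)) / length(union(x, y)))
  # One-sided hypergeometric test: P(overlap >= observed) for two random sets of these sizes.
  p_hyper <- pair_stat(function(x, y)
    phyper(length(intersect(x, y)) - 1, length(x), n_genes - length(x),
           length(y), lower.tail = FALSE))
  list(shared = shared, jaccard = jaccard, p_hyper = p_hyper)
}

# Made-up marker lists for clusters at hour 2 (a, b, c) and hour 4 (a', b', c').
markers_2h <- list(a = c("g1", "g2", "g3", "g4"),
                   b = c("g5", "g6", "g7"),
                   c = c("g8", "g9"))
markers_4h <- list("a'" = c("g1", "g2", "g3", "g10"),
                   "b'" = c("g5", "g6", "g11", "g12"),
                   "c'" = c("g9", "g13"))
ov <- marker_overlap(markers_2h, markers_4h, n_genes = 1000)
print(ov$shared)
print(round(ov$jaccard, 2))
```

In practice the gene universe would be the set of genes tested in the differential-expression step, and the p-values would need a multiple-testing correction such as p.adjust().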

Another idea, which I know is a very naive solution: treat the marker genes of each cluster as a gene module, connect each module to the modules of the adjacent stage through a weighted similarity matrix, and visualise that with igraph, following the step "Calculate the weighted overlap between pairs of gene modules in adjacent stages"

from this tutorial:

https://github.com/farrellja/URD/blob/master/Analyses/SupplementaryAnalysis/URD-10-ConnectModulesBetweenStages.Rmd

The result could look like this:

[image]

or

[image]
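One simple way to turn such modules into an igraph-ready edge list is sketched below. The weighting is an assumption and is not the formula used in the URD tutorial: each module is a named vector of non-negative gene weights (for example the average log fold change of each marker), shared genes contribute the smaller of their two weights, and the sum is divided by the smaller module's total weight. `weighted_overlap`, `stage_edges`, the toy modules and the 0.1 cutoff are all hypothetical.

```r
# Hypothetical weighted overlap between two gene modules (named numeric vectors,
# names = genes, values = non-negative weights). Identical modules score 1, disjoint ones 0.
weighted_overlap <- function(w1, w2) {
  shared <- intersect(names(w1), names(w2))
  sum(pmin(w1[shared], w2[shared])) / min(sum(w1), sum(w2))
}

# Edge list between the modules of two adjacent stages, keeping edges at or above a cutoff.
stage_edges <- function(modules_t1, modules_t2, min_weight = 0.1) {
  edges <- expand.grid(from = names(modules_t1), to = names(modules_t2),
                       stringsAsFactors = FALSE)
  edges$weight <- mapply(function(i, j) weighted_overlap(modules_t1[[i]], modules_t2[[j]]),
                         edges$from, edges$to, USE.NAMES = FALSE)
  edges[edges$weight >= min_weight, , drop = FALSE]
}

# Made-up modules for two stages.
modules_2h <- list(a = c(g1 = 2, g2 = 1, g3 = 1),
                   b = c(g4 = 1.5, g5 = 1))
modules_4h <- list("a'" = c(g1 = 1.5, g2 = 1, g6 = 0.5),
                   "b'" = c(g4 = 1, g5 = 1, g7 = 2))
e <- stage_edges(modules_2h, modules_4h)
print(e)
```

The resulting data frame can be passed to igraph::graph_from_data_frame(e, directed = TRUE) and plotted with edge.width proportional to e$weight. Prefix cluster names with their time point if the same label can occur at two stages; otherwise igraph will merge them into one vertex.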

R igraph adjacency matrix seurat URD

tagging: Jean-Karim Heriche

5.5 years ago by GenoMax 141k

score 2 · Answer 1 · 2018-10-10

Hey, you mean something like this:

[image]

That was built with the Reingold-Tilford layout in igraph. Vertex size and shade are proportional to the expression of each gene, and edge thickness is based on the edge weight, which here is based on Pearson correlation. Here is the simple code:

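library(igraph)

# WNT: expression matrix with genes as rows and samples as columns.
# Edge weight = 1 - Pearson correlation between genes (a correlation-based distance).
g <- graph_from_adjacency_matrix(1 - cor(t(WNT), method = "pearson"),
                                 mode = "undirected", weighted = TRUE, diag = FALSE)
g <- simplify(g, remove.multiple = TRUE, remove.loops = TRUE)

V(g)$shape <- "sphere"
V(g)$frame.color <- "white"
E(g)$color <- "grey"
E(g)$arrow.size <- 1.0

# Minimum spanning tree: connects all genes with minimum total distance,
# favouring the most correlated pairs.
mstree <- mst(g, algorithm = "prim")
edgeweights <- E(mstree)$weight * 3

# vSizes is not defined in the original snippet; assumed here: mean expression scaled to 5-20.
meanExpr <- rowMeans(WNT)
vSizes <- 5 + 15 * (meanExpr - min(meanExpr)) / diff(range(meanExpr))

plot(mstree,
     layout = layout_as_tree,   # Reingold-Tilford tree layout
     edge.curved = TRUE,
     vertex.size = vSizes,
     vertex.label.dist = -1,
     vertex.label.color = "black",
     asp = FALSE,
     vertex.label.cex = 1.0,
     edge.width = edgeweights,
     edge.arrow.mode = 0,
     main = "Title")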

-------------------------------------------

I looked at the methods in the Farrell supplementary material, and it looks like that would be possible.

I have not worked much with scRNA-seq, but I developed my own method for single-cell CyTOF data, with which it is possible to plot cellular 'lineages' and then compare them. Statistical methods for the 'comparing' part are actually still being developed. I have some ideas about how to do this but have not yet implemented them, although I believe others have.

If you look at Step 9 ("create a network plot of the clusters") in this, you'll see how I plot a lineage based on immune cell expression using the Fruchterman-Reingold layout. I also trim edges that fall below a certain threshold. It would also be possible to implement the 'random walks' part via this function: http://igraph.org/r/doc/random_walk.html
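The edge trimming and random walk mentioned above can be sketched in base R. This is only an illustration under assumptions: the adjacency matrix is a made-up cluster-similarity matrix, the 0.5 threshold is arbitrary, and `trim_edges` and `random_walk_weighted` are hypothetical helpers; igraph's random_walk() is the packaged alternative.

```r
# Hypothetical helper: zero out edges whose weight falls below the threshold.
trim_edges <- function(adj, threshold) {
  adj[adj < threshold] <- 0  # drop weak links
  adj
}

# Hypothetical helper: random walk that moves to a neighbour with probability
# proportional to edge weight; returns the visited vertex indices.
random_walk_weighted <- function(adj, start, steps) {
  path <- integer(steps)
  path[1] <- as.integer(start)
  for (s in seq_len(steps - 1)) {
    w <- adj[path[s], ]
    if (sum(w) == 0) return(path[seq_len(s)])  # dead end: stop early
    path[s + 1] <- sample.int(length(w), 1, prob = w)
  }
  path
}

# Made-up symmetric similarity matrix between four clusters.
adj <- matrix(c(0,   0.9, 0.2, 0,
                0.9, 0,   0.7, 0.1,
                0.2, 0.7, 0,   0.8,
                0,   0.1, 0.8, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(LETTERS[1:4], LETTERS[1:4]))
trimmed <- trim_edges(adj, 0.5)
set.seed(1)
walk <- random_walk_weighted(trimmed, start = 1, steps = 10)
print(rownames(trimmed)[walk])
```

After trimming, only the A-B, B-C and C-D links remain, so every step of the walk follows one of those edges.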

Does that help at all?

Note that some of this was also mentioned in this tutorial: Network plot from expression data in R using igraph

Kevin