Question

How to reproduce MATLAB tsne('algorithm', 'exact') result in R?

5.6 years ago

mk ▴ 290

Given:

a 25x25 matrix of integers
an initial 25x3 embedding, generated by PCA
perplexity 1
random seed set to 1
target embedding dimension 3

Run exact t-SNE using MATLAB tsne('algorithm', 'exact') and R Rtsne(theta = 0)
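
A side note on very low perplexity settings: perplexity is defined as 2 raised to the Shannon entropy of each point's neighbour distribution, so a perplexity of 1 corresponds to zero entropy, with all probability on the single nearest neighbour. The base-R sketch below illustrates this with made-up distances. perplexity_for_beta is a hypothetical helper, not a function from Rtsne or MATLAB. Because perplexity only approaches 1 as the precision grows without bound, implementations may stop their bandwidth search at different values; that is an assumption worth checking, not something established here.

```r
# Perplexity of the conditional distribution p(j | i) for one point,
# given squared distances d2 to the other points and a precision beta.
# (perplexity_for_beta is a hypothetical helper, not part of any package.)
perplexity_for_beta <- function(d2, beta) {
  # Shift by the minimum distance for numerical stability; this does not
  # change the normalized probabilities.
  p <- exp(-beta * (d2 - min(d2)))
  p <- p / sum(p)
  nz <- p > 0                          # skip terms that underflowed to zero
  h <- -sum(p[nz] * log2(p[nz]))       # Shannon entropy in bits
  2^h
}

d2 <- c(1, 4, 9, 16)  # made-up squared distances, for illustration only
perp_flat   <- perplexity_for_beta(d2, beta = 0)   # uniform over 4 neighbours
perp_narrow <- perplexity_for_beta(d2, beta = 50)  # nearly all mass on the nearest
print(c(perp_flat, perp_narrow))
```

With beta = 0 the four neighbours are weighted equally and the perplexity is 4; with a large beta it collapses towards 1.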

Here is the data "A":

0 0 0 0 0 97 0 0 0 0 0 0 0 93 67 0 0 0 24 63 0 81 69 0 63
0 35 18 12 0 36 0 89 0 15 23 69 0 54 56 36 0 0 0 90 0 0 37 0 12
0 31 17 17 64 80 0 0 0 0 0 0 23 0 0 0 0 69 83 78 94 0 0 93 40
0 0 0 0 0 0 0 0 74 76 0 0 70 31 12 60 92 0 99 16 53 19 0 3 0
0 0 7 85 84 0 0 0 0 0 0 0 0 0 0 0 0 0 64 0 0 0 0 4 33
35 0 77 0 52 0 0 0 0 0 64 70 0 5 0 0 48 93 0 0 92 0 0 2 0
0 0 0 0 0 25 58 0 0 0 46 0 0 88 0 79 0 60 0 23 0 0 81 33 62
5 91 65 0 0 0 0 0 38 72 0 0 75 0 0 0 0 21 48 0 0 32 0 0 0
14 0 40 0 35 0 0 81 94 51 21 55 43 0 0 30 0 77 56 0 0 0 85 0 4
9 0 0 0 0 0 0 0 8 0 86 36 98 0 0 64 0 87 0 0 32 0 88 7 23
0 96 100 0 0 55 0 0 0 0 73 0 0 0 0 0 68 51 0 0 81 0 92 0 0
59 0 93 0 75 12 0 22 0 0 0 0 13 67 0 0 0 67 0 0 71 82 0 0 0
0 0 79 16 35 0 0 0 99 84 89 0 0 26 0 99 0 8 65 81 77 97 0 13 0
0 0 7 97 0 0 0 63 51 29 0 0 0 0 39 38 44 0 0 0 23 0 18 79 76
0 50 0 9 31 0 0 0 57 0 0 61 0 0 0 0 0 95 0 82 35 0 38 85 0
12 0 0 0 0 0 64 0 38 80 0 43 0 26 0 0 0 0 0 0 73 17 39 7 93
25 0 29 0 0 0 31 0 0 73 0 0 0 0 0 8 0 98 0 0 66 0 0 0 61
89 0 29 33 65 72 0 0 18 60 0 0 0 0 63 0 36 0 0 0 0 0 0 0 61
0 0 0 0 22 91 0 0 0 0 49 0 0 0 0 54 7 0 0 0 0 0 0 50 0
0 0 82 51 3 0 0 74 0 0 0 100 57 0 83 0 0 0 0 0 0 0 94 89 0
0 0 65 0 23 33 0 13 5 90 0 0 0 0 0 0 0 81 0 10 0 0 0 5 5
0 0 0 0 56 0 0 0 30 0 0 98 0 78 0 63 0 0 12 42 11 0 0 0 0
0 0 0 72 41 0 0 0 0 53 0 0 19 1 0 0 63 0 0 0 0 11 0 15 0
0 0 4 0 0 31 14 0 0 0 0 85 0 100 0 0 0 70 16 30 98 0 31 0 0
0 0 0 38 0 71 0 85 0 0 53 0 87 0 0 51 59 0 0 0 0 0 0 0 24

Here is the initial embedding "Y":

0.254 -0.440 -0.402
0.131 0.095 0.282
0.331 -0.166 0.242
-0.661 -0.449 -0.110
0.256 -0.522 -0.204
-0.126 0.079 -0.333
0.233 0.314 -0.400
-0.653 0.153 0.194
0.099 -0.149 0.602
0.192 -0.492 0.206
0.178 0.286 0.478
-0.092 0.541 0.012
-0.312 -0.013 0.587
0.252 0.505 -0.313
-0.603 0.079 -0.297
-0.059 0.255 0.493
-0.211 -0.412 0.274
0.569 0.253 0.011
0.158 -0.291 0.471
0.146 0.386 0.058
0.684 0.052 0.038
0.374 -0.120 0.110
-0.217 0.691 0.078
-0.367 0.142 -0.042
-0.155 -0.025 -0.597

Now generate a t-SNE embedding using R's Rtsne(). Note that theta = 0 selects the exact algorithm rather than Barnes-Hut, and that the embedding itself is returned in M$Y:

set.seed(1)
M = Rtsne(A, theta = 0, perplexity = 1, Y_init = Y, max_iter = 1000, dims = 3, pca = FALSE)

18.3306291   -6.2506041 -62.3268525
15.9386942   -3.1369803 -60.5240895
-55.0534150 -176.3234448 -26.9564002
2.8059395  -26.5137681  60.3847045
36.8003669  116.2724964 -44.7099789
-41.8265624   82.2428220   2.4331879
25.7039137  -31.0979402  47.7070508
6.7846280  -24.5332260  59.7495021
17.6887619  -26.3297157  53.9600569
23.9006623  -30.0315023  49.1130745
-43.6656649   83.6832572   5.8279631
-40.6721409   81.3381834   0.2965082
-0.7597657  -28.2349362  60.9845080
32.4991289  120.2894942 -47.4589986
-57.5001125 -176.7817773 -26.9090314
14.4104082  -15.9172413  61.2646084
12.6062537  -19.6219765  59.9587861
39.8651878  119.5524683 -49.4049012
-52.8444181  -65.8360785   8.1726297
12.0636781   -2.5291677 -62.3328198
11.2964251  -21.9339896  59.2367094    
17.6280749   -0.2864139 -56.4086955
36.4835826  118.1396018 -46.6151879
18.4594930    1.1163553 -54.3799843
-50.9437484  -67.2759159   8.9376503

Now we generate an embedding using MATLAB:

M = tsne(A,'Algorithm','exact','NumPCAComponents',0, 'Perplexity', 1, 'InitialY', Y, 'NumDimensions', 3)

1.0e+03 *

0.1524    0.3834   -0.1735
0.1637    0.3731   -0.1781
-0.8293    0.3517    0.1845
0.3783   -0.0525    0.3495
-0.0068   -0.2998   -0.6433
-0.3656   -0.2723    0.1051
0.4675   -0.0788    0.3194
0.3858   -0.0658    0.3431
0.4267   -0.0795    0.3288
0.4579   -0.0790    0.3216
-0.3665   -0.2741    0.0900
-0.3651   -0.2712    0.1151
0.3713   -0.0407    0.3552
-0.0123   -0.2780   -0.6524
-0.8293    0.3517    0.1942
0.3862   -0.1062    0.3291
0.3909   -0.0915    0.3330
0.0137   -0.2877   -0.6476
-1.0500   -0.2379    0.0832
0.1678    0.3622   -0.1670
0.3933   -0.0817    0.3359
0.1718    0.3737   -0.1953
-0.0029   -0.2909   -0.6468
0.1761    0.3740   -0.2044
-1.0499   -0.2378    0.0930

Clearly these embeddings are not equivalent. Given an initial embedding, t-SNE should be repeatable. What am I overlooking?
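
One way to make "equivalent" concrete is to compare the two embeddings through their pairwise distances rather than their raw coordinates, since distances are unaffected by rotation, reflection, or translation. The sketch below is base R only and uses a random stand-in matrix, not the data above. embedding_agreement is a hypothetical helper name, and using a Spearman rank correlation (which also ignores overall scale) is just one possible choice.

```r
# Compare two embeddings through their pairwise distances, which are
# unaffected by rotation, reflection, or translation of the coordinates.
# (embedding_agreement is a hypothetical helper name.)
embedding_agreement <- function(Y1, Y2) {
  d1 <- as.vector(dist(Y1))  # Euclidean distances between all row pairs
  d2 <- as.vector(dist(Y2))
  cor(d1, d2, method = "spearman")  # rank correlation, so overall scale is ignored
}

set.seed(1)
Y1 <- matrix(rnorm(25 * 3), ncol = 3)  # stand-in embedding, not the data above

# Build a rotated, scaled, and shifted copy of Y1.
angle <- pi / 3
rot <- rbind(c(cos(angle), -sin(angle), 0),
             c(sin(angle),  cos(angle), 0),
             c(0,           0,          1))
Y2 <- sweep(10 * Y1 %*% rot, 2, c(5, -2, 7), "+")

agreement <- embedding_agreement(Y1, Y2)
print(agreement)
```

A value close to 1 would indicate the same neighbourhood structure even when the coordinates look unrelated; applied to the two results above, it would be called as embedding_agreement(M$Y, M_matlab), where M_matlab is a hypothetical name for the MATLAB output imported into R.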

t-SNE (t-Distributed Stochastic Neighbor Embedding) starts from a random initialization by default, which is why each run of the algorithm gives slightly different results. To reproduce results within R, call set.seed(x), where x is a numeric value, before running Rtsne(). I think it is really hard to get the same results across different software packages.
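
The seeding advice can be sketched in base R as follows. random_init is a hypothetical stand-in for whatever random start an implementation draws internally; it is not Rtsne's actual routine. When a starting embedding is passed explicitly (Y_init in Rtsne, 'InitialY' in MATLAB, as in the question), no random start is needed, so any remaining differences would more likely come from differing defaults in the two implementations; that is an assumption, not something verified here.

```r
# Stand-in for a random starting embedding: n points in k dimensions.
# (random_init is a hypothetical helper, used only to show the effect of the seed.)
random_init <- function(n, k) {
  matrix(rnorm(n * k), nrow = n, ncol = k)
}

set.seed(1)
init_a <- random_init(25, 3)
set.seed(1)
init_b <- random_init(25, 3)  # same seed, so the same draws
init_c <- random_init(25, 3)  # no reseed, so the generator continues and the draws differ

print(identical(init_a, init_b))
print(identical(init_a, init_c))
```

Seeds are not portable between R and MATLAB, so matching seed values alone would not be expected to give matching random numbers across the two.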

5.6 years ago by arta ▴ 670