Question

kNN algorithm on microarray data - training labels definition

6.9 years ago

arronar ▴ 280

Hello.

I have a set of microarray data in an array and I want to run the kNN algorithm on it. For simplicity, let's say I have a table with the expression levels of 100 genes under 4 different treatments.

| Gene name | Treat 1 | Treat 2 | Treat 3 | Treat 4 |
|-----------|---------|---------|---------|---------|
| Gene 1    |   0.343 |   0.343 |   0.343 |   4.533 |
| Gene 2    |   0.353 |   1.343 |   0.443 |   0.343 |
| Gene 3    |   0.343 |   0.335 |   0.343 |   0.343 |
| ...       |   ...   |   ...   |   ...   |   ...   |
| Gene 100  |   5.343 |   0.323 |   0.343 |   0.243 |

I will use 70% of the rows for the training set and the remaining 30% for the test set.

train_set = data[1:70,]
test_set = data[71:100,]

I also have to create a vector with the labels of the training set.

train_labels = c("Treat 1", "Treat 2", "Treat 3", "Treat 4")

and then run knn():

knn(train = train_set, test = test_set, cl = train_labels, k = 10)

The thing is that there are only 4 training labels while the training set consists of 70 rows, and I think this is going to produce an error.

What is the right way to approach this? Should I transpose my initial matrix?

Thank you

R microarray kNN • 1.6k views

score 0 · Answer 1 · 2017-05-21

6.9 years ago

Jean-Karim Heriche 27k

You're not telling us what you're using, but I assume this is the knn() function from the R package class. If so, cl should be a factor with one entry for each instance (row) of the training set, i.e. in your case train_labels should be a factor of length 70.
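As a rough illustration of the point about cl, here is a minimal sketch on simulated data. It assumes that what you want to classify are samples by treatment, so the matrix is transposed to put one sample in each row. The replicate samples (10 per treatment), the random values, and the names expr, samples, train_idx and mismatch are all made up for the example; the table in the question has only one column per treatment, which would not be enough to train a classifier this way.

```r
library(class)  # provides knn()

set.seed(1)  # reproducible simulated data

# Hypothetical data: 100 genes (rows) x 40 samples (columns),
# with 10 replicate samples per treatment. The replicates are an
# assumption; they are not present in the table from the question.
n_genes <- 100
treatment_names <- c("Treat 1", "Treat 2", "Treat 3", "Treat 4")
treatments <- rep(treatment_names, each = 10)
expr <- matrix(rnorm(n_genes * length(treatments)), nrow = n_genes)

# knn() treats each row as one instance, so transpose the matrix
# to get one row per sample and one column per gene.
samples <- t(expr)

# Random 70/30 split of the samples (rows), not of the genes.
train_idx <- sample(nrow(samples), size = round(0.7 * nrow(samples)))
train_set <- samples[train_idx, ]
test_set  <- samples[-train_idx, ]

# One label per training row, stored as a factor.
train_labels <- factor(treatments[train_idx], levels = treatment_names)
stopifnot(length(train_labels) == nrow(train_set))

# Predicted treatment for each test sample.
pred <- knn(train = train_set, test = test_set, cl = train_labels, k = 3)

# Passing only the 4 treatment names, as in the question, is rejected
# because cl must have the same length as nrow(train).
mismatch <- tryCatch(
  knn(train = train_set, test = test_set, cl = treatment_names, k = 3),
  error = function(e) conditionMessage(e)
)
```

Because the simulated values are pure noise, the predictions in pred are not meaningful; the only thing the sketch is meant to show is the shape of the inputs, with train_labels having exactly one entry per row of train_set.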
