kNN algorithm on microarray data - training labels definition
6.9 years ago
arronar ▴ 280

Hello.

I have a bunch of microarray data in an array and I want to run the kNN algorithm on it. For simplicity, let's say I have a table with expression levels for 100 genes in 4 different treatments.

| Gene name | Treat 1 | Treat 2 | Treat 3 | Treat 4 |
-----------------------------------------------------
|   Gene 1  |   0.343 |   0.343 |   0.343 |   4.533 |
|   Gene 2  |   0.353 |   1.343 |   0.443 |   0.343 |
|   Gene 3  |   0.343 |   0.335 |   0.343 |   0.343 |
|   ...     |   ...   |   ...   |   ...   |   ...   |
| Gene 100  |   5.343 |   0.323 |   0.343 |   0.243 |

I will use 70% of the rows for the training set and the remaining 30% for the test set.

train_set = data[1:70,]
test_set = data[71:100,]

I also have to create a vector with the labels of the training set.

train_labels = c("Treat 1", "Treat 2", "Treat 3", "Treat 4")

and then run knn():

knn(train = train_set, test = test_set, cl = train_labels, k = 10)

The thing is that there are only 4 training labels while the training set consists of 70 rows, and I think this is going to produce an error.

What is the right way to approach this? Should I transpose my initial matrix?

Thank you

R microarray kNN • 1.6k views
6.9 years ago

You're not telling us what you're using, but I assume this is the knn() function from the R package class. If so, cl should get a factor with one entry for each instance (row) of the training set, i.e. in your case train_labels should be a factor of length 70.
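To make that concrete, here is a minimal sketch, assuming the knn() function from the class package. The expression values are simulated, and gene_class is a hypothetical per-gene label that I made up for illustration; it is not something in your table. Since knn() treats each row as one instance, with genes as rows it is the genes that get classified, so you need one known class per training gene.

```r
library(class)  # provides knn()

set.seed(1)

# Simulated stand-in for the expression table:
# 100 genes (rows) x 4 treatments (columns)
data <- matrix(rnorm(400), nrow = 100, ncol = 4,
               dimnames = list(paste("Gene", 1:100), paste("Treat", 1:4)))

# Hypothetical class for every gene (an assumption, not in the original data)
gene_class <- factor(sample(c("A", "B"), 100, replace = TRUE))

train_idx <- 1:70
test_idx  <- 71:100

train_set <- data[train_idx, ]
test_set  <- data[test_idx, ]

# One label per training row: a factor of length 70
train_labels <- gene_class[train_idx]

pred <- knn(train = train_set, test = test_set, cl = train_labels, k = 10)

# Compare predictions with the held-out labels
table(predicted = pred, actual = gene_class[test_idx])
```

If what you actually want is to classify treatments (samples) rather than genes, the samples would have to be the rows, which does mean transposing the matrix, and you would still need one label per training sample.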
