R: converting Ensembl row names to Symbol ID outputs missing values in 'row.names' are not allowed
7.1 years ago
user31888 ▴ 130

I have a .csv file as follows:

,TEST1,TEST2
ENSG00000197421,2,0
ENSG00000213753,0,2
ENSG00000168746,0,2
ENSG00000261824,3,0
ENSG00000128310,1,2
ENSG00000235091,9,4

In R, I import the file like this:

 > d <- read.csv("my_file.csv", header=TRUE, row.names=1)
 > d
                TEST1 TEST2
ENSG00000197421     2     0
ENSG00000213753     0     2
ENSG00000168746     0     2
ENSG00000261824     3     0
ENSG00000128310     1     2
ENSG00000235091     9     4

Checking that I do not have any duplicates:

> rownames(d)
[1] "ENSG00000197421" "ENSG00000213753" "ENSG00000168746" "ENSG00000261824"
[5] "ENSG00000128310" "ENSG00000235091"
> colnames(d)
[1] "TEST1" "TEST2"
> any(duplicated(rownames(d)))
[1] FALSE
> any(duplicated(colnames(d)))
[1] FALSE

Load libraries:

> suppressMessages(library("AnnotationDbi"))
> suppressMessages(library("org.Hs.eg.db"))

Then try to convert my Ensembl row names to Symbol in place:

> rownames(d) <- mapIds(org.Hs.eg.db,keys=rownames(d),column="SYMBOL",keytype="ENSEMBL",multiVals="first")
Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  missing values in 'row.names' are not allowed

NOTE: Removing the first ',' in 'my_file.csv' did not help either.

I managed to create a new column with the converted IDs, but I cannot use it to replace the row names:

> d$SYMBOL <- mapIds(org.Hs.eg.db,keys=rownames(d),column="SYMBOL",keytype="ENSEMBL",multiVals="first")
> d
                TEST1 TEST2    SYMBOL
ENSG00000197421     2     0     GGT3P
ENSG00000213753     0     2 CENPBD1P1
ENSG00000168746     0     2 LINC01620
ENSG00000261824     3     0 LINC00662
ENSG00000128310     1     2     GALR3
ENSG00000235091     9     4      <NA>
> d_subset <- subset(d, !is.na(d$SYMBOL))
> d_subset
                TEST1 TEST2    SYMBOL
ENSG00000197421     2     0     GGT3P
ENSG00000213753     0     2 CENPBD1P1
ENSG00000168746     0     2 LINC01620
ENSG00000261824     3     0 LINC00662
ENSG00000128310     1     2     GALR3
> rownames(d) <- d$SYMBOL
Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  missing values in 'row.names' are not allowed

I don't get it.


Here, "missing values" means NAs, which cannot be used as row names. You need to replace them with unique names (because duplicate row names are not allowed either).
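As a rough illustration of that advice, the sketch below uses made-up Ensembl-style IDs and a hard-coded `symbols` vector as a stand-in for a `mapIds()` result (both are assumptions, not data from the question). It falls back to the original ID where the symbol is NA and then uses base R's `make.unique()` in case two IDs map to the same symbol.

```r
# Toy data: two IDs share a symbol and one has no symbol at all.
# IDs and symbols are hypothetical stand-ins for a real mapIds() result.
d <- data.frame(
  TEST1 = c(2, 0, 9, 1),
  row.names = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D")
)
symbols <- c("GENE1", "GENE1", NA, "GENE2")

# Keep the original ID wherever no symbol was found.
new_names <- ifelse(is.na(symbols), rownames(d), symbols)

# Row names must also be unique; make.unique() appends a numeric suffix
# to repeated values.
rownames(d) <- make.unique(new_names)
print(d)
```

Whether a suffixed duplicate is acceptable depends on the downstream analysis; summing or dropping duplicated symbols may be preferable in some cases.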

7.1 years ago
seancho ▴ 90

In your last line, you're still trying to assign rownames(d) <- d$SYMBOL, and not your new d_subset.

rownames(d_subset) <- d_subset$SYMBOL should work.
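A minimal sketch of the subset route, assuming a small data frame with a `SYMBOL` column already filled in (the values are copied from the output in the question, hard-coded here so the snippet runs without the annotation packages):

```r
# Three rows from the question's data, with SYMBOL already added.
d <- data.frame(
  TEST1 = c(2, 0, 9),
  TEST2 = c(0, 2, 4),
  SYMBOL = c("GGT3P", "CENPBD1P1", NA),
  row.names = c("ENSG00000197421", "ENSG00000213753", "ENSG00000235091"),
  stringsAsFactors = FALSE
)

# Drop the rows without a symbol, then rename the rows of the SUBSET.
d_subset <- d[!is.na(d$SYMBOL), ]
rownames(d_subset) <- d_subset$SYMBOL

# Optional: the SYMBOL column is now redundant.
d_subset$SYMBOL <- NULL
print(d_subset)
```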

Alternatively, if you wish to keep all the entries, you could retain the Ensembl names when it is not mapped:

> rownames(d) <- ifelse(is.na(d$SYMBOL), rownames(d), d$SYMBOL)

> d
                TEST1 TEST2    SYMBOL
GGT3P               2     0     GGT3P
CENPBD1P1           0     2 CENPBD1P1
LINC01620           0     2 LINC01620
LINC00662           3     0 LINC00662
GALR3               1     2     GALR3
ENSG00000235091     9     4      <NA>

+1 for keeping the ENSG IDs. Thanks!


There should be an opening bracket right after ifelse, i.e. ifelse(is.na(d$SYMBOL), rownames(d), d$SYMBOL). The editor would not accept it...

7.1 years ago
user31888 ▴ 130

Sorry, but I don't see any missing values in my dataset. And I don't see any duplicates in any field either. That's what I don't understand.

ENSG00000235091     9     4      <NA>

Here, <NA> means that you have a missing value.
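One way to spot such entries without reading the printed table is to test the mapped vector directly. The sketch below assumes a named character vector shaped like the question's `mapIds()` output (names are the Ensembl keys; the two values are hard-coded from the thread):

```r
# Stand-in for part of the mapIds() result shown in the question.
symbols <- c(ENSG00000128310 = "GALR3", ENSG00000235091 = NA)

# TRUE if at least one key failed to map.
has_missing <- anyNA(symbols)

# Which keys have no symbol?
unmapped <- names(symbols)[is.na(symbols)]

print(has_missing)
print(unmapped)
```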


Actually, I forgot to use the subset of my data frame in my last piece of code.

7.1 years ago
user31888 ▴ 130

Sorry, but I don't see any missing values in my dataset. That's what I don't understand.

7.1 years ago
Liun ▴ 30

It's because not all of your keys (rownames(d)) are in org.Hs.eg.db.

> rownames(d)

"ENSG00000197421" "ENSG00000213753" "ENSG00000168746" "ENSG00000261824" "ENSG00000128310" "ENSG00000235091"

> intersect(rownames(d),keys(org.Hs.eg.db,"ENSEMBL"))

"ENSG00000197421" "ENSG00000213753" "ENSG00000168746" "ENSG00000261824" "ENSG00000128310"

If you run this code:

mapIds(org.Hs.eg.db,keys=rownames(d)[1:5],column="SYMBOL",keytype="ENSEMBL",multiVals="first")

it runs without any error.
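The same check can be turned around to list the keys that are absent. In this sketch `known_keys` is a hard-coded stand-in (an assumption for illustration); in a real session it would be the vector returned by keys(org.Hs.eg.db, "ENSEMBL") as used above.

```r
# Keys we want to convert (two of the row names from the question).
query <- c("ENSG00000128310", "ENSG00000235091")

# Hypothetical stand-in for keys(org.Hs.eg.db, "ENSEMBL").
known_keys <- c("ENSG00000128310", "ENSG00000197421")

# Keys that the annotation database does not know about.
missing_keys <- setdiff(query, known_keys)

# Logical mask that can be used to subset the data frame before mapping.
is_known <- query %in% known_keys

print(missing_keys)
print(is_known)
```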
