R: converting Ensembl row names to Symbol ID outputs missing values in 'row.names' are not allowed
14 months ago
user31888 • 60
United States

I have a .csv file as follows:

,TEST1,TEST2
ENSG00000197421,2,0
ENSG00000213753,0,2
ENSG00000168746,0,2
ENSG00000261824,3,0
ENSG00000128310,1,2
ENSG00000235091,9,4

In R, I import the file like this:

 > d <- read.csv("my_file.csv", header=TRUE, row.names=1)
 > d
                TEST1 TEST2
ENSG00000197421     2     0
ENSG00000213753     0     2
ENSG00000168746     0     2
ENSG00000261824     3     0
ENSG00000128310     1     2
ENSG00000235091     9     4

Checking that I do not have any duplicates:

> rownames(d)
[1] "ENSG00000197421" "ENSG00000213753" "ENSG00000168746" "ENSG00000261824"
[5] "ENSG00000128310" "ENSG00000235091"
> colnames(d)
[1] "TEST1" "TEST2"
> any(duplicated(rownames(d)))
[1] FALSE
> any(duplicated(colnames(d)))
[1] FALSE

Load libraries:

> suppressMessages(library("AnnotationDbi"))
> suppressMessages(library("org.Hs.eg.db"))

Then try to convert my Ensembl row names to Symbol in place:

> rownames(d) <- mapIds(org.Hs.eg.db,keys=rownames(d),column="SYMBOL",keytype="ENSEMBL",multiVals="first")
Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  missing values in 'row.names' are not allowed

NOTE: Removing the first ',' in 'my_file.csv' did not help either.

I managed to create a new column with the converted IDs, but I cannot use it to replace the row names:

> d$SYMBOL <- mapIds(org.Hs.eg.db,keys=rownames(d),column="SYMBOL",keytype="ENSEMBL",multiVals="first")
> d
                TEST1 TEST2    SYMBOL
ENSG00000197421     2     0     GGT3P
ENSG00000213753     0     2 CENPBD1P1
ENSG00000168746     0     2 LINC01620
ENSG00000261824     3     0 LINC00662
ENSG00000128310     1     2     GALR3
ENSG00000235091     9     4      <NA>
> d_subset <- subset(d, !is.na(d$SYMBOL))
> d_subset
                TEST1 TEST2    SYMBOL
ENSG00000197421     2     0     GGT3P
ENSG00000213753     0     2 CENPBD1P1
ENSG00000168746     0     2 LINC01620
ENSG00000261824     3     0 LINC00662
ENSG00000128310     1     2     GALR3
> rownames(d) <- d$SYMBOL
Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  missing values in 'row.names' are not allowed

I don't get it.

R • 3.6k views

Here, "missing values" means NAs, which cannot be used as row names. You need to replace them with unique names (duplicate row names are not allowed either).
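To make the point concrete, here is a minimal base-R sketch that reproduces the error without any annotation package. The two-row data frame and the `new_names` vector are stand-ins for the real table and for the `mapIds()` result, not the original data.

```r
# Toy data frame standing in for the count table (two rows taken from the question).
d <- data.frame(TEST1 = c(2, 9), TEST2 = c(0, 4),
                row.names = c("ENSG00000197421", "ENSG00000235091"))

# Stand-in for the mapIds() result: the second ID has no symbol, so it is NA.
new_names <- c("GGT3P", NA)

# Assigning a vector that contains NA fails. tryCatch() captures the
# error message instead of stopping the script.
msg <- tryCatch({
  rownames(d) <- new_names
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)          # the error text raised by the assignment
print(rownames(d))  # the original Ensembl IDs are still in place
```

Because the assignment fails as a whole, `d` keeps its Ensembl row names.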

13 months ago
seancho • 40

In your last line, you're still trying to assign rownames(d) <- d$SYMBOL, and not your new d_subset.

rownames(d_subset) <- d_subset$SYMBOL should work.

Alternatively, if you wish to keep all the entries, you could retain the Ensembl ID wherever no symbol is mapped:

> rownames(d) <- ifelse(is.na(d$SYMBOL), rownames(d), d$SYMBOL)

> d
                TEST1 TEST2    SYMBOL
GGT3P               2     0     GGT3P
CENPBD1P1           0     2 CENPBD1P1
LINC01620           0     2 LINC01620
LINC00662           3     0 LINC00662
GALR3               1     2     GALR3
ENSG00000235091     9     4      <NA>
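The fallback can also be tried without org.Hs.eg.db. In this sketch the `symbols` vector is a hand-written stand-in for the `mapIds()` output, using three of the mappings shown in the question. The `make.unique()` step is an extra precaution I am adding, in case two Ensembl IDs ever map to the same symbol; it changes nothing here.

```r
# Three rows from the question; the last ID has no symbol.
d <- data.frame(TEST1 = c(2, 0, 9), TEST2 = c(0, 2, 4),
                row.names = c("ENSG00000197421", "ENSG00000213753",
                              "ENSG00000235091"))

# Stand-in for mapIds(org.Hs.eg.db, ...): NA where nothing was mapped.
symbols <- c("GGT3P", "CENPBD1P1", NA)

# Keep the Ensembl ID wherever the symbol is missing.
new_names <- ifelse(is.na(symbols), rownames(d), symbols)

# Guard against duplicated symbols (suffixes such as ".1" are appended if needed).
new_names <- make.unique(new_names)

rownames(d) <- new_names
print(d)
```

After this, the row names contain no NA and no duplicates, so the assignment goes through.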

+1 for keeping the ENSG. Thanks !


Note that an opening bracket is needed after ifelse, i.e. ifelse(is.na(d$SYMBOL), rownames(d), d$SYMBOL). The post editor kept dropping it...

14 months ago
user31888 • 60
United States

Sorry, but I don't see any missing values in my dataset. And I don't see any duplicates in any field either. That's what I don't understand.

ENSG00000235091     9     4      <NA>

Here, <NA> means that you have a missing value.
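If the `<NA>` is easy to overlook in the printed table, it can be checked programmatically. This is a small base-R sketch; the two-row data frame is a stand-in that reuses two rows from the question, with `SYMBOL` filled in by hand rather than by `mapIds()`.

```r
# Two rows from the question, with the SYMBOL column written out by hand.
d <- data.frame(TEST1 = c(1, 9), TEST2 = c(2, 4),
                SYMBOL = c("GALR3", NA),
                row.names = c("ENSG00000128310", "ENSG00000235091"),
                stringsAsFactors = FALSE)

# TRUE if the column holds at least one missing value.
has_na <- anyNA(d$SYMBOL)

# Row names of the entries that have no symbol.
unmapped_rows <- rownames(d)[is.na(d$SYMBOL)]

# Count of missing values per column.
na_per_column <- colSums(is.na(d))

print(has_na)
print(unmapped_rows)
print(na_per_column)
```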


Actually, I forgot to use the subset of my data frame in my last piece of code.

14 months ago
user31888 • 60
United States

Sorry, but I don't see any missing values in my dataset. That's what I don't understand.

3.2 years ago
Liun • 30
Harbin

It's because not all of your keys (rownames(d)) are in org.Hs.eg.db.

> rownames(d)

"ENSG00000197421" "ENSG00000213753" "ENSG00000168746" "ENSG00000261824" "ENSG00000128310" "ENSG00000235091"

> intersect(rownames(d),keys(org.Hs.eg.db,"ENSEMBL"))

"ENSG00000197421" "ENSG00000213753" "ENSG00000168746" "ENSG00000261824" "ENSG00000128310"

If you run this code:

mapIds(org.Hs.eg.db,keys=rownames(d)[1:5],column="SYMBOL",keytype="ENSEMBL",multiVals="first")

it runs without any error.
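The same check can be sketched in base R to list the keys that will come back as NA before any row names are assigned. Here `known_keys` is a hand-written stand-in for `keys(org.Hs.eg.db, "ENSEMBL")`, containing only the five IDs reported as present above; with the package loaded you would use the real call instead.

```r
# All six row names from the question.
my_keys <- c("ENSG00000197421", "ENSG00000213753", "ENSG00000168746",
             "ENSG00000261824", "ENSG00000128310", "ENSG00000235091")

# Stand-in for keys(org.Hs.eg.db, "ENSEMBL"): only the five IDs that the
# intersect() call above reports as present.
known_keys <- c("ENSG00000197421", "ENSG00000213753", "ENSG00000168746",
                "ENSG00000261824", "ENSG00000128310")

# Keys that exist in the annotation, and keys that do not.
mapped   <- intersect(my_keys, known_keys)
unmapped <- setdiff(my_keys, known_keys)

print(unmapped)  # these are the keys for which the symbol would be NA
```

Any ID listed in `unmapped` has to be dropped or given a fallback name before the symbols are used as row names.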
