Error using featureCounts with DESeq2: cannot create DESeqDataSet object
Asked 5.0 years ago by VBer

I have a file of read counts that I want to use to find differentially expressed genes with DESeq2. The file was generated using featureCounts.

           C11    C15    C19    C23    N11    N15    N19    N23
NM_000014  4422   14216  8885   17031  8162   4811   12536  8273
NM_000015  3      0      7      2      0      9      2      6
NM_000016  1063   1192   1608   1345   1118   951    943    1120
NM_000017  164    424    463    507    603    692    494    653
NM_000018  5193   12982  11382  11716  10030  14180  9379   13316
NM_000019  654    1103   1106   1184   743    497    569    844

When I try to create a DESeq2 object using DESeqDataSetFromMatrix, I get the following error:

DESeq.ds <- DESeqDataSetFromMatrix(countData = readcounts, colData = sample_info, design = ~ condition)
Error in DESeqDataSet(se, design = design, ignoreRank): some values in assay are not integers
Traceback:

1. DESeqDataSetFromMatrix(countData = readcounts, colData = sample_info,  design = ~condition)
2. DESeqDataSet(se, design = design, ignoreRank)
3. stop("some values in assay are not integers")

I checked the entire read counts file and there are no non-integer values in it, so I don't understand why this error keeps occurring. I ran sapply(readcounts, class) as suggested in this thread (which did not give a clear solution) and got the following output:

C11 'numeric'
C15 'numeric'
C19 'numeric'
C23 'numeric'
N11 'numeric'
N15 'numeric'
N19 'numeric'
N23 'numeric'
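
For reference, a class of 'numeric' only tells you the columns are stored as doubles; it does not guarantee that every value is a whole number, which is what DESeq2 is checking here. A minimal base-R sketch of a per-column check, using a made-up data frame (the fractional value is invented for illustration):

```r
# Toy data frame: both columns have class "numeric", but one value is fractional
readcounts <- data.frame(C11 = c(4422, 3, 1063), C15 = c(14216, 0.5, 1192),
                         row.names = c("NM_000014", "NM_000015", "NM_000016"))

sapply(readcounts, class)  # both columns report "numeric"

# TRUE for each column whose values are all whole numbers
whole <- sapply(readcounts, function(col) all(col == round(col)))
whole
```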

I tried using DESeqDataSet instead, but that requires a RangedSummarizedExperiment object, such as the one produced by summarizeOverlaps from the GenomicAlignments package. summarizeOverlaps does the same job as featureCounts (it generates read counts), and I don't want to repeat that step.

How did you obtain the count matrix? Are these raw (=non-normalized) counts? Please show head(readcounts).

You could also try as.integer(readcounts), assuming the values really are integers but are somehow stored as characters or factors. How did you import the count matrix?
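
One caveat if the columns do turn out to be factors: as.integer() on a factor returns the internal level codes rather than the numbers printed on screen. A small sketch with made-up values, converting through character first:

```r
# A count column that was read in as a factor (toy values)
f <- factor(c("4422", "3", "1063"))

# as.integer() on a factor returns the level codes; levels sort as strings: "1063", "3", "4422"
codes <- as.integer(f)

# Going through character first recovers the actual numbers
values <- as.integer(as.character(f))

codes
values
```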

The file was given to me by my professor as a CSV file. Yes, these are raw counts. I imported the file using read.table. as.integer(readcounts) doesn't work because readcounts is a data frame.

head(readcounts)
           C11    C15    C19    C23    N11    N15    N19    N23
NM_000014  4422   14216  8885   17031  8162   4811   12536  8273
NM_000015  3      0      7      2      0      9      2      6
NM_000016  1063   1192   1608   1345   1118   951    943    1120
NM_000017  164    424    463    507    603    692    494    653
NM_000018  5193   12982  11382  11716  10030  14180  9379   13316
NM_000019  654    1103   1106   1184   743    497    569    844
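
For comparison, a sketch of importing a CSV count file with base R, assuming the first column holds the gene IDs (the file is a temporary toy file written on the fly, and Gene_ID as the header name is an assumption). read.table() defaults to header = FALSE and whitespace-separated fields, whereas read.csv() uses header = TRUE and sep = ",". The last lines show why as.integer() on the whole data frame fails:

```r
# Write a small toy CSV shaped like the count file (values from the question)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Gene_ID,C11,C15,N11",
             "NM_000014,4422,14216,8162",
             "NM_000015,3,0,0"), csv_path)

# read.csv() sets header = TRUE and sep = ","; row.names = 1 uses column 1 as row names
readcounts <- read.csv(csv_path, row.names = 1)
str(readcounts)

# Whole numbers are parsed as integer columns
sapply(readcounts, class)

# as.integer() cannot coerce a data frame (a list of columns) to a single vector
res <- tryCatch(as.integer(readcounts), error = function(e) "error")
res
```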

Come on, as.integer(as.matrix(readcounts))

Ah sorry XP New to R.

Edit: Sorry, it worked. Should I convert the integer vector back to a data frame? Again, sorry if the question is too obvious.

readcounts <- as.integer(as.matrix(readcounts))
head(readcounts)
4422 3 1063 164 5193 654 
DESeq.ds <- DESeqDataSetFromMatrix(countData = readcounts,colData = sample_info,design = ~ condition)
Error in validObject(.Object): invalid class “SummarizedExperiment” object: nb of cols in 'assay' (1) must equal nb of rows in 'colData' (8)
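
The new error is a shape problem: as.integer() returns a plain vector and drops the dim and dimnames attributes, so DESeqDataSetFromMatrix() sees a single column instead of eight samples. A minimal sketch on a toy two-gene, two-sample subset of the data, showing the lost shape and one way to rebuild the matrix:

```r
# Toy two-gene, two-sample subset of the counts
readcounts <- data.frame(C11 = c(4422L, 3L), C15 = c(14216L, 0L),
                         row.names = c("NM_000014", "NM_000015"))
m <- as.matrix(readcounts)

v <- as.integer(m)  # plain integer vector: dim and dimnames are dropped
dim(v)              # NULL, so it is treated as a single column

# Rebuild the matrix around the integer values (as.integer() keeps column-major order)
counts <- matrix(v, nrow = nrow(m), dimnames = dimnames(m))
dim(counts)
```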

> Ah sorry XP New to R.

No problem, sorry, I did not intend to sound harsh :)

mode(readcounts) <- "integer" is the last thing I could think of.
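
A sketch of that suggestion, assuming readcounts is first turned into a matrix: mode<- keeps the existing attributes, so the dimensions and gene/sample names survive the conversion. Note that as.integer() truncates fractional values (2.7 becomes 2), so any non-integer counts would be silently altered rather than flagged.

```r
# Toy subset of the counts, stored as doubles
readcounts <- data.frame(C11 = c(4422, 3), C15 = c(14216, 0),
                         row.names = c("NM_000014", "NM_000015"))

# mode<- on a multi-row data frame would call as.integer() on a list and fail,
# so convert to a numeric matrix first
counts <- as.matrix(readcounts)

# Change the storage mode in place; dim and dimnames are kept
# (fractional values would be truncated, so check for them beforehand)
mode(counts) <- "integer"
str(counts)
```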

Have you tried importing just the top of the file? Maybe there is one line that got corrupted. Is dim(readcounts) what you expect? You might have a weird whitespace hiding in there somewhere.
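
Along those lines, a sketch of locating the offending cells once the data are in a matrix, using a toy matrix with one missing value and one fractional value planted for illustration:

```r
# Toy matrix with one missing value and one fractional value planted for illustration
counts <- matrix(c(4422, 3, NA, 14216, 0.5, 1192), nrow = 3,
                 dimnames = list(c("NM_000014", "NM_000015", "NM_000016"),
                                 c("C11", "C15")))

dim(counts)  # compare with the number of genes and samples you expect

# TRUE where a cell is missing or not a whole number
bad <- is.na(counts) | counts != round(counts)

# Row/column positions of the offending cells
bad_cells <- which(bad, arr.ind = TRUE)
bad_cells
```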

tmp <- gsub(" ", "", as.matrix(readcounts)); mode(tmp) <- "integer"

might be worth a try
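
Note that gsub() applied directly to a data frame coerces it with as.character(), turning each whole column into one string, so the sketch below starts from a character matrix; gsub() keeps the attributes of its input, so the shape and names survive. The space used as a thousands separator in one cell is a made-up example of stray whitespace:

```r
# Toy character matrix; "1 063" uses a space as a thousands separator
raw <- matrix(c("4422", "1 063", "14216", "1192"), nrow = 2,
              dimnames = list(c("NM_000014", "NM_000016"), c("C11", "C15")))

# gsub() returns a result with the same attributes as its input, so dims survive
tmp <- gsub(" ", "", raw)
mode(tmp) <- "integer"
tmp
```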

Hey, thank you SO MUCH for asking me to check the dimensions of my data frame. I was indeed missing around 8000 genes. I figured out the error; I was using this code to make my Gene_IDs unique and convert them to row names:

readcounts <- aggregate(readcounts[-1],readcounts[1],mean)
row.names(readcounts) <- readcounts[,1]
readcounts$Gene_ID <- NULL
head(readcounts)

That block of code was somehow chopping off the last 8k of my genes. I also realized that it was introducing some non-integer values. So I replaced it with this instead:
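
One likely explanation, assuming the Gene_ID column contained duplicated IDs (which the later make.names(..., unique = TRUE) step suggests): aggregate() returns one row per distinct Gene_ID, so duplicated genes are merged into a single row, and taking the mean of their counts produces fractional values. A toy illustration with hypothetical IDs:

```r
# Hypothetical input with a duplicated Gene_ID
df <- data.frame(Gene_ID = c("NM_A", "NM_A", "NM_B"),
                 C11 = c(3L, 4L, 10L), C15 = c(0L, 5L, 7L))

# One output row per distinct Gene_ID; the two NM_A rows are averaged
agg <- aggregate(df[-1], df[1], mean)
agg
```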

rownames(readcounts) = make.names(readcounts$Gene_ID, unique=TRUE) #To make unique rownames
readcounts$Gene_ID <- NULL
head(readcounts)
              C11    C15    C19    C23    N11    N15    N19    N23
NR_075077     0      5      0      4      8      2      2      6
NM_001276352  0      5      0      4      8      2      2      6
NM_001276351  0      5      0      4      8      2      2      6
NM_000299     1220   5980   5089   2792   5223   9731   4365   4755
NM_001005337  1220   5980   5089   2792   5223   9731   4365   4755
NM_012102     7436   15741  13205  14024  15995  14659  9167   12504
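
For reference, make.names() does more than de-duplicate: it also rewrites IDs that are not syntactically valid R names. RefSeq accessions like those above are already valid, so only duplicates change, but if de-duplication is all that is wanted, make.unique() leaves the IDs otherwise untouched. A quick comparison with made-up IDs ("1-Dec" is a hypothetical non-syntactic name):

```r
# Hypothetical IDs: one duplicate and one name that is not syntactically valid in R
ids <- c("NM_000299", "NM_000299", "NM_012102", "1-Dec")

# make.names() fixes syntax (prefix "X", "-" becomes ".") and then de-duplicates
with_make_names <- make.names(ids, unique = TRUE)

# make.unique() only appends suffixes to repeated values
with_make_unique <- make.unique(ids)

with_make_names
with_make_unique
```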

I found that bit of magic here. I am now able to create a DESeqDataSet object. Thanks, swbarnes and ATpoint!
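
To catch both problems from this thread before calling DESeqDataSetFromMatrix(), a hypothetical helper (check_counts() is not part of DESeq2) could run a few base-R sanity checks. It is a sketch only; the requirement that colnames(counts) match rownames(sample_info) in order is included as a precaution:

```r
# Hypothetical helper (not part of DESeq2): returns a character vector describing
# problems likely to make DESeqDataSetFromMatrix() fail; empty means none found
check_counts <- function(counts, sample_info) {
  counts <- as.matrix(counts)
  if (!is.numeric(counts)) {
    return("counts are not numeric")
  }
  problems <- character(0)
  if (anyNA(counts) || any(counts != round(counts))) {
    problems <- c(problems, "missing or non-integer values")
  }
  if (any(counts < 0, na.rm = TRUE)) {
    problems <- c(problems, "negative values")
  }
  if (ncol(counts) != nrow(sample_info)) {
    problems <- c(problems, "ncol(counts) differs from nrow(sample_info)")
  } else if (!identical(colnames(counts), rownames(sample_info))) {
    problems <- c(problems, "sample names differ between counts and sample_info")
  }
  problems
}

# Toy inputs
sample_info <- data.frame(condition = c("C", "C", "N"),
                          row.names = c("C11", "C15", "N11"))
good <- matrix(c(4422L, 3L, 14216L, 0L, 8162L, 0L), nrow = 2,
               dimnames = list(c("NM_000014", "NM_000015"), c("C11", "C15", "N11")))

check_counts(good, sample_info)         # empty: no problems found
check_counts(good[, 1:2], sample_info)  # flags the column/row mismatch
check_counts(good / 2, sample_info)     # flags the fractional values

# If nothing is reported, the call from the thread would be:
# DESeqDataSetFromMatrix(countData = good, colData = sample_info, design = ~ condition)
```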
