Warning message: In y/gene.length.kb : longer object length is not a multiple of shorter object length? Any ideas?
8.0 years ago
tud55122 ▴ 20

Hi,

I'm new to RNA-seq analysis. I'm using edgeR to generate RPKMs. Everything works fine except the last step, which produces this warning message:

Warning message:

In y/gene.length.kb :
  longer object length is not a multiple of shorter object length

When I checked the RPKMs generated, the values look kind of skewed. There is high variation even within the same condition, although the raw counts look fine.

Any idea? Thanks, Hang

RNA-Seq RPKM EdgeR • 3.6k views

Thanks for your reply, guys. Do you know how to fix the problem?

Here are the scripts I used

d = DGEList(counts=counts, group=samples$condition)
d = calcNormFactors(d)
length.genes=read.table("gene_length_mouse.txt",sep="\t",header=T)
rpkm.gene=rpkm(d, length.genes[length.genes$Gene %in% rownames(d),2],normalized.lib.sizes=TRUE, log=F)

Thanks, Hang


It looks like d contains some rows that are not in your text file. Where you subset length.genes[] with %in%, you also need to subset d with the converse. You can only get RPKMs for genes whose length is in the text file. For that matter, be extra careful that the two lists are sorted the same way! Maybe unify them in a separate command and don't use %in% inside the rpkm() call.
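Something along these lines might work as a starting point. It is only a sketch on toy data: the gene names, the counts and the "Length" column name are made up for illustration, and the edgeR step is skipped if the package is not installed. The idea is to subset the counts first and then use match() so the lengths come out in the same order as the count rows.

```r
# Toy stand-ins for the poster's objects (hypothetical values and column names).
counts <- matrix(c(10, 20, 30, 40, 50, 60, 70, 80), nrow = 4,
                 dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"),
                                 c("s1", "s2")))
length.genes <- data.frame(Gene = c("GeneC", "GeneA", "GeneD"),
                           Length = c(3000, 1000, 2000),
                           stringsAsFactors = FALSE)

# Keep only the genes that have a length in the table.
keep <- rownames(counts) %in% length.genes$Gene
counts.kept <- counts[keep, , drop = FALSE]

# Look the lengths up in the row order of the count matrix.
idx <- match(rownames(counts.kept), length.genes$Gene)
gene.length <- length.genes$Length[idx]

# Fail early instead of letting R recycle a vector of the wrong length.
stopifnot(length(gene.length) == nrow(counts.kept), !anyNA(gene.length))

# Only run the edgeR part if the package is available.
if (requireNamespace("edgeR", quietly = TRUE)) {
  d <- edgeR::DGEList(counts = counts.kept)
  d <- edgeR::calcNormFactors(d)
  rpkm.gene <- edgeR::rpkm(d, gene.length = gene.length,
                           normalized.lib.sizes = TRUE, log = FALSE)
}
```

Note that GeneB is dropped here because it has no length, so the DGEList is built from the filtered counts rather than filtered afterwards.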

8.0 years ago
Shab86 ▴ 310

Could it be that you are providing a different number of gene lengths than there are genes in your matrix?


That's just what the warning says. The "y" vector has a different length than "gene.length.kb". It talks about multiples because in R you are allowed to divide vectors of different lengths: the shorter one is recycled, e.g. (1,2,3,4,5,6) / (1,2) = (1/1, 2/2, 3/1, 4/2, 5/1, 6/2). R only warns when the longer length is not a whole multiple of the shorter one.
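A quick way to see the recycling behaviour in plain R (toy vectors, nothing edgeR-specific):

```r
x <- c(1, 2, 3, 4, 5, 6)

# Length 6 divided by length 2: the shorter vector is recycled three times,
# silently, because 6 is a multiple of 2.
even <- x / c(1, 2)

# Length 6 divided by length 4: R still recycles, but 6 is not a multiple
# of 4, so it raises the warning seen in the question. Capture its text.
warn.msg <- tryCatch(x / c(1, 2, 3, 4),
                     warning = function(w) conditionMessage(w))

print(even)
print(warn.msg)
```

The important point is that in both cases R returns a result, so a length mismatch can give wrong RPKMs without ever stopping the script.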

8.0 years ago
Michael 54k

The problem is this:

length.genes$Gene %in% rownames(d)

and the file "gene_length_mouse.txt".

If some genes cannot be matched between d and the gene length file, then you end up with an unusable length vector. In addition, I would like to stress that it might be better to have no gene lengths than bad ones. Gene lengths are rather a confounding factor; the combined exon lengths might be better to use here.
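To find out which genes fail to match, a check like the following could help. It is a sketch with invented gene names and an assumed "Length" column; substitute rownames(d) and the real table.

```r
# Hypothetical gene IDs standing in for rownames(d) and the length table.
count.genes <- c("GeneA", "GeneB", "GeneC", "GeneD")
length.genes <- data.frame(Gene = c("GeneC", "GeneA", "GeneD", "GeneE", "GeneA"),
                           Length = c(3000, 1000, 2000, 1500, 1000),
                           stringsAsFactors = FALSE)

# Genes with counts but no length, and lengths with no counts.
missing.length <- setdiff(count.genes, length.genes$Gene)
unused.length <- setdiff(length.genes$Gene, count.genes)

# Gene IDs listed more than once in the length table.
dup.genes <- unique(length.genes$Gene[duplicated(length.genes$Gene)])

# Number of lengths the %in% subset from the question would select.
n.selected <- sum(length.genes$Gene %in% count.genes)

print(missing.length)
print(unused.length)
print(dup.genes)
print(n.selected)
```

In this toy case the %in% subset selects as many lengths as there are genes, so no warning would appear, yet one gene has no length, one is duplicated, and the order differs from the counts. A matching vector length alone does not show the lengths are right.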
