Question

Converting logCPM to raw counts for DESeq2

0

2.9 years ago

srhic ▴ 60

Hello,

A simple question, but I am asking to make sure what I am doing is not wrong. I want to analyze a published RNA-Seq dataset. The count file uploaded by the authors contains logCPM values from edgeR. I understand that DESeq2 expects raw counts, so I was wondering whether I can use this count table with it. Would taking 10 to the power of each value and multiplying by a million generate the raw counts, or is this the wrong approach?

Thanks

rna-seq • 2.8k views

ADD COMMENT • link updated 11 weeks ago by Gordon Smyth ★ 7.0k • written 2.9 years ago by srhic ▴ 60

score 1 · Answer 1 · 2021-06-09

1

2.9 years ago

ATpoint 82k

That depends on how these logCPMs were calculated. Without the exact code, I'd say it is impossible to do, at least reliably. This is all quite hacky.

I would download the raw data (check sra-explorer.info for download links) and then quantify it, e.g. with a selective aligner such as salmon, which is memory-efficient and fast: https://combine-lab.github.io/salmon/getting_started/

ADD COMMENT • link 2.9 years ago by ATpoint 82k

0

Thanks, their methods section just says the logCPM values were generated with edgeR but no other info is available.

Downloading and processing the raw data was something I was trying to avoid, but it seems that might be the only way. I wonder why people don't upload raw counts, though.

ADD REPLY • link 2.9 years ago by srhic ▴ 60

1

There are many ways to generate raw counts and, as methods and annotation improve, quantification also improves. Raw counts are not the same thing as raw data. Always start from the raw data if you have the option to do so.

You can also do analysis of public RNA-seq datasets via tools like: https://maayanlab.cloud/biojupies/

ADD REPLY • link 2.9 years ago by dsull ★ 5.8k

0

Thanks. That is such a nice tool. Never knew something like this existed. I was able to find my dataset and generate the differential expression table in five minutes. I feel I will be using this a lot!

ADD REPLY • link 2.9 years ago by srhic ▴ 60

score 1 · Answer 2 · 2021-06-09

1

2.9 years ago

Gordon Smyth ★ 7.0k

If the logCPM values were computed with edgeR, then that means they were obtained by

logCPM <- cpm(dge, log=TRUE)

If you had the edgeR normalized library sizes, which you probably don't, then you could convert back to counts.
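To illustrate why the library sizes are needed, here is a minimal base R sketch of the round trip. It is only a sketch under stated assumptions: that `cpm(log=TRUE)` was run with the default `prior.count = 2`, that edgeR adds a prior count scaled in proportion to library size before taking log2 (my reading of the edgeR documentation), and that the effective library sizes are known. The toy counts and the `lib_size` values are made up for illustration. Note that the values are on the log2 scale, not log10.

```r
# Toy counts: 3 genes x 2 samples (made-up numbers, illustration only)
counts <- matrix(c(10, 0, 250, 40, 5, 900), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))

# Effective library sizes (lib.size * norm.factors). These are the values
# that are normally NOT available from a published logCPM table.
lib_size <- c(s1 = 1e6, s2 = 2e6)
prior_count <- 2  # assumed: the default prior.count of cpm(log = TRUE)

# Forward direction: logCPM as edgeR is assumed to compute it.
# The prior count is scaled by each library's size relative to the mean.
prior_scaled <- prior_count * lib_size / mean(lib_size)
lib_adj <- lib_size + 2 * prior_scaled
log_cpm <- log2(t((t(counts) + prior_scaled) / lib_adj) * 1e6)

# Backward direction: undo the log2, the per-million scaling and the prior.
recovered <- t(t(2^log_cpm / 1e6) * lib_adj - prior_scaled)
print(round(recovered))
```

Without `lib_size` the last step cannot be carried out, which is why simply exponentiating and multiplying by a million does not give counts.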

Alternatively, you could simply analyse the logCPM values directly in limma, without converting back to counts.
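A minimal sketch of what analysing logCPM directly in limma could look like, using the limma-trend approach (`eBayes` with `trend = TRUE`). The matrix here is simulated as a stand-in for the published table, and the `group` labels and sample layout are hypothetical, so treat this as an outline rather than a recipe for any particular dataset. The limma calls are guarded so the script still runs if limma is not installed.

```r
set.seed(1)

# Simulated stand-in for a published logCPM table: 200 genes x 6 samples
log_cpm <- matrix(rnorm(200 * 6, mean = 5, sd = 1), nrow = 200,
                  dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))

# Hypothetical experimental design: 3 controls, 3 treated
group <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ group)

if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(log_cpm, design)
  # trend = TRUE lets the prior variance depend on average log-expression
  fit <- limma::eBayes(fit, trend = TRUE)
  print(limma::topTable(fit, coef = "grouptreated", number = 5))
}
```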

ADD COMMENT • link 2.9 years ago by Gordon Smyth ★ 7.0k

0

How do I plot the mean-variance trend using voom after removing the batch effect with removeBatchEffect, which returns logCPM values? voom already transforms counts to logCPM, so I am confused.

ADD REPLY • link 11 weeks ago by Shaimaa Gamal ▴ 10

2

Please don't add comments to 3-year-old questions.

It has already been explained to you on the Bioconductor forum that you should not be adjusting counts before running voom. It is not right to batch correct before running voom. The batch correction should instead be done as part of the linear model. Since you're already using RUV, you have no need to use removeBatchEffect at all.
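For readers unfamiliar with the idea, a minimal sketch of handling batch inside the linear model might look like the following. The counts are simulated and the `group` and `batch` factors are hypothetical; with RUV, the estimated factors of unwanted variation would presumably enter the design as extra covariate columns in the same way, which is not shown here. The limma calls are guarded so the script still runs if limma is not installed.

```r
set.seed(1)

# Simulated raw counts: 200 genes x 8 samples (illustration only)
counts <- matrix(rnbinom(200 * 8, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:8)))

# Hypothetical design: two groups, each split evenly across two batches
group <- factor(rep(c("A", "B"), each = 4))
batch <- factor(rep(c("b1", "b2"), times = 4))

# Batch is a term in the model; the counts themselves are left unadjusted
design <- model.matrix(~ batch + group)

if (requireNamespace("limma", quietly = TRUE)) {
  # voom takes the unadjusted counts; plot = TRUE would draw the
  # mean-variance trend
  v <- limma::voom(counts, design, plot = FALSE)
  fit <- limma::eBayes(limma::lmFit(v, design))
  print(limma::topTable(fit, coef = "groupB", number = 5))
}
```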

If you have more questions, please ask them on the Bioconductor forum. You have been getting extensive help there, and posting the same questions to multiple forums doesn't help anyone.

ADD REPLY • link 11 weeks ago by Gordon Smyth ★ 7.0k