Question

DESeq2 and integers values

1

Entering edit mode

5.1 years ago

Morris_Chair ▴ 350

Dear all,

I have an experiment with two different conditions, two treated and two untreated samples from where I performed RNA sequencing. here is the code that I'm using for the expression analysis

countdata <- ( as.matrix(read.table("/RNAseq/counts/DESeq2_STAR/count.table", header=TRUE,row.names=1))) 
condition <- factor(c(rep("treated",2), rep("Untreated",2)))
coldata <-data.frame(row.names=colnames(countdata),condition)
library("DESeq2")

everything is ok until I use DESeq2

    dds <- DESeqDataSetFromMatrix(countdata, coldata, design=~condition)

I can't find a way to make the integer number, and I don't know if I have to fix something with featureCounts or I can do this in R.

please help,

thank you

here the result

head(countdata)

mapping.star.Treated_1.bam mapping.star.Treated_2.bam mapping.star.Untreated_1.bam mapping.star.Untreated_2.bam
ENSG00000223972                       7.12                      58.49                       119.48                        27.45
ENSG00000227232                    2793.80                    2960.74                      2410.36                      1630.14
ENSG00000243485                      30.00                      16.32                         2.93                        20.52
ENSG00000237613                       0.00                       1.00                         0.00                         1.83
ENSG00000268020                       1.00                       0.00                         0.00                         0.50
ENSG00000240361                       0.00                       0.00                         0.00                         0.00

condition

[1] treated   treated   Untreated Untreated
Levels: treated Untreated

head(coldata)

condition
 mapping.star.Treated_1.bam     treated
 mapping.star.Treated_2.bam     treated
 mapping.star.Untreated_1.bam Untreated
 mapping.star.Untreated_2.bam Untreated

library("DESeq2")
dds <- DESeqDataSetFromMatrix(countdata, coldata, design=~condition)

**Error in DESeqDataSet(se, design = design, ignoreRank) : 
  some values in assay are not integers**

RNA-Seq DESeq2 • 17k views

ADD COMMENT • link 5.1 years ago by Morris_Chair ▴ 350

1

Entering edit mode

1) This is not a forum, please remove this tag.

2) Normally, you should have integers going into DESeq2. This is likely a featurecounts "issue". Please share how you generated your counts from featurecounts. Is this gene-level data?

ADD REPLY • link 5.1 years ago by lshepard ▴ 470

0

Entering edit mode

Hi Ishepard, I don't know what tag you mean, I'm sorry if I did something wrong.

Here it is what I used to generate the count file with featureCounts and yes, this is a gene-level data

featureCounts -M --fraction -g gene_id -T 4 -s 0 -a annotation/Homo_sapiens.GRCh37.75.gtf -o counts/T-ALL.star.featureCounts files.bam

thanks

ADD REPLY • link 5.1 years ago by Morris_Chair ▴ 350

4

Entering edit mode

Here is your problem: --fraction will generate fractional counts. featureCounts is not appropriate for counting multi-mapped reads, you shouldn't use -M --fraction. If you want to count multi-mapped reads, use Salmon, RSEM or something similar.

ADD REPLY • link 5.1 years ago by h.mon 35k

0

Entering edit mode

Ideally if I remove this command -M --fraction I should have integers value, I will loose some information but I can still go further with the analysis, isn't it?

thank you

ADD REPLY • link 5.1 years ago by Morris_Chair ▴ 350

1

Entering edit mode

Yes indeed.

But you won't loose any information, because featureCounts use the naive method of splitting the counts by the number of positions a read mapped. Salmon, kallisto and RSEM, on the other hand, use a expectation-maximization maximum likelihood algorithm, which will apportion multi-mapped reads based on the proportion of uniquely mapped reads mapping to the same feature.

Which is just a long way of stating the algorithm featureCounts use for counting multi-mappers is bad and should be avoided.

ADD REPLY • link 5.1 years ago by h.mon 35k

0

Entering edit mode

Or, use --fraction, but round your counts before giving them to DESeq.

ADD REPLY • link 5.1 years ago by swbarnes2 14k

0

Entering edit mode

Thank you to all for the precious advices, I got my first heatmap :)

Now I have to find a way to associate the gene name instead of the gene ID

ADD REPLY • link updated 5.1 years ago by ATpoint 82k • written 5.1 years ago by Morris_Chair ▴ 350

1

Entering edit mode

biomaRt package is a good annotation package as an fyi. You can add gene ID, descriptions, coordinates etc...

ADD REPLY • link 5.1 years ago by lshepard ▴ 470

score 2 · Answer 1 · 2019-03-07

2

Entering edit mode

5.1 years ago

h.mon 35k

The error message is very clear: the counts are not integers:

ENSG00000223972          7.12          58.49           119.48            27.45

DESeq2 expect integers, which is what you get from e.g. featureCounts. If you counted over transcripts then summarized over genes, you have to round up the counts, but the recommended approach is to use the tximport R/Bioconductor package to perform the transcripts to genes summarization.

If these values are TPMs, FPKMs, or the like, do not use them with DESeq2. DESeq2 expects raw counts as input.

ADD COMMENT • link 5.1 years ago by h.mon 35k

0

Entering edit mode

Hi h.mon, as far as I know bam files can't be TPM or FPKMs/RPKM because are not processed to be

thanks

ADD REPLY • link 5.1 years ago by Morris_Chair ▴ 350

0

Entering edit mode

Bam files can't be TPMs, FPKMs or counts. Bam files store sequence reads and their mapping positions to a reference genome (or transcriptome). Getting TPMs from bams is more or less the same as getting counts from bams: one has to iterate over the reads and record where they mapped in relation to a given annotation.

ADD REPLY • link 5.1 years ago by h.mon 35k