Annotate output file from DESeq2
18 months ago
BM • 40
United Kingdom

I am trying to annotate the results output file from DESeq2 so that it contains gene names and symbols. The RNA-seq count file I used comes from DEXSeq and contains Ensembl transcript IDs:

ENSMUSG00000000001:001

ENSMUSG00000000001:002

ENSMUSG00000000001:003

etc.

I have tried various methods to annotate the results.

1. Downloaded the annotation from BioMart.

library(DESeq2)

counts = read.delim("3mTA2.txt", header=T, row.names=1)

sample <- read.delim("~/sample.txt")

count.data.set <- DESeqDataSetFromMatrix(countData=counts, colData=sample,design= ~ genotype)

dds<-DESeq(count.data.set)

res <- results(dds)

annotation <- read.delim("mouse.annt.txt") # load annotation file from Biomart

res$EnsemblID <- row.names(res)

res <- merge(res, annotation, by = 'EnsemblID', all.x = TRUE)

It adds the annotation columns to the output file, but the values are blank.

2. Also tried AnnotationDbi.

library("AnnotationDbi")

library("org.Mmu.eg.db")

res$symbol <- mapIds(org.Mmu.eg.db,
                     keys=row.names(res),
                     column="SYMBOL",
                     keytype="ENSEMBL",
                     multiVals="first")

Error in .testForValidKeys(x, keys, keytype) :

None of the keys entered are valid keys for 'ENSEMBL'. Please use the keys method to see a listing of valid arguments.

Any suggestions?

15 months ago
James Ashmore ♦ 2.6k
UK/Edinburgh/MRC Centre for Regenerativ…

The IDs you have listed are Ensembl gene IDs (ENSMUSG...), not transcript IDs (which start with ENSMUST). I'm also not sure why they have the ':001' string after them. If you try the BioMart ID conversion tool, you can see that if you remove this last part and convert the ID to a gene name, you get a result, e.g. ENSMUSG00000000001 = Gnai3. This Ensembl tutorial may help you distinguish between the different IDs.
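As a rough sketch of the suggestion above, the ':001' suffix can be stripped in R before matching against an annotation table. Everything here is a toy stand-in: the `res` data frame only imitates a DESeq2 results table, the `annotation` table and its column names (`EnsemblID`, `symbol`) are assumptions about what a BioMart export might look like, and the helper columns `BinID` and `symbol_db` are made up for illustration. Note also that `org.Mmu.eg.db` is the rhesus macaque package; the mouse package is `org.Mm.eg.db`, which the optional last step uses only if it is installed.

```r
# Toy stand-in for a DESeq2 results table whose row names are DEXSeq-style
# bin IDs (gene ID, colon, bin number). The values are made up.
res <- data.frame(
  log2FoldChange = c(0.5, -1.2, 0.1),
  row.names = c("ENSMUSG00000000001:001",
                "ENSMUSG00000000001:002",
                "ENSMUSG00000000028:001")
)

# Keep the full bin ID and derive the bare gene ID by dropping ":<bin>".
res$BinID <- row.names(res)
res$EnsemblID <- sub(":.*$", "", res$BinID)

# Hypothetical BioMart export; the column names are assumptions.
annotation <- data.frame(
  EnsemblID = c("ENSMUSG00000000001", "ENSMUSG00000000028"),
  symbol    = c("Gnai3", "Cdc45")
)

# Left join: every bin keeps its row and picks up the gene-level annotation.
annotated <- merge(res, annotation, by = "EnsemblID", all.x = TRUE)
print(annotated)

# Optional AnnotationDbi route, run only if the mouse packages are installed.
if (requireNamespace("org.Mm.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  ids <- unique(annotated$EnsemblID)
  sym <- AnnotationDbi::mapIds(org.Mm.eg.db::org.Mm.eg.db,
                               keys = ids,
                               column = "SYMBOL",
                               keytype = "ENSEMBL",
                               multiVals = "first")
  # mapIds() returns a vector named by key, so index it by gene ID.
  annotated$symbol_db <- unname(sym[annotated$EnsemblID])
}
```

With a real `DESeqResults` object, converting it first with `as.data.frame(res)` is probably the simplest way to get the same behaviour from `merge()`.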


ENSMUSG00000000001:001; ENSMUSG00000000001:002 - these refer to the different exons of the gene.

So the question I suppose is how to combine or merge the different exon counts for the same gene into one count for the gene?

Can this be done in DEXSeq or DESeq2?


You don't want to do that, since doing so will double count a number of things. Just run either htseq-count or featureCounts (this is much faster) and directly get gene level metrics.


The initial analysis was performed elsewhere, so I only have the DEXSeq count file with Ensembl IDs for all the different exons of each gene. How can I use this file to proceed: either by collapsing the exon IDs to genes first, or by running the file through DESeq2 as it is and annotating afterwards?


That's unfortunate, particularly if you don't have the BAM or FASTQ files. In that case, the best you can do is remove the :E??? suffix from the names, sum the counts per gene, and use that. Note that the results will then be approximate. You could do that with awk.
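Since the rest of the thread is in R, here is a minimal sketch of the same idea using base R's `rowsum()` instead of awk. The count matrix and sample names are invented for illustration, and the bin IDs follow the ':001' pattern from the question rather than ':E001'. As noted above, the summed values are only approximate, because a read that overlaps several bins is counted in each of them.

```r
# Toy DEXSeq-style count matrix: rows are exon bins, columns are samples.
counts <- matrix(
  c(10, 12,
     5,  7,
     3,  4),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("ENSMUSG00000000001:001",
      "ENSMUSG00000000001:002",
      "ENSMUSG00000000028:001"),
    c("sampleA", "sampleB")
  )
)

# Drop everything from the colon onwards to get one gene ID per row.
gene_id <- sub(":.*$", "", rownames(counts))

# Sum the bin counts within each gene; the result has one row per gene.
gene_counts <- rowsum(counts, group = gene_id)
print(gene_counts)
```

`rowsum()` also accepts a data frame, so a table loaded with `read.delim(..., row.names=1)` should work the same way, and the summed table can then be passed to `DESeqDataSetFromMatrix()` as `countData`.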
