Question

Gene identifiers in rMATS output

6.8 years ago

cjames.gblock • 0

Hi,

I am using output files produced from rMATS 3.2.5. We used the pre-built STAR index and the included hg19 GTF file for annotation.

Unfortunately, the gene identifiers are "weird," in the sense that they are a mixed bag of UniProt, NCBI, and UCSC identifiers. Many of them cannot be converted with any conversion method I can find (they are secondary identifiers, or I simply cannot find them even with Google).

Has anyone else run into a similar problem, and how can I fix it? Since there are close to 1000 genes, it is not practical to go line by line and identify each gene by chromosome position.

Thanks!

RNA-Seq splicing alignment • 1.3k views

updated 6.6 years ago by Kevin Blighe • written 6.8 years ago by cjames.gblock

score 0 · Answer 1 · 2017-09-11

You could try DAVID's gene conversion tool (https://david.ncifcrf.gov/conversion.jsp), or adapt this R code that I used to convert Ensembl IDs to RefSeq:

# Read the table that holds the IDs to convert
df <- read.table("ENSEMBL.IDs.tsv", header=TRUE, sep="\t", stringsAsFactors=FALSE)
library(biomaRt)
# Connect to the Ensembl BioMart and select the human gene dataset
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
# Look up biotype, gene name, and RefSeq IDs for each Ensembl gene ID
annots <- getBM(mart=mart, attributes=c("ensembl_gene_id", "gene_biotype", "external_gene_name", "refseq_mrna", "refseq_ncrna"), filters="ensembl_gene_id", values=df[,3], uniqueRows=TRUE)

The IDs to convert are stored in the third column of my data frame, i.e., df[,3].
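
Since the IDs in the question are a mix of types, it may help to sort them by type first, so that each group can be sent to the matching lookup. The following is only a rough sketch: classify_id is a hypothetical helper (not part of biomaRt or rMATS), the example IDs are made up for illustration, and the regular expressions are heuristics based on the usual shape of each identifier type, so some IDs may still end up as "unknown".

```r
# Hypothetical helper: guess the identifier type of each ID from its shape.
# The patterns are heuristics (assumptions), not official validators.
classify_id <- function(ids) {
  type <- rep("unknown", length(ids))
  # Ensembl gene/transcript IDs, e.g. ENSG00000141510
  type[grepl("^ENS[GT][0-9]+(\\.[0-9]+)?$", ids)] <- "ensembl"
  # RefSeq transcript accessions, e.g. NM_000546 or NR_046018.2
  type[grepl("^[NX][MR]_[0-9]+(\\.[0-9]+)?$", ids)] <- "refseq"
  # UCSC known gene IDs, e.g. uc001aaa.3
  type[grepl("^uc[0-9]{3}[a-z]{3}(\\.[0-9]+)?$", ids)] <- "ucsc"
  # UniProt accessions, e.g. P04637
  uniprot <- "^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
  type[grepl(uniprot, ids)] <- "uniprot"
  type
}

# Made-up example IDs, one of each shape plus one that matches nothing
ids <- c("ENSG00000141510", "NM_000546", "uc001aaa.3", "P04637", "mystery_id")

# Named list with one character vector of IDs per guessed type
groups <- split(ids, classify_id(ids))
```

Each element of groups could then be passed as values to its own getBM() call with a filter for that ID type. The exact filter names vary between datasets, so it is worth checking listFilters(mart) rather than assuming them.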