Gene identifiers in rMATS output
6.8 years ago

Hi,

I am using output files produced from rMATS 3.2.5. We used the pre-built STAR index and the included hg19 GTF file for annotation.

Unfortunately, the gene identifiers are "weird": they are a mixed bag of UniProt, NCBI, and UCSC identifiers. Many of them cannot be converted with any conversion method I can find (they are secondary identifiers, or I simply cannot find them, even with Google).

Has anyone else run into a similar problem, and how can I fix it? Since there are close to 1,000 genes, going line by line and identifying each gene by chromosome position is not practical.

Thanks!

RNA-Seq splicing alignment • 1.3k views
6.6 years ago

You could try DAVID's gene conversion tool (https://david.ncifcrf.gov/conversion.jsp), or adapt this R code, which I used to convert ENSEMBL IDs to RefSeq:

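library(biomaRt)

# Read the table that holds the IDs to convert (tab-separated, with a header row)
df <- read.table("ENSEMBL.IDs.tsv", header=TRUE, sep="\t", stringsAsFactors=FALSE)

# Connect to the Ensembl BioMart and select the human gene dataset
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)

# Look up biotype, gene symbol, and RefSeq accessions for each Ensembl gene ID
annots <- getBM(mart=mart, attributes=c("ensembl_gene_id", "gene_biotype", "external_gene_name", "refseq_mrna", "refseq_ncrna"), filters="ensembl_gene_id", values=df[,3], uniqueRows=TRUE)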

The IDs to convert are stored in the third column of my data frame, i.e., df[,3].
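Because the IDs in the question come from several namespaces, one way to adapt the code above is to sort them by type first and then run a separate getBM() query per type. The following is only a rough sketch: classify_id is a hypothetical helper, the example IDs are illustrative, and the patterns are assumptions about what typical UCSC, RefSeq, UniProt, and Ensembl accessions look like, so some IDs may still end up as "unknown".

```r
# Hypothetical helper: guess the identifier namespace from the shape of each ID.
# The regular expressions are assumptions about common accession formats.
classify_id <- function(ids) {
  ids <- trimws(ids)
  type <- rep("unknown", length(ids))
  # UCSC known-gene style, e.g. uc001aaa.3
  type[grepl("^uc[0-9]{3}[a-z]{3}(\\.[0-9]+)?$", ids)] <- "ucsc"
  # RefSeq transcripts, e.g. NM_000546 or NR_024540.1
  type[grepl("^[NX][MR]_[0-9]+(\\.[0-9]+)?$", ids)] <- "refseq"
  # UniProt accessions, e.g. P04637 (optionally with an isoform suffix)
  uniprot <- "^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})(-[0-9]+)?$"
  type[grepl(uniprot, ids)] <- "uniprot"
  # Ensembl gene or transcript IDs, e.g. ENSG00000141510
  type[grepl("^ENS[GT][0-9]{11}(\\.[0-9]+)?$", ids)] <- "ensembl"
  type
}

# Illustrative IDs only; in practice these would come from the rMATS gene column
ids <- c("uc001aaa.3", "NM_000546", "P04637", "ENSG00000141510", "LOC100", "NR_024540.1")

# One character vector of IDs per guessed namespace
by_type <- split(ids, classify_id(ids))
print(by_type)
```

Each element of by_type could then be passed as values= to its own getBM() call with a matching filters= argument. The exact filter names available for the dataset are best checked with listFilters(mart) rather than guessed.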
