How to select only protein-coding mRNAs from a really long list of ENSG IDs
18 months ago

I have an R dataframe with a column of ENSG IDs.

I believe it contains non-protein-coding IDs that I do not want.

I only want to keep the rows that correspond to protein-coding mRNAs.

I am looking for a source of ENSG IDs (list or similar) that only contains IDs corresponding to protein coding mRNA.

I don't really need help with the coding; I am just looking for the data source.

The best thing I can think of is to scrape GENCODE's "Protein-coding transcript sequences" FASTA, but hopefully there is a better way.

Thank you.

19 months ago
h.mon 25k
Brazil

Use the biomaRt Bioconductor package to query Ensembl directly. See the post "Ensembl: Protein coding transcript ids" for pointers.

18 months ago

This is what I ended up with, if anyone is curious:

library(biomaRt)

# Connect to the Ensembl human gene dataset
ensembl <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

example_ids <- c("ENSG00000172927", "ENSG00000224713", "ENSG00000135269", "ENSG00000272555", "ENSG00000013588")

# Query only the supplied IDs, keeping those whose biotype is protein_coding
res <- getBM(
  attributes = c("ensembl_gene_id", "gene_biotype"),
  filters = c("ensembl_gene_id", "biotype"),
  values = list(example_ids, "protein_coding"),
  mart = ensembl
)
res
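
Since the original goal was to keep only the matching rows of a data frame, here is a minimal sketch of that last step. It does not contact Ensembl: `df`, its `count` column, the placeholder IDs, and the mocked `res` are all hypothetical stand-ins for illustration, and it assumes the ID column holds unversioned ENSG IDs that match the `ensembl_gene_id` column returned by getBM().

```r
# Hypothetical stand-in for the original data frame; the column names and
# placeholder IDs are assumptions for illustration only.
df <- data.frame(
  ensembl_gene_id = c("ENSG_PLACEHOLDER_1", "ENSG_PLACEHOLDER_2", "ENSG_PLACEHOLDER_3"),
  count = c(10, 25, 3),
  stringsAsFactors = FALSE
)

# Mock of a getBM() result: one row per ID that passed the protein_coding filter.
res <- data.frame(
  ensembl_gene_id = c("ENSG_PLACEHOLDER_1", "ENSG_PLACEHOLDER_3"),
  gene_biotype = "protein_coding",
  stringsAsFactors = FALSE
)

# Keep only the rows whose ID appears in the query result.
df_coding <- df[df$ensembl_gene_id %in% res$ensembl_gene_id, , drop = FALSE]
df_coding
```

With real data, `res` would come from the getBM() call above, passing the data frame's ID column as the first element of `values`.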