Protein coding annotation from UCSC
4.9 years ago

How do I download only protein-coding genes from the UCSC Table Browser? I've chosen GENCODE v19 and "Genes and Gene Predictions" in the settings menus, but the result contains many more genes than just the protein-coding ones.

I'm attaching screenshots of my settings (image)

and of the head of the resulting data frame (image).

ucsc annotation • 1.5k views

You seem to have pasted the link to the image hosting site twice. Please go to the "Embed codes" tab, copy the full image HTML link, and paste it here.


Yes, sorry, I fixed it in the original post. The second picture is now the right one.

4.9 years ago
lshepard ▴ 470

One way is to choose "selected fields from primary and related tables" as the output, check the linked table and click "allow selection from checked tables", then select "geneType/BioType of gene" (and anything else relevant to you). Save the results and subset/filter the file to keep only the rows that match protein_coding (a simple grep could do this).

I am sure there is more than one solution to the above, but this would work.
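If you would rather do the filtering step in Python than with grep, here is a minimal sketch. It assumes the saved file is tab-separated with a header row, and that the gene-type column header ends in `geneType`; that suffix, the function name `filter_protein_coding`, and the sample rows are all made up for illustration, so check them against your own file.

```python
import csv
import io


def filter_protein_coding(lines, type_suffix="geneType", wanted="protein_coding"):
    """Yield the header row, then every row whose gene-type column equals `wanted`."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # The header line may start with '#', so strip it before matching.
    names = [h.lstrip("#") for h in header]
    # Assumption: exactly one column name ends with `type_suffix`.
    idx = next(i for i, h in enumerate(names) if h.endswith(type_suffix))
    yield header
    for row in reader:
        if len(row) > idx and row[idx] == wanted:
            yield row


# Made-up rows standing in for a saved Table Browser result.
example = io.StringIO(
    "#name\tchrom\tgeneType\n"
    "tx_A\tchr1\tprotein_coding\n"
    "tx_B\tchr1\tlincRNA\n"
    "tx_C\tchr2\tprotein_coding\n"
)
kept = list(filter_protein_coding(example))
for row in kept:
    print("\t".join(row))
```

For a real file, pass an open file handle instead of the `io.StringIO` object. Matching the column exactly, rather than grepping the whole line, avoids picking up rows where protein_coding appears in some other field.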


Thank you, that does it!

4.9 years ago
Luis Nassar ▴ 650

Hello Tim,

With the same selections in the Table Browser, if you change the output format to BED, you will see additional options to refine the output. The following page will show the option "Create one BED record per:", which by default makes an entry for the Whole Gene. You can instead choose just Exons, or only Coding Exons to exclude the 3'/5' UTRs. The output will be in BED format (https://genome.ucsc.edu/FAQ/FAQformat.html#format1).
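To work with the BED output afterwards, something like the following rough sketch may help. It reads the first four BED columns and adds up the bases covered, relying on the BED convention that chromStart is 0-based and chromEnd is exclusive, so a record's length is end minus start. The function name `parse_bed` and the sample records are invented for illustration.

```python
def parse_bed(lines):
    """Return (chrom, start, end, name) tuples from BED lines, skipping header lines."""
    records = []
    for line in lines:
        line = line.strip()
        # Skip blanks and the optional 'track' / 'browser' / comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else ""
        records.append((chrom, start, end, name))
    return records


# Made-up records standing in for "one BED record per coding exon" output.
bed_text = [
    "track name=example",
    "chr1\t100\t250\texon_a",
    "chr1\t400\t500\texon_b",
]
records = parse_bed(bed_text)
# BED intervals are half-open, so length is simply end - start.
total_bases = sum(end - start for _, start, end, _ in records)
print(len(records), total_bases)
```

Overlapping records (for example, exons shared between isoforms) would be counted more than once here, so merge intervals first if you need unique coverage.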

There are many answers to this question depending on exactly what you are looking for. If you have additional questions I would encourage you to look at our mailing list archives (https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome), or write in directly to the help desk (genome@soe.ucsc.edu).

p.s. If you are looking for a more concise gene list, you may use the UCSC Genes track for hg19, then select the knownCanonical table. This data table only has a single isoform for each gene.
