Question

Is there a "default" RefSeq transcript for genes?

0

4.9 years ago

lumal29 ▴ 80

I'm working on cancer genes and I have a question about the RefSeq transcripts.

As you may know, a gene can have several transcripts due to alternative splicing. I have seen on the Pecan Saint-Jude website that, for genes with several transcripts, they show a "default" transcript, but I don't understand how they choose it. On NCBI, there is no obvious way to tell whether one transcript is better or more common than another.

Do you guys know if there is a way to determine the "default" RefSeq transcript, if such a thing exists at all?

Thank you

RefSeq NCBI Transcript mRNA • 1.3k views

updated 4.5 years ago by vkkodali_ncbi ★ 3.7k • written 4.9 years ago by lumal29 ▴ 80

3

MANE is a new joint project from NCBI and EMBL-EBI that addresses this specific question. A beta version of the data is now available.

We’re leveraging public deep sequencing datasets to optimize 5’ and 3’ UTR endpoints to more accurately reflect transcriptional processes. To pick representative transcripts, we’ve developed computational methods to evaluate and integrate transcript expression levels, protein conservation, support from archived transcript submissions, clinical relevance, and other factors. Complex genes are subject to review by annotation experts from both groups to agree on a representative transcript and often make improvements to both annotation sets.

The assembly and maintenance process for RefSeq records is described in this handbook, in the section "how data are assembled and maintained".

4.9 years ago by GenoMax 141k

0

Thank you very much, very interesting project!

4.9 years ago by lumal29 ▴ 80

score 0 · Answer 1 · 2019-10-23

I am assuming you are interested in the RefSeq human annotation. If so, I suggest you take a look at the RefSeq Select and the related MANE Select projects. From the RefSeq Select help page:

The RefSeq Select dataset consists of a representative or “Select” transcript for every protein-coding gene.
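In practice, one way to pull out the representative transcript per gene is to scan a RefSeq GFF3 annotation file for mRNA rows that carry a Select tag. The sketch below is only an illustration, not an official NCBI tool. It assumes the tag appears in column 9 as `tag=MANE Select` or `tag=RefSeq Select` and that the gene symbol and accession sit in the `gene` and `Name` attributes; please check this against the file you actually download. The function names, the gene `GENE1`, and the `NM_EXAMPLE` accessions are made-up placeholders.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def select_transcripts(gff_lines, wanted_tags=("MANE Select", "RefSeq Select")):
    """Return {gene symbol: transcript accession} for tagged mRNA rows.

    Assumption: the Select status is stored in a 'tag' attribute and the
    accession in 'Name'; adjust the keys if your file differs.
    """
    selected = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # only look at well-formed mRNA feature rows
        attrs = parse_attributes(cols[8])
        tags = attrs.get("tag", "").split(",")
        if any(tag in wanted_tags for tag in tags):
            gene = attrs.get("gene")
            accession = attrs.get("Name")
            if gene and accession:
                # keep the first tagged transcript seen for each gene
                selected.setdefault(gene, accession)
    return selected


# Hypothetical rows for illustration only (not real accessions).
EXAMPLE = [
    "##gff-version 3",
    "chr1\tBestRefSeq\tmRNA\t100\t900\t.\t+\t.\t"
    "ID=rna-1;Parent=gene-GENE1;Name=NM_EXAMPLE1.1;gene=GENE1;tag=MANE Select",
    "chr1\tBestRefSeq\tmRNA\t100\t900\t.\t+\t.\t"
    "ID=rna-2;Parent=gene-GENE1;Name=NM_EXAMPLE2.1;gene=GENE1",
    "chr1\tBestRefSeq\texon\t100\t300\t.\t+\t.\t"
    "ID=exon-1;Parent=rna-1;gene=GENE1",
]

if __name__ == "__main__":
    for gene, accession in select_transcripts(EXAMPLE).items():
        print(gene, accession)
```

For a real file you would pass an open file handle instead of `EXAMPLE`, since the function only needs an iterable of lines.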