Forum: Survey: help define Gencode and NCBI primary transcripts
19 months ago
Emily_Ensembl 18k
EMBL-EBI

Ensembl and NCBI have been working to align the GENCODE and RefSeq reference transcripts. As part of that effort, we are also developing plans to define a primary transcript for every gene as well as a minimal set of clinically relevant transcripts. To guide that effort, we have developed a small survey to get input on how to define the primary transcript and whether this would be important to your work.

The survey should only take 10 minutes or less and you will have the opportunity to sign up for follow-up info about this project if you are interested.

https://goo.gl/forms/OjEXtYGt1pxcukqp1

We had ~1900 unique users on Biostars in the last hour. Surely more of you can find the time to complete the survey :-D

genomax 68k, 19 months ago

Be fair, some of them work with proteins.

Emily_Ensembl 18k, 19 months ago

One thing to consider is that wet lab scientists come to these tools to find sequences for their uses. It's already hard enough to reconcile the gene name reported in a paper (e.g. Hsc70) with the myriad of things with that name (e.g. the 20 or so HSPA8s) before you get to the transcripts.

For a wet lab scientist, a primary transcript could be a very nice thing to see but depending on how it is defined may be misleading or incorrect. Some people tend to think of it as a case where one transcript is "the right one", the one that is the wild-type one found in their cells/animals/etc, and the remaining transcripts are special cases in the sense of "if you needed that one, you'd know". This isn't correct from a bioinformatics standpoint, but in a larger scope it makes sense.

In some sense, a primary transcript is best defined by whatever transcript has historically been used experimentally or referenced in the literature. Even if that transcript isn't the most abundant, contains some odd allele, etc., the most important information about that gene comes from these publications and in particular the wet lab experiments. We may think that in bioinformatics we can just pick the longest or most abundant transcript and be okay, but we often interpret the significance of our findings largely through what the literature tells us. If we cite papers that refer to transcript A to impart significance to our findings on transcript B, we're in trouble. The same goes for wet lab biologists: if they clone transcript B based on all the papers on transcript A, they're in trouble.

Not a new problem, but I'm wondering if identifying a primary transcript will, on average, worsen or improve this issue.

pld 4.8k, 19 months ago

This is the logic behind considering this option. It's a bit of a Wild Wild West at the moment, with people picking the one transcript they're going to study by fairly arbitrary means, and not always picking the same one. If an authority has defined this, at least it will solve one problem.

Also, people ask us for primary transcripts all the time.

Emily_Ensembl 18k, 19 months ago

A possibly relevant post: How to tell which transcript is the canonical transcript?

steve ♦ 2.0k, 19 months ago
19 months ago
Istvan Albert 80k
University Park, USA

This is one of those things where the reality and the desired course of action diverge, and data service providers seem to need to choose between what people think they want and the complex realities of science. Are life scientists the proper audience to "democratically" decide what "primary" means?

In my opinion, the term "primary" leads people to believe that a subset of the transcripts is more important than the others. They will study these more, which turns 'importance' into a self-fulfilling prophecy. It sets back science rather than promoting it.

I can't see the benefit of new terminology for things that are already defined. Clinically relevant, longest exons, high abundance, low abundance: we all know what these words mean. Whatever the temporary benefit of a seemingly consistent naming pattern might be, the information will start changing the next day. And now we have to deal with those changes via a new and potentially misleading term. Why not just call one set "clinically relevant (as of 2018)", the other "high abundance", etc., and let people filter by those?

The real challenges are in matching and summarizing one data release against another (or across versions), finding out what the differences between them are, and visualizing those differences easily.

What we really need are accurate transcripts and ways to annotate or filter transcripts based on observed abundances in tissues or conditions. What we need is information that helps cut down on the busy patchwork of "custom" little scripts to figure out simple information.
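
As a rough illustration of filtering by explicit labels and by observed abundance instead of a single "primary" flag, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the transcript IDs, the label names, the tissue names and the TPM values are made up, and the `Transcript` class and both helper functions are hypothetical, not part of any Ensembl or RefSeq API.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Hypothetical record: one transcript with descriptive labels and abundances."""
    transcript_id: str                                  # made-up identifier
    gene: str                                           # made-up gene name
    labels: set = field(default_factory=set)            # e.g. {"longest"}
    tpm_by_tissue: dict = field(default_factory=dict)   # tissue -> abundance (TPM)


def filter_by_label(transcripts, label):
    """Keep transcripts that carry the given descriptive label."""
    return [t for t in transcripts if label in t.labels]


def filter_by_abundance(transcripts, tissue, min_tpm):
    """Keep transcripts whose abundance in `tissue` is at least `min_tpm`.

    A transcript with no measurement for the tissue is treated as 0.0.
    """
    return [t for t in transcripts if t.tpm_by_tissue.get(tissue, 0.0) >= min_tpm]


# Toy data, invented for this example only.
transcripts = [
    Transcript("TX_A", "GENE1", {"clinically_relevant_2018"}, {"liver": 2.0, "brain": 50.0}),
    Transcript("TX_B", "GENE1", {"longest"}, {"liver": 30.0, "brain": 1.0}),
    Transcript("TX_C", "GENE1", set(), {"liver": 0.5}),
]

clinical_ids = [t.transcript_id
                for t in filter_by_label(transcripts, "clinically_relevant_2018")]
liver_ids = [t.transcript_id
             for t in filter_by_abundance(transcripts, "liver", 10.0)]
brain_ids = [t.transcript_id
             for t in filter_by_abundance(transcripts, "brain", 10.0)]

print(clinical_ids)
print(liver_ids)
print(brain_ids)
```

In this toy data the "clinically relevant" transcript and the transcript that is abundant in liver are different ones, so each filter answers a different question, and no single label covers both.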

All this feedback is important, so please put it in the survey. The aim of the survey is to determine whether we want to have a primary transcript and, if we do, how we would define it.

Emily_Ensembl 18k, 19 months ago
19 months ago
Emily_Ensembl 18k
EMBL-EBI

Adding an answer to bump up. If you care about this at all, please fill in the survey. We're never going to please everybody but if you fill in the survey at least your voice will be heard.

19 months ago
Emily_Ensembl 18k
EMBL-EBI

We will close this survey at midnight (BST) on Thursday. If you wish to have your say, you've got two days to do it.
