Question

Ncbi Gene Clone Name Information

11.3 years ago

Brian Tsai ▴ 100

Hi,

I'm trying to convert the gene name FLJ22536 into an Entrez Gene ID. I went to the NCBI Gene database, found Entrez Gene ID 401237, and saw 'FLJ22536' listed there under "Clone name". What does this mean?

http://www.ncbi.nlm.nih.gov/gene?term=FLJ22536
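Below is a minimal Python sketch of the same lookup done programmatically through NCBI's E-utilities ESearch endpoint. This is one common way to turn a name, alias, or clone name into Gene IDs. The helper names (`build_gene_search_url`, `parse_gene_ids`, `lookup_gene_ids`) are hypothetical. The optional organism filter is an assumption added for illustration. Without it, the query mirrors the `gene?term=FLJ22536` search above.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_search_url(term: str, organism: Optional[str] = None) -> str:
    """Build an ESearch URL that searches the Gene database for `term`.

    `organism` is an optional, illustrative filter using the Entrez
    [Organism] search field; leave it as None to search on `term` alone.
    """
    query = term if organism is None else f"{term} AND {organism}[Organism]"
    params = {"db": "gene", "term": query, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_gene_ids(esearch_json: str) -> List[str]:
    """Return the Gene IDs listed in an ESearch JSON response."""
    result = json.loads(esearch_json)["esearchresult"]
    return list(result.get("idlist", []))


def lookup_gene_ids(term: str, organism: Optional[str] = None) -> List[str]:
    """Run the search live (requires network access).

    NCBI asks for no more than 3 requests per second without an API key.
    """
    with urlopen(build_gene_search_url(term, organism)) as response:
        return parse_gene_ids(response.read().decode("utf-8"))
```

`lookup_gene_ids("FLJ22536")` needs network access. The question reports that this search led to Gene ID 401237, but Gene records are revised over time, so today's result may differ. A hit only tells you that the term is indexed somewhere in the record (here as a clone name). It does not mean the term is the gene's official symbol.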

gene identifiers • 3.8k views

link updated 10.8 years ago by Biostar 20 • written 11.3 years ago by Brian Tsai ▴ 100

score 1 · Answer 1 · 2013-01-13

This answer requires some technical understanding of old-fashioned molecular biology, from the era when sequence data were largely derived from "clones". It also helps to know about libraries, inserts, reads (3' and 5'), vectors and more besides. The link to CloneDB above explains some of this for recent deposition schemes, but the primary record in this case, http://www.ncbi.nlm.nih.gov/nuccore/AK026189, predates those schemes by many years. In addition, the details often differed between laboratories.

FLJ22536 (full length insert) is how this team designated the sequence read off the machine. The clone from the library is HRC13155, and the entry has the accession AK026189. If you parse the definition line you get "Homo sapiens cDNA: FLJ22536 fis, clone HRC13155", but this is not a gene name. Note also that the entry is ORF-less (no CDS) and sits in the mRNA division. This team also overlapped this sequence with a read from a different clone and a different library (AK022865, FLJ12803 fis, clone NT2RP2002172).
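To make the distinction concrete, here is a small Python sketch that splits a definition line of the form quoted above into its organism, FLJ designation, and clone name. The regular expression is an assumption fitted to this one example. It allows an optional colon after "cDNA" and an optional trailing period to tolerate minor variants. It is not a documented GenBank grammar, and the names `parse_flj_defline` and `FljDefinition` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class FljDefinition(NamedTuple):
    organism: str
    flj_id: str   # the sequencing team's designation, e.g. FLJ22536
    clone: str    # the library clone name, e.g. HRC13155


# Assumed pattern, fitted to lines like
#   "Homo sapiens cDNA: FLJ22536 fis, clone HRC13155"
# The colon after "cDNA" and the trailing period are both optional.
DEFLINE_RE = re.compile(
    r"^(?P<organism>.+?) cDNA:? (?P<flj_id>FLJ\d+) fis, clone (?P<clone>\S+?)\.?$"
)


def parse_flj_defline(defline: str) -> Optional[FljDefinition]:
    """Split an FLJ-style cDNA definition line; return None if it doesn't match."""
    m = DEFLINE_RE.match(defline.strip())
    if m is None:
        return None
    return FljDefinition(m["organism"], m["flj_id"], m["clone"])
```

For AK026189 this yields `FLJ22536` and `HRC13155`. Both are laboratory designations, and neither is a gene symbol.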

Confusion arises because of the long chain of secondary annotations derived from this entry. At some point someone decided that this was not a cloning artifact but a long intergenic non-protein coding RNA (http://en.wikipedia.org/wiki/Long_non-coding_RNA). My first guess for the transitive annotation sequence was Vega (RP1-67M12.1) > Ensembl > HGNC > Entrez Gene (EG). Importantly, this early classification as a "gene" was passed on to HGNC and Entrez. I would defer to the Vega curators, of course, but am still circumspect about the "gene" definition. I'm also unsure where the RefSeq record fits in this five- or six-link inference chain:

http://www.ncbi.nlm.nih.gov/nucleotide/212275242?report=genbank&log$=nuclalign&blast_rank=3&RID=F5H0F9DC014

The RefSeqN system, starting from AK026189, appears to have merged this record with a longer genomic prediction, computationally added 12 non-coding exons, and assigned the name "RNA 340 (LINC00340)". This has become its EG name, but it may also have propagated "backwards" to HGNC.

If anyone can clarify more details of this example of annotation spaghetti, please pitch in.

score 0 · Answer 2 · 2013-01-13

A bit of clicking around the site suggests these might answer your question:

http://www.ncbi.nlm.nih.gov/books/NBK3841/#EntrezGene.General_Gene_Information
http://www.ncbi.nlm.nih.gov/clone/content/faq/#clonenomenc

I found these using Google ("entrez gene clone name") and clicking the "help" ("?") links on the page you put in your post. I don't mean to be sarky - you'd be amazed how many questions like this can be solved with judicious use of search engines, links to help pages, or even just searching web pages for other instances of your text of interest.

Out of curiosity, I wonder why you are interested in knowing this? Typically, in a data resource like this (or something like UniProt, that I personally use more often than EntrezGene), there are many (many...) different pieces of information associated with your entity of interest.

At first, when confronted with this huge amount of information, there is a tendency to worry about all the many things amongst it that we do not understand/know anything about (this is my experience teaching introductory bioinformatics to bench biologists, at least, and indeed mirrors my own initial experience working with bioinformatics data).

As time goes on, however, and you feel more comfortable working with these resources, you will probably tend to focus much more on those things that are specifically useful for addressing your specific questions of interest; I expect that, on average, this tends to lead more quickly to answers to these questions, which is often what you'd prefer.

Thus, these days, I might see a field like this in a database for an entity I'm interested in and not know what it means. If I can still get what I want from the file without spending time working out what the field means, then I just skip over it without any feelings of guilt. Obviously, if this information is important for your analysis, this can cause problems. Hopefully, though, those problems will show up in a way that you notice. At that point you can go back to your data and try to work out what the problem is, and perhaps only then work out what these extra fields mean.

I'm not saying "don't be interested in stuff that's not apparently directly relevant to your question" - just that often there is soooooo much "stuff" out there to read/understand, that it pays to focus instead on the more relevant stuff.

Sorry if this comes across as patronising - it's just I got the sense from your question that you might be new to this kind of thing, and the above comment/advice is something I think it might have helped me to read when I was starting out.