Gene Position Info From Encode Project
11.3 years ago
J.F.Jiang ▴ 910

Hi All,

I am working on SNP annotation, which needs a gene annotation file containing the position information of each gene region.

Previously, people usually extracted this information from the UCSC hg18 refGene file.

However, as mentioned in many papers, the GENCODE V7 gene set from the ENCODE project is a better resource for gene annotation.

On the website:

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGencodeV7/ several files are offered. Is wgEncodeGencodeBasicV7 the one I am looking for?

wgEncodeGencodeBasicV7.gp.gz project=wgEncode; grant=Hubbard; lab=Sanger; composite=wgEncodeGencodeV7; dataType=Gencode; dccAccession=wgEncodeEH001881; dateSubmitted=2011-05-01; subId=4347; labExpId=V7; labVersion=Basic Gene Annotation Set; tableName=wgEncodeGencodeBasicV7; type=gp; md5sum=ee1cdaa985ca47337dff5efb0cafb3ed; size=4.5M

BTW:

There are also other GENCODE versions on the same site:

  wgEncodeGencodeV4/         05-Jul-2012 06:57    -   
  wgEncodeGencodeV7/         05-Jul-2012 06:57    -   
  wgEncodeGencodeV10/        22-Feb-2012 13:40    -   
  wgEncodeGencodeV11/        08-Mar-2012 09:44    -   
  wgEncodeGencodeV12/        20-Jun-2012 16:40    -

So what is the difference between these datasets? Almost all the publications seem to consider only the V7 set.

Thanks anyway!

Best

Added on 1/12:

Sorry, I forgot to ask this: in the data file, many genes have several transcripts, and these can have different txStart positions. So how should the gene region be defined?

gene annotation encode position • 2.0k views
11.3 years ago
PoGibas 5.1k

The current release is GENCODE v14.
You should also check the release statistics to see how each version differs from the others.

Hope this helps.

UPDATE:
You can try gencode.v14.annotation.gtf.gz. It has entries for various gene types: protein coding, lncRNA, miRNA, pseudogenes, etc.
If, for example, you're interested in protein-coding genes, you can extract coordinates for the whole gene region or only for its exons, transcripts, or UTRs (usually the defined gene region is the same as or larger than its transcripts).
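To make the extraction above concrete, here is a minimal Python sketch. It assumes the usual 9-column, tab-separated GTF layout and that each record carries `gene_id` and `gene_type` attributes; please check these names against the header and first lines of your own file. The helper names (`parse_gtf_line`, `gene_regions`, `transcript_spans`) are made up for illustration. It shows two ways to get a gene region: read the `gene` records directly, or take the smallest start and largest end over all `transcript` records of a gene.

```python
import re
from collections import namedtuple

# GTF coordinates are 1-based and inclusive.
Region = namedtuple("Region", "chrom start end strand")

# Matches attributes written as: key "value"
# (unquoted attributes are ignored by this sketch).
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_line(line):
    """Split one GTF line into fixed fields plus a dict of quoted attributes."""
    fields = line.rstrip("\n").split("\t")
    attrs = dict(_ATTR.findall(fields[8]))
    return fields[0], fields[2], int(fields[3]), int(fields[4]), fields[6], attrs


def _records(lines, feature, gene_type):
    """Yield (gene_id, Region) for records of one feature type and gene type."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        chrom, feat, start, end, strand, attrs = parse_gtf_line(line)
        # "gene_type" is an assumed attribute name; adjust if your file differs.
        if feat == feature and attrs.get("gene_type") == gene_type:
            yield attrs["gene_id"], Region(chrom, start, end, strand)


def gene_regions(lines, gene_type="protein_coding"):
    """Gene region taken directly from the 'gene' records."""
    return dict(_records(lines, "gene", gene_type))


def transcript_spans(lines, gene_type="protein_coding"):
    """Gene region defined as the union span of all its transcripts."""
    spans = {}
    for gene_id, reg in _records(lines, "transcript", gene_type):
        prev = spans.get(gene_id)
        if prev is None:
            spans[gene_id] = reg
        else:
            spans[gene_id] = Region(
                reg.chrom, min(prev.start, reg.start), max(prev.end, reg.end), reg.strand
            )
    return spans


# Possible usage on the real file (not run here):
#   import gzip
#   with gzip.open("gencode.v14.annotation.gtf.gz", "rt") as fh:
#       regions = gene_regions(fh)
```

The union span is only one possible convention for "the gene region"; if you need something narrower (for example, only the longest transcript), the same loop can be adapted.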


Yes, thanks, that is really helpful.

However, what criteria should I use to choose the dataset version? Could you give me some insight?


The annotations change frequently. Pick a version that is stable and freeze it, then do all of your analysis with that version.


I updated my previous answer.
