Downloading human gene symbols with their duplicate names (aliases)
7.6 years ago
jiwpark00 ▴ 230

I've looked on Google and Biostars before but I can't quite seem to find this information.

I've tried both the UCSC Table Browser and HGNC (HUGO), but both lists seem to have their own problems. I'm basically trying to:

  • Download a list of human gene symbols for all protein-coding genes
  • Along with their alternative names (aliases/synonyms)

Thank you. I've seen posts about Ensembl and BioMart but can't find the right link to do this.

7.6 years ago
EagleEye 7.5k

Download Homo_sapiens.gene_info.gz from ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/ and run:

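zcat Homo_sapiens.gene_info.gz | awk -F'\t' '$10 == "protein-coding"' | cut -f2,3,5,9,10 > output_table.txt

(Filtering on column 10, type_of_gene, avoids picking up lines where "protein-coding" happens to appear in another field, which `grep -w "protein-coding"` would also match.)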

Sample output (columns: GeneID, Symbol, Synonyms, description, type_of_gene):

1   A1BG    A1B|ABG|GAB|HYST2477    alpha-1-B glycoprotein  protein-coding
2   A2M A2MD|CPAMD5|FWP007|S863-7   alpha-2-macroglobulin   protein-coding
9   NAT1    AAC1|MNAT|NAT-1|NATI    N-acetyltransferase 1   protein-coding
10  NAT2    AAC2|NAT-2|PNAT N-acetyltransferase 2   protein-coding
12  SERPINA3    AACT|ACT|GIG24|GIG25    serpin family A member 3    protein-coding
13  AADAC   CES5A1|DAC  arylacetamide deacetylase   protein-coding
14  AAMP    -   angio associated migratory cell protein protein-coding
15  AANAT   DSPS|SNAT   aralkylamine N-acetyltransferase    protein-coding
16  AARS    CMT2N|EIEE29    alanyl-tRNA synthetase  protein-coding
18  ABAT    GABA-AT|GABAT|NPD009    4-aminobutyrate aminotransferase    protein-coding
  • The file is updated daily.
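
If you need one row per alias (the "duplicate names") rather than the pipe-joined Synonyms column, a small awk step can expand it. This is a minimal sketch assuming the standard gene_info column layout (2 = GeneID, 3 = Symbol, 5 = Synonyms with "|" separators and "-" when empty, 10 = type_of_gene); `expand_aliases` is just an illustrative name, and `gzip -dc` is used instead of `zcat` because some systems' `zcat` only handles `.Z` files.

```shell
#!/usr/bin/env bash
# Sketch: emit one "GeneID<TAB>Symbol<TAB>Alias" row per synonym of every
# protein-coding gene. Assumed gene_info columns: 2 = GeneID, 3 = Symbol,
# 5 = Synonyms ("|"-separated, "-" if none), 10 = type_of_gene.
expand_aliases() {
  # $1: path to a gzipped gene_info file
  gzip -dc "$1" |
    awk -F'\t' -v OFS='\t' '
      # Keep protein-coding genes that actually have synonyms
      $10 == "protein-coding" && $5 != "-" {
        n = split($5, syn, "|")   # single-character separator is taken literally
        for (i = 1; i <= n; i++) print $2, $3, syn[i]
      }'
}
```

For example, `expand_aliases Homo_sapiens.gene_info.gz > symbol_aliases.tsv`. Genes whose Synonyms field is `-` produce no rows here, so keep the command above for the complete list of symbols.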

Thank you, that's very helpful. I did something similar with HUGO before: HUGO lists 19,008 genes, whereas the NCBI (NIH) version gives 20,731 genes. Is that because the HUGO list is "outdated"?

I seem to get a different count every time, which is why I'm asking.

Annotations change all the time, as knowledge about the genome is updated, so it would not be surprising to have a slight change in the number of genes from one release of the annotation files to the next.

The number of genes in the NCBI and HUGO lists may also differ because they each have their own annotation methods.
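
To see where two lists disagree, rather than only comparing totals, one option is to diff the sets of protein-coding symbols. This is a sketch under stated assumptions: the NCBI file follows the layout used above (column 3 = Symbol, column 10 = type_of_gene), and the HGNC download is a tab-separated file whose header contains `symbol` and `locus_group` columns, with protein-coding rows labelled `protein-coding gene`. Check those names against the file you actually download. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: list protein-coding symbols found in only one of two sources.

# Protein-coding symbols from an NCBI gene_info.gz file
# (assumed layout: column 3 = Symbol, column 10 = type_of_gene).
ncbi_symbols() {
  gzip -dc "$1" | awk -F'\t' '$10 == "protein-coding" { print $3 }' | LC_ALL=C sort -u
}

# Protein-coding symbols from an HGNC-style TSV. Columns are located by
# header name; "symbol", "locus_group" and the value "protein-coding gene"
# are assumptions about the HGNC download format.
hgnc_symbols() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["locus_group"]) == "protein-coding gene" { print $(col["symbol"]) }
  ' "$1" | LC_ALL=C sort -u
}

# $1: NCBI gene_info.gz, $2: HGNC TSV.
# Prints "only_in_ncbi<TAB>SYMBOL" and "only_in_hgnc<TAB>SYMBOL" lines.
compare_symbol_sets() {
  local a b
  a=$(mktemp)
  b=$(mktemp)
  ncbi_symbols "$1" > "$a"
  hgnc_symbols "$2" > "$b"
  # comm needs both inputs sorted with the same collation (LC_ALL=C above)
  LC_ALL=C comm -23 "$a" "$b" | awk -v OFS='\t' '{ print "only_in_ncbi", $0 }'
  LC_ALL=C comm -13 "$a" "$b" | awk -v OFS='\t' '{ print "only_in_hgnc", $0 }'
  rm -f "$a" "$b"
}
```

For example, `compare_symbol_sets Homo_sapiens.gene_info.gz hgnc.tsv | cut -f1 | sort | uniq -c` counts the symbols unique to each source (`hgnc.tsv` is a placeholder name). A symbol that appears on only one side is not necessarily a missing gene; the same gene may simply carry a different symbol in each list.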

HUGO names are "official" names for human genes.

What do you mean by "official" in quotes? Are they better than the NCBI list?

The HUGO Gene Nomenclature Committee is the only worldwide authority that assigns standardised nomenclature to human genes.

(Quoted from this page; also check the second question and answer there.)
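
The NCBI file also records the nomenclature-authority symbol for each gene, so HGNC-approved symbols can be pulled from the same download. This is a minimal sketch assuming the standard gene_info columns 11 (Symbol_from_nomenclature_authority) and 13 (Nomenclature_status, where `O` marks an official name); `official_symbols` is an illustrative name.

```shell
#!/usr/bin/env bash
# Sketch: protein-coding genes whose symbol has official status from the
# nomenclature authority (HGNC for human). Assumed gene_info columns:
# 2 = GeneID, 3 = Symbol, 10 = type_of_gene,
# 11 = Symbol_from_nomenclature_authority, 13 = Nomenclature_status ("O" = official).
official_symbols() {
  # $1: path to a gzipped gene_info file
  # Output: GeneID<TAB>authority symbol<TAB>NCBI Symbol
  gzip -dc "$1" |
    awk -F'\t' -v OFS='\t' '
      $10 == "protein-coding" && $13 == "O" { print $2, $11, $3 }'
}
```

Counting the output lines (`official_symbols Homo_sapiens.gene_info.gz | wc -l`) gives the number of protein-coding genes in the NCBI file that carry an official HGNC symbol, which is usually a closer basis for comparison with the HGNC list than the raw NCBI total.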
