Question: ensGene.txt (UCSC/hg19) and Homo_sapiens.GRCh38.76.gtf (GRCh38) positions do not match

I understand that UCSC tables such as the hg19 ensGene.txt use 0-based start positions, whereas the Ensembl GTF (GRCh38) uses 1-based positions. However, when I compare feature positions in the hg19 ensGene.txt file with the same features in Homo_sapiens.GRCh38.76.gtf, the positions are completely different. For example, if you pick any protein_coding transcript from the GRCh38 GTF file and compare its transcript start/end, exon start/end, CDS positions, etc. with its positions in ensGene.txt, they are often off by a few thousand bases. I have also checked the GRCh37 GTF file (which should be identical to hg19), but the positions were again far off. Can anyone explain why?

Asked by pwg46, 5.4 years ago (updated 5.3 years ago by Biostar)
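
The 0-based vs 1-based difference mentioned in the question only explains an off-by-one on start positions, never a shift of thousands of bases. As a minimal sketch, assuming the standard UCSC genePred column order for ensGene.txt with a leading `bin` column, converting one row to GTF-style 1-based, fully closed coordinates could look like this (`Transcript` and `parse_ensgene_row` are hypothetical names for illustration):

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    chrom: str
    strand: str
    tx_start: int  # 1-based, inclusive (GTF convention)
    tx_end: int    # 1-based, inclusive
    cds_start: int
    cds_end: int
    exons: list[tuple[int, int]]


def parse_ensgene_row(line: str) -> Transcript:
    """Convert one ensGene.txt row (0-based, half-open) to 1-based, closed coordinates."""
    f = line.rstrip("\n").split("\t")
    # Assumed column order: bin, name, chrom, strand, txStart, txEnd,
    # cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...
    name, chrom, strand = f[1], f[2], f[3]
    tx_start, tx_end, cds_start, cds_end = map(int, f[4:8])
    # exonStarts/exonEnds are comma-separated with a trailing comma.
    starts = [int(x) for x in f[9].split(",") if x]
    ends = [int(x) for x in f[10].split(",") if x]
    return Transcript(
        name=name,
        chrom=chrom,
        strand=strand,
        # Starts shift by +1; half-open ends already equal closed 1-based ends.
        tx_start=tx_start + 1,
        tx_end=tx_end,
        # For non-coding rows genePred sets cdsStart == cdsEnd, so these
        # two fields are not meaningful there.
        cds_start=cds_start + 1,
        cds_end=cds_end,
        exons=[(s + 1, e) for s, e in zip(starts, ends)],
    )
```

For example, a made-up row with txStart 100, exonStarts `100,300,` and exonEnds `200,500,` yields a transcript starting at 101 with exons (101, 200) and (301, 500). This only removes the off-by-one; it cannot make hg19 rows line up with a GRCh38 GTF, because those coordinates come from different assemblies.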

hg19 == GRCh37

hg19 != GRCh38

Reply by Pierre Lindenbaum, 5.4 years ago

Given that you knew that hg19 is GRCh37 and not GRCh38, I'm confused why you're confused.

Reply by Devon Ryan, 5.4 years ago

In the second part, are you comparing GRCh37 to hg19 or GRCh37 to GRCh38? If the former, I'm not sure why they would be off; if the latter, it is for the same reason as GRCh38 vs hg19. GRCh38 is a completely different assembly of the human reference genome: its size is different, the chromosome sequences are different, etc. You can only ever compare coordinates within an assembly, not between assemblies.

Reply by Dan Gaston, 5.4 years ago
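
To make the "compare only within an assembly" rule concrete, here is a hedged Python sketch that tags each interval with its assembly and refuses cross-assembly comparisons. The alias table encodes the equivalences stated in this thread (hg19 == GRCh37) plus the UCSC name for GRCh38 (hg38). `Interval`, `ASSEMBLY_ALIASES` and `same_position` are hypothetical names, and chromosome names (UCSC `chr1` vs Ensembl `1`) are assumed to have been normalized beforehand.

```python
from dataclasses import dataclass

# UCSC name -> GRC name for the same assembly. Unknown names raise KeyError.
ASSEMBLY_ALIASES = {
    "hg19": "GRCh37",
    "GRCh37": "GRCh37",
    "hg38": "GRCh38",
    "GRCh38": "GRCh38",
}


@dataclass(frozen=True)
class Interval:
    assembly: str
    chrom: str   # assumed already normalized to one naming style
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def same_assembly(a: Interval, b: Interval) -> bool:
    return ASSEMBLY_ALIASES[a.assembly] == ASSEMBLY_ALIASES[b.assembly]


def same_position(a: Interval, b: Interval) -> bool:
    """Compare two intervals, but only if they come from the same assembly."""
    if not same_assembly(a, b):
        raise ValueError(
            f"cannot compare {a.assembly} coordinates with {b.assembly} coordinates"
        )
    return (a.chrom, a.start, a.end) == (b.chrom, b.start, b.end)
```

With this guard, an hg19 interval can be compared with a GRCh37 one, while comparing it with a GRCh38 interval raises an error instead of silently reporting a mismatch.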