Data changes between reference assemblies
2
0
5.3 years ago
first ▴ 30

Did the reference and/or alternate allele of any SNP change when a new reference assembly was released, or even between builds (for example, the reference allele in the previous assembly becoming the alternate allele in the new one)? I am asking specifically about changes from GRCh37 to GRCh38.

genome assembly • 980 views
4
5.3 years ago
Emily 23k

One specific thing the GRC set out to do for GRCh38 was to identify variants where the reference base was a rare or private allele. The genome is made up of contigs, all of which were sequenced from real people's genomes, and those people carry rare or private alleles, so such alleles inevitably end up in the reference. There are a lot of these in GRCh37, so the GRC sought to fix this. They did so by using 1000 Genomes allele frequencies to identify the affected variants, then finding short contigs to cover the regions where those variants occur.

Taking Jean-Karim's example variant, here is the genomic region around the variant in GRCh37 and GRCh38. The contigs are shown as a bar in the middle of the image in alternating shades of blue. You can see that in GRCh37 the whole region is covered by a single contig, AC069356.6, whereas in GRCh38 the same contig is still present but is split, with a very short contig, KF459701.1, in the middle of it covering the variant rs4940595.

There are still rare and private alleles in GRCh38, but there are fewer than in GRCh37, and this is why.
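As a rough illustration of the screening step described above, here is a minimal Python sketch that flags sites where the reference base is the rare allele. The function name, the 5% threshold, and the variant IDs and frequencies are all hypothetical, made up for the example; the actual criteria the GRC applied are not stated in this thread.

```python
from typing import Dict, List


def rare_reference_sites(ref_allele_freqs: Dict[str, float],
                         threshold: float = 0.05) -> List[str]:
    """Return the IDs of variants whose reference allele frequency is below threshold.

    ref_allele_freqs maps a variant ID to the population frequency of the
    allele that the reference assembly carries at that site.
    The threshold is an illustrative assumption, not a GRC-defined cutoff.
    """
    return sorted(vid for vid, freq in ref_allele_freqs.items() if freq < threshold)


# Hypothetical frequencies, invented for illustration only.
example_freqs = {
    "var_a": 0.98,  # reference carries the common allele
    "var_b": 0.01,  # reference carries a rare allele
    "var_c": 0.0,   # reference allele not seen in the panel (private)
    "var_d": 0.40,  # reference carries the minor but not rare allele
}

flagged = rare_reference_sites(example_freqs)
print(flagged)
```

In practice the frequencies would come from a population panel such as 1000 Genomes, and each flagged site would then need a replacement contig carrying the common allele.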

2
5.3 years ago

GRCh37 and GRCh38 are different sequences, so you can't expect the reference alleles to be the same.
Example: SNP rs4940595 in GRCh37 and in GRCh38
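If you want to check this for your own list of variants, one way is to compare the (reference, alternate) pair recorded for each variant in the two assemblies. The sketch below is a minimal example under stated assumptions: the function name and the allele pairs are hypothetical, it handles only biallelic SNPs, and it assumes both records are reported on the same strand.

```python
from typing import Tuple

AllelePair = Tuple[str, str]  # (reference allele, alternate allele)


def classify_allele_change(old: AllelePair, new: AllelePair) -> str:
    """Compare a biallelic SNP's alleles between two assemblies.

    Returns "unchanged" if ref and alt are identical in both builds,
    "swapped" if the old ref is the new alt and vice versa,
    and "other" for anything else (e.g. a different allele set).
    Assumes both pairs are given on the same strand.
    """
    if old == new:
        return "unchanged"
    if old == (new[1], new[0]):
        return "swapped"
    return "other"


# Hypothetical allele pairs, invented for illustration only.
print(classify_allele_change(("A", "G"), ("A", "G")))  # unchanged
print(classify_allele_change(("A", "G"), ("G", "A")))  # swapped
print(classify_allele_change(("A", "G"), ("A", "T")))  # other
```

The real allele pairs would have to be looked up per build, for example from the variant's records in each assembly's annotation.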
