The human reference genomes are just somebody's genome. What kind of errors and biases does this introduce?
Can you please direct me to relevant websites or papers?
Yes, GRCh37 / hg19, for example, is mostly based on a single individual from Buffalo, New York state. Take a look at my answer here, and that of Emily: A: Alternate nucleotide is more frequent than reference nucleotide. OMG I'm dizzy.
Note that GRCh38 / hg38 was constructed based on the data produced by the 1000 Genomes consortium, so it is more 'wholly representative'. That said, I believe that the best procedure is to have multiple reference genomes, each tailored to an ethnic region of the world.
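One practical consequence of a single-donor reference is that, at some sites, the reference base is actually the minor allele in the population. Here is a minimal sketch of how you might flag such sites, assuming you already have the ALT allele frequency for each variant (for example, from a population frequency annotation). The `Variant` type, the function name, and the records are all made up for illustration; they are not real sites or frequencies.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_af: float  # population frequency of the ALT allele, between 0 and 1


def ref_is_minor_allele(variants, threshold=0.5):
    """Return the variants whose ALT allele frequency exceeds `threshold`.

    With the default of 0.5, these are sites where the reference base
    is the less common allele in the population.
    """
    return [v for v in variants if v.alt_af > threshold]


# Hypothetical records, for illustration only.
toy = [
    Variant("chr1", 1000, "A", "G", 0.92),
    Variant("chr1", 2000, "C", "T", 0.03),
    Variant("chr2", 1500, "G", "A", 0.51),
]

for v in ref_is_minor_allele(toy):
    print(f"{v.chrom}:{v.pos} {v.ref}>{v.alt} ALT AF={v.alt_af:.2f}")
```

At sites like these, an individual carrying the common allele is reported as a variant relative to the reference, which is one of the simpler biases to check for.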
Kevin
There is some info about the biological source of the human reference from the Genome Reference Consortium FAQs:
The human reference genome is a composite genome, derived from the sequence of several different anonymous individuals. Approximately 93% of the GRCh38 primary assembly (the assembled chromosomes, unlocalized and unplaced sequences) consists of sequences from 11 genomic clone libraries (a library can generally be considered a proxy for an individual’s genome). One of these libraries, RP11 or RPCI-11 Human Male BAC Library, has a much higher representation than all others, and contributes to 70% of the primary assembly. The donor of RP11 library was an anonymous male, though analysis suggests his DNA is an African-European admixture (see page 146 of Supporting Online Material of PMID:20448178). The remaining 7% represents sequences from >50 libraries. These libraries were developed from individuals (male and female), as well as flow-sorted chromosomes from various cell lines. The make-up of GRCh37 is largely similar to GRCh38 (Figure 1).
There are many additional facts about the reference sequence on that page.
Thanks!!
Where can I find more information about how GRCh38 was assembled?
I searched Google but could not find anything.
Take a look at this great post by Giovanni: The New Grch38 Human Genome Browser Has Arrived!