Missing SNP Variants In Two 10Mb Blocks On Chr1 And Chr9 In 1000 Genomes Data?
2.4 years ago
Zhenyu Zhang • 240
United States

I recently tried to use Mach to impute some SNPs based on 1000 Genomes data. Surprisingly, when I followed the instructions to chunk the reference genome into 10 Mb blocks for easier handling, two blocks contained no SNPs at all: one on Chr1 at 130-140 Mb, the other on Chr9 at 50-60 Mb. So I went back to the allelic.Info file and found that there really are large regions without any SNP data. My questions are:

  1. Is this normal, or is something wrong with Mach's 1000G Phase I v3 file?
  2. What is special about these regions? Are they difficult to sequence, or do they contain too much variation?

Thanks

20 months ago
United States
  1. This is normal.
  2. They're centromeric regions (the centromeres and the heterochromatin next to them), which are highly repetitive and therefore difficult to sequence cost-effectively. Genome Reference Consortium build 37, which is what the Mach file is based on, simply has large gaps there, so no variants can be called in those blocks. Build 38 attempts to model the centromeres.
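
If you want to check quickly which blocks are empty before chunking, something like the following might help. It is only a minimal sketch: the function names `count_snps_per_block` and `empty_blocks` are made up for illustration, the positions and chromosome length are toy values rather than real 1000 Genomes data, and it assumes you have already extracted the 1-based SNP positions for one chromosome from the reference file yourself.

```python
from collections import Counter

BLOCK_SIZE = 10_000_000  # 10 Mb blocks, as in the question


def count_snps_per_block(positions, chrom_length, block_size=BLOCK_SIZE):
    """Return a list of SNP counts, one per consecutive block of block_size bp.

    positions are assumed to be 1-based coordinates on a single chromosome.
    """
    n_blocks = -(-chrom_length // block_size)  # ceiling division
    counts = Counter((pos - 1) // block_size for pos in positions)
    return [counts.get(i, 0) for i in range(n_blocks)]


def empty_blocks(positions, chrom_length, block_size=BLOCK_SIZE):
    """Return (start, end) 1-based inclusive coordinates of blocks with no SNPs."""
    counts = count_snps_per_block(positions, chrom_length, block_size)
    return [
        (i * block_size + 1, min((i + 1) * block_size, chrom_length))
        for i, n in enumerate(counts)
        if n == 0
    ]


# Toy data (made up for illustration, not real 1000 Genomes positions).
toy_positions = [5_000_000, 12_000_000, 18_500_000, 41_000_000]
toy_length = 45_000_000

for start, end in empty_blocks(toy_positions, toy_length):
    print(f"no SNPs in {start}-{end}")
```

Blocks reported as empty can then be compared against the assembly gap annotation for build 37 and simply skipped during imputation if they fall inside a gap.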

Thanks a lot. This makes a lot of sense.
