What's the name in RepeatMasker for mouse major satellite repeats?
7.1 years ago
biostart ▴ 370

Hello,

I want to get the coordinates of (at least some of) the mouse major satellite repeats, and I am trying to do this with the UCSC RepeatMasker track. It lists many types of repeats, but none of them has an obvious name like "major satellite repeat". So this is really a question about the nomenclature used in the UCSC Genome Browser: which repeat name or class should I select to match mouse major satellite repeats as closely as possible?

Thanks

sequence alignment RNA-Seq • 4.9k views
7.1 years ago
Mike ▴ 60

I believe that "GSAT_MM" in RepeatMasker is supposed to be the major satellite (see below), but if you look at the locations of all the GSAT_MM repeats, they aren't all near centromeres.

It's possible that coordinates just aren't available for all the major satellite repeats. Since major satellites are near centromeres, in mouse they'll be on the left in the UCSC Genome Browser (near the start/0 position of the chromosome), but most of that part of each chromosome isn't sequenced, probably because it's repetitive and hard to sequence and assemble. There are 3 million N's at the start of each chromosome; turn on the "Gap" track in the Browser and you'll see them.

The GSAT_MM sequences that are in RepeatMasker but aren't near centromeres could be major-satellite-like sequences (i.e. the RepeatMasker program identifies them as similar to the canonical major satellite sequence) that aren't technically major satellites, given their position.
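If you want to pull the GSAT_MM coordinates out yourself, here is a minimal Python sketch. It assumes you have a dump of the UCSC rmsk table (tab-separated, no header, with the 17 columns listed in `RMSK_COLUMNS`). The function name `repeats_named`, the `_row` helper, and the sample rows are made up for illustration; the coordinates are not real mm10 hits.

```python
import csv
import io

# Column order assumed for UCSC's rmsk table dump (tab-separated, no header).
RMSK_COLUMNS = [
    "bin", "swScore", "milliDiv", "milliDel", "milliIns",
    "genoName", "genoStart", "genoEnd", "genoLeft", "strand",
    "repName", "repClass", "repFamily", "repStart", "repEnd", "repLeft", "id",
]


def repeats_named(handle, rep_name):
    """Yield (chrom, start, end, strand) for every row whose repName matches."""
    reader = csv.DictReader(
        handle, fieldnames=RMSK_COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        if row["repName"] == rep_name:
            yield (row["genoName"], int(row["genoStart"]),
                   int(row["genoEnd"]), row["strand"])


def _row(chrom, start, end, strand, name, rep_class, family):
    """Build one fake rmsk line (hypothetical values in the unused columns)."""
    return "\t".join([
        "585", "1000", "50", "0", "0", chrom, str(start), str(end), "0",
        strand, name, rep_class, family, "1", "100", "0", "1",
    ])


# Made-up example rows, NOT real mm10 annotations.
SAMPLE = "\n".join([
    _row("chr9", 3000000, 3000500, "+", "GSAT_MM", "Satellite", "Satellite"),
    _row("chr1", 4000000, 4000200, "+", "B1_Mus1", "SINE", "Alu"),
    _row("chr2", 98662000, 98662300, "-", "GSAT_MM", "Satellite", "Satellite"),
]) + "\n"

# Print the matching rows as BED-like lines.
for chrom, start, end, strand in repeats_named(io.StringIO(SAMPLE), "GSAT_MM"):
    print(chrom, start, end, "GSAT_MM", 0, strand, sep="\t")
```

For a real file you would pass something like `gzip.open("rmsk.txt.gz", "rt")` instead of the `StringIO` object (the file name here is an assumption about what you downloaded).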

Regarding the GSAT_MM nomenclature: the G is for gamma, and "gamma satellite" is another name for the mouse major satellite repeat.

This seems to be the origin of the name:

"To distinguish this major mouse satellite DNA from other satellite families, such as the mouse “minor” satellite (Wong and Rattner, 1988), the human centromeric α satellite (Willard and Waye, 1987), and a new β-satellite family (H. Willard, personal communication), we propose to call it γ-satellite DNA."

http://www.sciencedirect.com/science/article/pii/0888754389900037

And these two more recent papers back this up:

"In the house mouse, Mus musculus, centromeric and pericentromeric regions are represented by two highly conserved, tandemly repeated sequences known as minor and major satellites (MiSat and MaSat, respectively, SATMIN and GSAT_MM in Repbase nomenclature)."

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3218096/

"Mouse satellites annotated as GSAT_MM in RepeatMasker (γ-satellite)..."

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4939917/

I'm not sure if some of the other classes/families are also major satellites. For reference, in mouse mm10 these are all the RepeatMasker repeat names classified as belonging to the "satellites" class and family:

  • CENSAT_MC
  • GSAT_MM
  • IMPB_01
  • MMSAT4
  • MurSAT1
  • SUBTEL_sa
  • SYNREP_MM
  • ZP3AR

And a few instances of (CATTC)n.
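A list like the one above can be reproduced by tallying repeat names within one class. This is only a sketch: it assumes rmsk-style tab-separated lines where the 11th column is the repeat name and the 12th is the repeat class, and that the class label is spelled "Satellite". The function name `name_counts_for_class` and the example rows are hypothetical.

```python
from collections import Counter


def name_counts_for_class(lines, rep_class="Satellite"):
    """Count repName values (column 11) among rows whose repClass (column 12) matches."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 13:
            continue  # skip blank or truncated lines
        if fields[11] == rep_class:
            counts[fields[10]] += 1
    return counts


def fake_line(name, rep_class):
    """Build one fake rmsk-style line; every value except name/class is a placeholder."""
    return "\t".join([
        "585", "1000", "50", "0", "0", "chr9", "3000000", "3000500", "0",
        "+", name, rep_class, rep_class, "1", "100", "0", "1",
    ])


# Made-up rows for illustration only.
EXAMPLE_LINES = [
    fake_line("GSAT_MM", "Satellite"),
    fake_line("GSAT_MM", "Satellite"),
    fake_line("CENSAT_MC", "Satellite"),
    fake_line("B1_Mus1", "SINE"),
]

for name, n in sorted(name_counts_for_class(EXAMPLE_LINES).items()):
    print(name, n)
```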

CENSAT_MC is "Centromeric satellite DNA" according to Repbase, which suggests it's also a major satellite, but it only shows up in RepeatMasker (mm10) 4 times and none of the coordinates are near centromeres. I haven't checked all the other repeat classes/families in detail.

This paper may also be useful: http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-12-531. It discusses the presence (and absence) of major satellite repeats in the mouse genome assembly, though given its publication date it's probably using mm9 rather than mm10, so it's a little out of date. In Table 3 they indicate that they could only find major satellites near centromeres on chromosomes 9 and 11. If you load mm10 in the UCSC Genome Browser with the "Gap" and "RepeatMasker" tracks and look at the end (right side) of the 3 million bp centromere "N" gap, you'll see that only chromosomes 9 and 11 have satellite repeats annotated there, so I would guess that those are the only two "official" major satellites annotated with actual positions in the mouse genome.
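The check described above (looking just to the right of the centromere gap) could be scripted roughly like this. It is a sketch under assumptions: gaps and repeats are already parsed into tuples, the Gap track labels the relevant gap type as "centromere", and the 100 kb window is an arbitrary choice of mine. The function name and all coordinates are invented for illustration.

```python
def satellites_after_centromere_gap(gaps, repeats, window=100_000):
    """Return repeats that start within `window` bp after the centromere gap ends.

    gaps:    iterable of (chrom, start, end, gap_type)
    repeats: iterable of (chrom, start, end, name)
    """
    # Right-hand edge of the centromere gap on each chromosome.
    gap_end = {}
    for chrom, _start, end, gap_type in gaps:
        if gap_type == "centromere":
            gap_end[chrom] = max(end, gap_end.get(chrom, 0))

    hits = []
    for chrom, start, end, name in repeats:
        edge = gap_end.get(chrom)
        if edge is not None and edge <= start < edge + window:
            hits.append((chrom, start, end, name))
    return hits


# Invented coordinates, for illustration only.
example_gaps = [
    ("chr9", 110000, 3000000, "centromere"),
    ("chr1", 110000, 3000000, "centromere"),
    ("chr9", 50000000, 50050000, "contig"),
]
example_repeats = [
    ("chr9", 3000002, 3038000, "GSAT_MM"),    # right after the gap
    ("chr1", 3500000, 3500200, "GSAT_MM"),    # too far from the gap edge
    ("chr9", 60000000, 60000200, "GSAT_MM"),  # mid-chromosome
]

for hit in satellites_after_centromere_gap(example_gaps, example_repeats):
    print(*hit, sep="\t")
```

Hits far from the gap edge are dropped, which matches the idea that GSAT_MM annotations elsewhere on the chromosome may only be major-satellite-like sequences.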

7.1 years ago
Charles Yin ▴ 180

What do you mean by the coordinates of the repeats? Do you mean the positions of the repeats in the genome? You may want to check whether the following paper is helpful: Yin, C. (2017). Identification of repeats in DNA sequences using nucleotide distribution uniformity. Journal of Theoretical Biology, 412, 138-145. http://www.sciencedirect.com/science/article/pii/S002251931630354X


As written in the question above, I want to know which of the repeat names used in the UCSC RepeatMasker track most closely matches mouse major satellite repeats.
