How to find consensus sequences for repeats
8.1 years ago
dli ▴ 250

Hi,

I downloaded rmsk.txt from the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgTables?db=hg38&hgta_group=rep&hgta_track=rmsk&hgta_table=rmsk&hgta_doSchema=describe+table+schema) and got the following:

bin  swScore  milliDiv  milliDel  milliIns  genoName  genoStart  genoEnd  genoLeft    strand  repName    repClass       repFamily      repStart  repEnd  repLeft  id
585  463      13        6         17        chr1      10000      10468    -248945954  +       (TAACCC)n  Simple_repeat  Simple_repeat  1         471     0        1
585  3612     114       215       13        chr1      10468      11447    -248944975  -       TAR1       Satellite      telo           -399      1712    483      2
585  484      251       132       0         chr1      11504      11675    -248944747  -       L1MC5a     LINE           L1             -2382     395     199      3
585  239      294       19        10        chr1      11677      11780    -248944642  -       MER5B      DNA            hAT-Charlie    -74       104     1        4
585  318      230       37        0         chr1      15264      15355    -248941067  -       MIR3       SINE           MIR            -119      143     49       5
585  18       232       0         19        chr1      15797      15849    -248940573  +       (TGCTCC)n  Simple_repeat  Simple_repeat  1         52      0        6
585  18       137       0         0         chr1      16712      16744    -248939678  +       (TGG)n     Simple_repeat  Simple_repeat  1         32      0        7
585  239      338       129       0         chr1      18906      19048    -248937374  +       L2a        LINE           L2             2942      3104    -322     8
585  994      312       60        25        chr1      19971      20405    -248936017  +       L3         LINE           CR1            2680      3129    -970     9
585  270      331       7         27        chr1      20530      20679    -248935743  +       Plat_L3    LINE           CR1            2802      2947    -639     1

Take the repeat name L1MC5a as an example: if I want the sequence of this repeat, should I look it up in Repbase? I could not find it there: http://www.girinst.org/repbase/update/browse.php?type=All&format=EMBL&autonomous=on&nonautonomous=on&simple=on&division=Homo+sapiens&letter=L

Does anyone have suggestions on how to get it? Thanks a lot in advance.

genome repeat • 2.5k views
8.1 years ago
GenoMax 141k

See if the Table Browser at UCSC works.
Select the human(?) genome --> Group (Repeats) --> Track (RepeatMasker) --> Region (whole genome or a region?) --> Output format (Sequence) --> give a file name to save the data to a file.
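
If you would rather work from the downloaded rmsk.txt than click through the Table Browser, here is a minimal sketch that pulls the genomic intervals for one repeat name out as BED lines. It assumes the file is tab-separated with the column order shown in the question's header row. The function name `rmsk_to_bed` is just something made up for illustration, and capping the score at 1000 is my own choice to stay inside the BED score range. Note that this gives the coordinates of the genomic copies, not a consensus.

```python
import csv
import io

# Column order copied from the header row in the question (assumed to match the file).
COLUMNS = ["bin", "swScore", "milliDiv", "milliDel", "milliIns", "genoName",
           "genoStart", "genoEnd", "genoLeft", "strand", "repName",
           "repClass", "repFamily", "repStart", "repEnd", "repLeft", "id"]


def rmsk_to_bed(handle, rep_name):
    """Yield BED6 lines for every rmsk row whose repName equals rep_name."""
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and a header row ("bin" or "#bin"), if present.
        if not fields or fields[0].lstrip("#") == "bin":
            continue
        row = dict(zip(COLUMNS, fields))
        if row["repName"] != rep_name:
            continue
        # swScore can exceed 1000, so it is capped here (an illustrative choice).
        score = str(min(int(row["swScore"]), 1000))
        yield "\t".join([row["genoName"], row["genoStart"], row["genoEnd"],
                         row["repName"], score, row["strand"]])


# Two rows taken from the question, rewritten with tab separators.
sample = io.StringIO(
    "585\t484\t251\t132\t0\tchr1\t11504\t11675\t-248944747\t-\tL1MC5a\tLINE\tL1\t-2382\t395\t199\t3\n"
    "585\t318\t230\t37\t0\tchr1\t15264\t15355\t-248941067\t-\tMIR3\tSINE\tMIR\t-119\t143\t49\t5\n"
)
bed_lines = list(rmsk_to_bed(sample, "L1MC5a"))
print("\n".join(bed_lines))
```

For the real file you would pass `open("rmsk.txt", newline="")` instead of the in-memory sample.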


Thanks for your reply, @genomax2.

I am not actually looking for the genomic sequences of individual copies; I am looking for the consensus sequences.
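
To make the distinction concrete, here is a toy sketch of what I mean by a consensus: the most common base at each position across copies that have already been aligned. The four `copies` below are invented for illustration, and the function name `majority_consensus` is hypothetical. This is only a plain majority vote and is not meant to describe how the repeat library behind the rmsk track was actually built.

```python
from collections import Counter


def majority_consensus(aligned_copies):
    """Return the most common base in each column of equal-length aligned sequences."""
    lengths = {len(seq) for seq in aligned_copies}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_copies):
        # Ignore gap characters when voting; keep a gap only if the column is all gaps.
        counts = Counter(base for base in column if base != "-")
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(consensus)


# Hypothetical pre-aligned copies of one repeat ("-" marks an alignment gap).
copies = ["TAACCC", "TAACCT", "TGACCC", "TAA-CC"]
consensus = majority_consensus(copies)
print(consensus)
```

Each genomic copy differs from the others at some position, but the column-wise vote still gives a single representative sequence, which is the thing I am after.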


I don't think there is a consensus. Repeats by their nature will differ from copy to copy. For example, if I restrict the output to the L2a LINE repeat, I get this summary from the Table Browser:

item count  174,058
item bases  43,353,715 (1.42%)
item total  43,373,283 (1.42%)
smallest item   11
average item    249
biggest item    3,283
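
A similar summary can be approximated directly from rmsk.txt. This is a rough sketch, assuming a tab-separated file with genoStart, genoEnd and repName in columns 7, 8 and 11 as in the header shown in the question. The function name `repeat_length_summary` is made up, and the numbers are not guaranteed to match the Table Browser exactly, since "item total" here is a plain sum of copy lengths with no handling of overlapping items.

```python
import io


def repeat_length_summary(handle, rep_name):
    """Summarise genomic copy lengths (genoEnd - genoStart) for one repName."""
    lengths = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip short or blank lines and a header row ("bin" or "#bin"), if present.
        if len(fields) < 11 or fields[0].lstrip("#") == "bin":
            continue
        if fields[10] == rep_name:
            lengths.append(int(fields[7]) - int(fields[6]))
    if not lengths:
        return None
    return {
        "item count": len(lengths),
        "item total": sum(lengths),
        "smallest item": min(lengths),
        "average item": round(sum(lengths) / len(lengths)),
        "biggest item": max(lengths),
    }


# Two rows taken from the question, rewritten with tab separators.
sample = io.StringIO(
    "585\t239\t338\t129\t0\tchr1\t18906\t19048\t-248937374\t+\tL2a\tLINE\tL2\t2942\t3104\t-322\t8\n"
    "585\t318\t230\t37\t0\tchr1\t15264\t15355\t-248941067\t-\tMIR3\tSINE\tMIR\t-119\t143\t49\t5\n"
)
summary = repeat_length_summary(sample, "L2a")
print(summary)
```

The wide spread between the smallest and biggest item is the point: most genomic copies are fragments of very different lengths.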