Repeatmasker - Identify Only One Type Of Repetitive Element
11.0 years ago
AndreiR ▴ 260

When I run RepeatMasker locally, it identifies many different types of repetitive elements: long interspersed repeats, LINE1s, simple repeats, Alus, ancient repeats, retrovirus-like elements, and so on. I understand that there is an -alu option (only masks Alus (and 7SLRNA, SVA and LTR5); only for primate DNA), but what about the other types? Is there a way to identify only one particular type of repetitive element?

11.0 years ago
SES 8.6k

There are options to allow only masking interspersed repeats or simple repeats (listed below):

-nolow /-low
    Does not mask low_complexity DNA or simple repeats

-noint /-int
    Only masks low complex/simple repeats (no interspersed repeats)

-norna
    Does not mask small RNA (pseudo) genes

In addition, you can always create your own library of repeats and pass it to RepeatMasker with the -lib option, which may also be faster if you are only interested in finding specific repeats.
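One way to build such a library is to filter an existing FASTA repeat library down to a single class. The sketch below is only an illustration: it assumes the headers follow the `>name#class/family` convention used in RepeatMasker-format libraries, and the function name `filter_library_by_class` and the example records are hypothetical.

```python
def filter_library_by_class(fasta_text, wanted_class):
    """Keep only FASTA records whose header class equals wanted_class.

    Assumption: headers look like '>name#class/family optional description'.
    Records without a '#class' part are dropped.
    """
    kept = []
    keep = False  # whether the record currently being read is kept
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            ident = tokens[0] if tokens else ""
            # 'L1HS#LINE/L1' -> classification 'LINE/L1' -> class 'LINE'
            _, _, classification = ident.partition("#")
            repeat_class = classification.split("/")[0]
            keep = repeat_class == wanted_class
        if keep:
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


# Made-up records, used only to show the header format.
example = """>L1HS#LINE/L1 Homo sapiens
ACGTACGT
>AluY#SINE/Alu
GGCCGGCC
>L2#LINE/L2
TTTTAAAA
"""

line_only = filter_library_by_class(example, "LINE")
print(line_only)
```

The filtered text can be written to a file and passed to RepeatMasker with -lib.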


Thanks! I think building my own library is the right approach for me.

11.0 years ago
AndreiR ▴ 260

As SES proposed, and as Arian Smit also suggested:

I managed this by creating my own library. To do this I used queryRepeatDatabase.pl, which is in the util directory of RepeatMasker.

>perl queryRepeatDatabase.pl -help

  queryRepeatDatabase.pl - 0.1 
NAME
    queryRepeatDatabase.pl - Query the RepeatMasker repeat database.

SYNOPSIS
      queryRepeatDatabase.pl [-version] [-species <species> |
                                         -stage <stage num> |
                                         -class <class> |
                                         -id <id>]
                                        [-stat]
                                        [-tree]
                                        [-clade]

DESCRIPTION
      A utility script to query the RepeatMasker repeat database.

    The options are:

    -version
        Displays the version of the program

    -species "species name"
        The full name ( case insensitive ) of the species you would like to
        search for in the database. This will return all the repeats which
        would be used in a RepeatMasker search against this species. This
        includes repeats contained in the clade given by "species name" and
        ancestral repeats of "species name". Lastly ubiquitous sequences
        such as RNAs and simple repeats are also included.

    -clade
        This will modify the default behaviour of the species option and
        return only the repeats which are specific to your species or any of
        it descendents. This is useful for identifying how rich the database
        of repeats is for a given species/clade.

    -stage <stage num>
        The number of the RepeatMasker stage for which you would like
        repeats. In the past these stages were individual libraries with the
        following general names:

          Stage          Library
          -----          -------
           0             species.lib
          10             is.lib
          15             rodspec.lib
          20             humspec.lib
          25             simple.lib
          30             at.lib
          35             sinecutlib
          40             shortcutlib
          45             cutlib
          50             shortlib
          55             longlib
          60             mirs.lib
          65             mir.lib
          70             retrovirus.lib
          75             l1.lib

    -class <class>
        Retrieve all elements of a particular class. For example:

          DNA
          SINE
          LINE
          LTR
          Other
          RC
          Satellite
          tRNA
          Simple_repeat
          Unknown
          snRNA

    -id <id>
        Retrieve only a single id from the database.

    -stat
        Returns statistics on the sequences

    -tree
        Prints the taxonomy tree for all species in the database.

SEE ALSO
    ReapeatMasker

COPYRIGHT
    Copyright 2005-2011 Robert Hubley, Institute for Systems Biology

AUTHOR
    Robert Hubley <rhubley@systemsbiology.org>

Adding the -q option and tuning -pa made it even faster :)
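For reference, the two steps can be sketched as command lines. This is a rough illustration under assumptions, not the exact commands I ran: it assumes queryRepeatDatabase.pl writes the selected sequences to standard output, and the file names `genome.fa` and `line.lib`, the thread count, and the helper names `query_cmd` and `mask_cmd` are all hypothetical. In RepeatMasker, -q requests the quick (less sensitive) search and -pa sets the number of parallel jobs.

```python
import shlex


def query_cmd(repeat_class, script="queryRepeatDatabase.pl"):
    """Command that extracts one repeat class using the -class option."""
    return ["perl", script, "-class", repeat_class]


def mask_cmd(genome, library, threads=4, quick=True):
    """RepeatMasker command that uses a custom library via -lib."""
    cmd = ["RepeatMasker", "-lib", library, "-pa", str(threads)]
    if quick:
        cmd.append("-q")  # quick search: faster, somewhat less sensitive
    cmd.append(genome)
    return cmd


# Print the commands instead of running them, so this works without
# RepeatMasker installed. The redirect assumes the script prints to stdout.
print(shlex.join(query_cmd("LINE")), "> line.lib")
print(shlex.join(mask_cmd("genome.fa", "line.lib")))
```

The same lists can be handed to `subprocess.run` once the paths match your installation.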
