Databases To Analyse DNA Characteristics/Patterns
12.0 years ago
PoGibas 5.1k

I have a collection of specific breakpoints and their flanking sequences:

<------150bp-----/breakpoint/-----150bp----->

I want to find out whether there is something in common between those sequences (I am trying to compare them and find some similarity).

I am interested in any database or tool that can decipher sequence patterns and characteristics (motifs, repeats, physical characteristics, etc.).

I have already tried:

  • MEME motif search;
  • Vienna package for secondary structures;
  • RepeatMasker;
  • BLAST2 for similarity around the breakpoint;
  • EMBOSS tools;
  • UCSC tables for annotated info about repeats, histones, DNase sites, etc.

I am looking forward to any suggestions: TF binding, more secondary structures, more motifs, repeats, and specific elements. Especially protein (chromatin remodeling), nuclease, and Ig target sites!

All suggestions are welcome - I am going to try them all.

dna motif • 2.5k views

I used the Vienna package to get the energetic parameters of my DNA. The secondary structures are cool (maybe too good), but Vienna always returns some kind of structure. What I am more interested in is a way to measure DNA mechanical characteristics (flexing, bending, etc.). I hope someone can help me with an easy way of doing it, as Vienna was too "fancy".

12.0 years ago

Assuming you have enough examples, how about trying to set this up as a classification problem?

If you're hunting for motifs, you can iterate over all x- to y-mers in the upstream and downstream flanks separately (where x and y are the minimum and maximum k-mer sizes you are looking for). These k-mers are the features of your examples. (If you think other features are important, toss them in too.)
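The k-mer enumeration described above could be sketched roughly as follows. This is only an illustration: the function name `kmer_features`, the `up:`/`down:` prefixes, and the default sizes are assumptions made for the example, not something specified in the answer.

```python
from collections import Counter

def kmer_features(upstream, downstream, x=4, y=6):
    """Count every k-mer (x <= k <= y) in each flank separately.

    Feature names get an 'up:' or 'down:' prefix (a hypothetical naming
    scheme) so the same k-mer on different sides of the breakpoint is
    treated as a different feature.
    """
    features = Counter()
    for side, seq in (("up", upstream.upper()), ("down", downstream.upper())):
        for k in range(x, y + 1):
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                # Skip k-mers containing N or other ambiguity codes.
                if set(kmer) <= set("ACGT"):
                    features[f"{side}:{kmer}"] += 1
    return features

# Short made-up flanks; real flanks would be 150 bp each.
feats = kmer_features("ACGTACGT", "TTTTACGN", x=4, y=4)
```

Each region then becomes one row of counts, with one column per k-mer name seen anywhere in the data set.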

Your breakpoint regions are your positive set. Pick an equally sized (or larger) set of regions at random (or ones that resemble your breakpoint regions in whatever way you think matters, but contain no breakpoint), and this will be your negative set.
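One simple way to draw the random negative set is to sample positions whose flanking window contains no known breakpoint. The sketch below assumes a single chromosome of known length and 0-based coordinates; `sample_negative_centers` and its parameters are hypothetical names chosen for illustration.

```python
import random

def sample_negative_centers(chrom_length, breakpoints, n, flank=150, seed=0):
    """Pick n random positions whose +/- flank window holds no known breakpoint."""
    rng = random.Random(seed)  # fixed seed so the control set is reproducible
    centers = []
    attempts = 0
    while len(centers) < n:
        attempts += 1
        if attempts > 1000 * n:
            raise RuntimeError("could not find enough breakpoint-free windows")
        # Stay far enough from the chromosome ends to keep full flanks.
        pos = rng.randrange(flank, chrom_length - flank)
        if all(abs(pos - bp) > flank for bp in breakpoints):
            centers.append(pos)
    return centers

# Made-up coordinates, purely for illustration.
known = [500, 4200, 9000]
negatives = sample_negative_centers(10_000, known, n=5)
```

For real data you would probably also want the controls to match the positives in properties such as GC content or repeat content, so that the classifier does not just learn those differences.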

Run your data through some binary classifier (SVM, penalized logistic regression, boosting, etc.) with appropriate cross-validation and see if you can get good prediction accuracy. This will take some time to get right (assuming you can do so).

Once you can build a strong classifier, see if you can interrogate it to see which features are relevant.
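To make the last two steps concrete, here is a minimal pure-Python sketch of L2-penalized logistic regression with k-fold cross-validation, followed by a ranking of features by absolute weight. It is a stand-in written for illustration only; in practice you would more likely use an established machine-learning library. The toy counts, the feature names, and all hyperparameters are made up.

```python
import math
import random

def train_logreg(X, y, l2=0.1, lr=0.2, epochs=500):
    """Fit L2-penalized logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        # The penalty (l2 * w) shrinks the weights; the bias is not penalized.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0

def cross_val_accuracy(X, y, k=4, seed=0, **fit_args):
    """Shuffle once, split into k folds, and score each held-out fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(k):
        held_out = set(idx[f::k])
        train = [i for i in idx if i not in held_out]
        w, b = train_logreg([X[i] for i in train], [y[i] for i in train], **fit_args)
        correct += sum(predict(w, b, X[i]) == y[i] for i in held_out)
    return correct / len(X)

# Made-up toy data: rows are regions, columns are k-mer counts.
feature_names = ["up:ACGT", "down:TTTT", "up:GGGG"]
X = [[3, 0, 1], [4, 1, 0], [3, 0, 0], [5, 1, 1], [4, 0, 1], [3, 1, 0],  # breakpoint regions
     [0, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]  # control regions
y = [1] * 6 + [0] * 6

accuracy = cross_val_accuracy(X, y, k=4)

# Refit on all data and rank features by the size of their weight.
w, b = train_logreg(X, y)
ranked = sorted(zip(feature_names, w), key=lambda fw: abs(fw[1]), reverse=True)
```

Features at the top of `ranked` are the ones the model leans on most. With real k-mer counts there will be far more features than examples, so a stronger or sparsity-inducing penalty is probably needed before the weights can be interpreted.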

12.0 years ago

Have you had a look at protein domain analysis using HMMER?

You could translate your sequences in all 6 possible reading frames into amino acids and then check them for protein domains. The domain models are profile hidden Markov models, so you might get different results from what you've already tried.
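The six-frame translation step could look roughly like the sketch below. It uses the standard genetic code and plain Python; the function names and the `+1`..`-3` frame labels are assumptions made for this example, and codons containing ambiguous bases are reported as `X`.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame_translations(seq):
    """Return a dict mapping frame label ('+1'..'+3', '-1'..'-3') to protein."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames

frames = six_frame_translations("ATGGCCTAA")
# frames["+1"] == "MA*" and frames["-1"] == "LGH"
```

The translated frames could then be written to a FASTA file and searched against a domain database with HMMER. Keep in mind that this only makes sense if the breakpoints lie in coding sequence.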


I'm not sure he specified that these were breakpoints in coding regions(?)


True! This should only work on coding sequences.
