scan DNA sequence for known motifs
1
0
5.4 years ago
2nelly ▴ 310

Hi all,

Can anybody suggest a tool that takes a DNA sequence as input and scans it against known motifs?

Thank you in advance

sequence motif • 3.0k views
ADD COMMENT
0

See "A: Extracting the coordinates for a specific base from multifasta". Instead of a single base, you can pass a pattern/motif to seqkit.

ADD REPLY
0

This requires exact matching of the query to the subject, right? Not really suited for motif scanning where each position can have base substitutions while still belonging to the same motif, depending on the position frequency matrices.
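To make the distinction concrete, here is a minimal sketch of matrix-based scanning. The 4-bp frequency matrix, the uniform background of 0.25, the threshold, and the function names are all made up for illustration; they are not taken from any tool mentioned in this thread.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif (illustrative values only).
PFM = {
    "A": [0.7, 0.1, 0.1, 0.6],
    "C": [0.1, 0.1, 0.7, 0.1],
    "G": [0.1, 0.7, 0.1, 0.2],
    "T": [0.1, 0.1, 0.1, 0.1],
}
BACKGROUND = 0.25  # assumption: uniform base composition


def log_odds(pfm, background=BACKGROUND):
    """Convert frequencies to log2(frequency / background) weights."""
    return {base: [math.log2(f / background) for f in col] for base, col in pfm.items()}


def scan(seq, pwm, threshold):
    """Return (1-based start, window, score) for each window scoring >= threshold."""
    width = len(pwm["A"])
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in pwm for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[base][j] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i + 1, window, round(score, 2)))
    return hits


hits = scan("TTAGCATTAGCGTT", log_odds(PFM), threshold=3.0)
for hit in hits:
    print(*hit, sep="\t")
```

An exact search for the consensus AGCA would report only the first site; the matrix scan also reports AGCG, with a lower score, because G is tolerated at the last position. Forward strand only; a real scan would also check the reverse complement.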

ADD REPLY
0

The program supports regex and degenerate bases. Example data would have helped. @2nelly, could you provide some?

something like this:

$ seqkit locate -ip 'AT[ATGCN]{2}ct' test.txt

seqID   patternName pattern strand  start   end matched
seq1    AT[ATGCN]{2}ct  AT[ATGCN]{2}ct  +   1   6   ATGCCT
seq2    AT[ATGCN]{2}ct  AT[ATGCN]{2}ct  +   1   6   ATCGCT
seq3    AT[ATGCN]{2}ct  AT[ATGCN]{2}ct  +   1   6   ATGGCT

$ cat test.txt 
>seq1
ATGCCT
>seq2
ATCGCT
>seq3
ATGGCT
>seq4
agggCT
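For readers without seqkit at hand, the same idea can be sketched in plain Python. This is only an approximation of the example above, not a reimplementation of seqkit: it searches the forward strand only, reports non-overlapping matches, and `read_fasta` and `locate` are hypothetical helper names.

```python
import re

# Same records as test.txt above, inlined so the sketch is self-contained.
FASTA = """>seq1
ATGCCT
>seq2
ATCGCT
>seq3
ATGGCT
>seq4
agggCT
"""


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def locate(records, pattern):
    """Return (seqID, start, end, matched) with 1-based inclusive coordinates."""
    regex = re.compile(pattern, re.IGNORECASE)  # case-insensitive, like -i
    rows = []
    for name, seq in records:
        for m in regex.finditer(seq):
            rows.append((name, m.start() + 1, m.end(), m.group()))
    return rows


rows = locate(read_fasta(FASTA), r"AT[ATGCN]{2}ct")
for row in rows:
    print(*row, sep="\t")
```

As in the seqkit output, seq4 is not reported because it does not start with AT.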
ADD REPLY
0

Actually, what I am interested in is to: 1) Provide a PWM like the ones below and scan for known motifs:

>MAT1   GATGACTCAG
A       0.1423  0.6657  0.1083  0.0637  0.703   0.08271 0.1059  0.05289 0.8572  0.07052 
C       0.1308  0.1516  0.02142 0.07233 0.1142  0.6778  0.07541 0.7873  0.0203  0.08761 
G       0.6295  0.1211  0.09723 0.8111  0.09712 0.1592  0.04718 0.1067  0.02145 0.7449  
T       0.09738 0.06157 0.7731  0.05288 0.08568 0.08028 0.7715  0.05307 0.1011  0.09697 
>MAT2   GAGTCATC
A       0.08561 0.8342  0.06805 0.07264 0.03543 0.8442  0.05354 0.06908 
C       0.09674 0.02945 0.09529 0.08935 0.8439  0.06511 0.1096  0.6998  
G       0.7666  0.06253 0.7815  0.07785 0.06999 0.01577 0.1205  0.09853 
T       0.05105 0.07379 0.05518 0.7602  0.05069 0.07487 0.7163  0.1326  
>MAT3   TGACTC
A       0.04545 0.02877 0.7506  0.04162 0.03697 0.03304 
C       0.01033 0.05893 0.08787 0.8451  0.04423 0.8338  
G       0.03614 0.8826  0.08604 0.06542 0.01263 0.07211 
T       0.9081  0.02965 0.07547 0.04788 0.9062  0.06101 
>MAT4   CTATGG
A       0.0336  0.08156 0.5596  0.06272 0.00974 0.05577 
C       0.8986  0.03644 0.175   0.0117  0.009955 0.03431
G       0.03954 0.01639 0.1517  0.03361 0.9506  0.866   
T       0.02823 0.8656  0.1137  0.892   0.02967 0.04392 
>MAT5   GGTTCC
A       0.08517 0.08336 0.01438 0.008006 0.03285 0.06631
C       0.01771 0.1123  0.114   0.07113 0.8397  0.8532  
G       0.8231  0.7282  0.0773  0.07392 0.05586 0.01382 
T       0.07397 0.07609 0.7943  0.8469  0.07159 0.06666

2) Give a consensus motif directly and look for possible matches to known motifs
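As a rough illustration of point 2, the sketch below parses two of the matrices above (MAT3 and MAT4) and ranks them by how well a consensus string fits, trying both orientations. The similarity measure (mean matrix frequency of the query bases over an ungapped alignment) and the names `parse_matrices`, `revcomp` and `best_match` are assumptions made for this example; dedicated motif-comparison tools use proper statistics and curated databases.

```python
# Two matrices copied from the post above, in the same text format.
MATRICES = """>MAT3   TGACTC
A       0.04545 0.02877 0.7506  0.04162 0.03697 0.03304
C       0.01033 0.05893 0.08787 0.8451  0.04423 0.8338
G       0.03614 0.8826  0.08604 0.06542 0.01263 0.07211
T       0.9081  0.02965 0.07547 0.04788 0.9062  0.06101
>MAT4   CTATGG
A       0.0336  0.08156 0.5596  0.06272 0.00974 0.05577
C       0.8986  0.03644 0.175   0.0117  0.009955 0.03431
G       0.03954 0.01639 0.1517  0.03361 0.9506  0.866
T       0.02823 0.8656  0.1137  0.892   0.02967 0.04392
"""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_matrices(text):
    """Parse '>NAME CONSENSUS' headers followed by A/C/G/T rows into a dict."""
    motifs, name = {}, None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            name = fields[0][1:]
            motifs[name] = {}
        else:
            motifs[name][fields[0]] = [float(x) for x in fields[1:]]
    return motifs


def revcomp(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def best_match(query, matrix):
    """Best mean frequency of the query bases over all ungapped placements.

    Assumes the query contains only A/C/G/T and is no longer than the matrix;
    a longer query yields 0.0 in this sketch.
    """
    width = len(matrix["A"])
    best = 0.0
    for oriented in (query, revcomp(query)):
        for offset in range(width - len(oriented) + 1):
            probs = [matrix[base][offset + j] for j, base in enumerate(oriented)]
            best = max(best, sum(probs) / len(probs))
    return best


motifs = parse_matrices(MATRICES)
query = "GAGTCA"  # reverse complement of the MAT3 consensus TGACTC
ranked = sorted(((best_match(query, m), name) for name, m in motifs.items()), reverse=True)
for score, name in ranked:
    print(f"{name}\t{score:.3f}")
```

The query is listed as GAGTCA on purpose: it only matches MAT3 once the reverse complement is considered, which is easy to miss when comparing consensus strings by eye.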

ADD REPLY
0

Please edit your top-level question to really point out what you need. In the question you say "a DNA sequence" and now it is a PWM that you want to compare against a given set of other PWMs. Please be precise so that people can understand what you want.

ADD REPLY
0

Can anybody suggest a tool that takes a DNA sequence as input and scans it against known motifs?

You asked this in the original question and got answers that fit that request.

Not describing the problem accurately leads to just noise in threads, which makes it hard to understand what is being asked and what answer goes with the actual request.

Please edit the original question and provide accurate complete information there.

ADD REPLY
1

For the sake of completeness, if someone is interested in checking a single sequence against a collection of PFMs/PWMs, see my tutorial.

ADD REPLY
0

Dear ATpoint,

I already have the occurrences of some candidate motifs, which came from thousands of regions. What I want is to check whether these motifs are found in a database like JASPAR.

Thank you

ADD REPLY
0

Sorry genomax but I disagree,

The question is quite straightforward. My original goal was to provide a consensus DNA sequence to scan against known motifs, as I said in point 2 of my previous reply.

Point 1 came up after the first reply.

So, if I get an answer to my original question, I will be happy. The second query was additional to the answers I received.

By adding comments like this, you increase the noise in threads for no reason. If you disagree, you could have sent me a DM.

Thank you

ADD REPLY
3

You are free to disagree, but if you manage to confuse a couple of our top contributors, then you probably did not explain your question sufficiently well. That your original question is just one sentence suggests that you did not explain things enough. The fact that comments are going back and forth about what exactly you are looking for only confirms this.

And we don't have DMs on Biostars.

My suggestion is that we abandon this thread, you think about what you want and clearly describe that in a new thread, providing a link to this thread for background information.

ADD REPLY
1

You guys keep adding noise to threads by accusing me of not providing the correct query info.

It seems it is easy for you to criticize people from behind your keyboard instead of taking a second look at the question.

I think it's pointless to look for any solution in this thread.

P.S. A DM option would be a good idea, instead of destroying the questions of "some not so brilliant minds" like us.

Have a great day.

ADD REPLY
4

Hi,

Please do not be offended; none of the comments here are personal. Questions are expected to address the issue at hand in full and be complete on creation. When information that foundationally changes the problem at hand is added because contributors have to dig in and excavate the actual problem, discussions get out of hand really fast and the overall post plummets in value.

You're welcome to create another question and describe the problem with as much detail as you can, or I can clean up this thread and you can edit your question with the relevant details. What would you prefer?

ADD REPLY
0

There is a significant difference between your 'input data' being a PWM and an actual sequence. Nothing in your original question mentioned this matrix, so you got answers about detecting sequences.

Had you just stated the real format of your data in the first place, none of this 'noise' would be here. You didn't even use the word 'consensus' in your top level post, so how were we ever to know you had a PWM?

The onus is on you, as the question asker, to make the task as simple as possible for the community to answer your question; it is not on us to guess or interpret what you meant all along. If we have to "take a second look at the question" at all, it wasn't a very well posed question.

ADD REPLY
3
5.4 years ago
Michael 54k

I am a bit puzzled about the fuss in the comments, because the MEME suite most likely has everything you need with respect to motifs.

Or something like PWMScan for PWMs, or, in R, the function matchPWM in the Biostrings package.

Further options are in the EMBOSS suite, e.g. for searching Prosite patterns.

ADD COMMENT
