Question

Script for finding TF motif binding regions in the human genome

3

6.8 years ago

Shicheng Guo ★ 9.4k

Hi All,

I have a TF of interest whose motif is known (both logo and frequency matrix), and I want to identify all matching regions in the genome. Is there a Perl, R or Python script that anyone can share?

BLAT doesn't work because the sequence is too short.

FYI:

Motif consensus (14 bp): TGGCACCATGCCAA

Motif frequency matrix:

    A [  0  0  0  0 14  2  2  7  4  0  0  0 16 14 ]
    C [  0  0  0 16  1  8  8  1  3  0 16 16  0  0 ]
    G [  0 16 16  0  0  5  4  5  2 16  0  0  0  1 ]
    T [ 16  0  0  0  1  1  2  3  7  0  0  0  0  1 ]

Logo: (image not included)

motif logo • 1.5k views

ADD COMMENT • link 6.7 years ago by Shicheng Guo ★ 9.4k

2

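seqkit locate -i -d -p TGGCACCATGCCAA <sequence.fa | sequence.fa.gz>

-i = ignore case, -d = pattern contains degenerate bases, -p = pattern

If you want to search only the positive strand, add -P. From the 5th position onward the motif has degenerate bases, so the exact consensus above will miss variant sites; with -d the pattern can contain IUPAC ambiguity codes at those positions.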
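To pick the IUPAC codes for the degenerate positions, one option is to derive them from the frequency matrix in the question. The sketch below is only an illustration: the 25% cutoff (`min_frac`) and the helper names `iupac_consensus` and `find_sites` are my own assumptions, not part of seqkit. It builds an ambiguity-code consensus and then does a plain regex search on both strands, roughly what a degenerate pattern search amounts to.

```python
import re

# Count matrix copied from the question (rows A, C, G, T; 14 columns).
COUNTS = {
    "A": [0, 0, 0, 0, 14, 2, 2, 7, 4, 0, 0, 0, 16, 14],
    "C": [0, 0, 0, 16, 1, 8, 8, 1, 3, 0, 16, 16, 0, 0],
    "G": [0, 16, 16, 0, 0, 5, 4, 5, 2, 16, 0, 0, 0, 1],
    "T": [16, 0, 0, 0, 1, 1, 2, 3, 7, 0, 0, 0, 0, 1],
}

# Standard IUPAC nucleotide codes, keyed by the sorted set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}
CODE_TO_BASES = {code: bases for bases, code in IUPAC.items()}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def iupac_consensus(counts, min_frac=0.25):
    """Keep every base seen in at least min_frac of the sites (assumed cutoff).

    With min_frac <= 0.25 at least one base always passes, so the lookup
    into IUPAC never sees an empty key.
    """
    codes = []
    for i in range(len(counts["A"])):
        total = sum(counts[b][i] for b in "ACGT")
        kept = "".join(b for b in "ACGT" if counts[b][i] / total >= min_frac)
        codes.append(IUPAC[kept])
    return "".join(codes)


def revcomp(seq):
    """Reverse complement; characters other than ACGT are left unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, pattern):
    """Return (0-based start on the forward strand, strand, matched text)."""
    body = "".join("[%s]" % CODE_TO_BASES[c] for c in pattern.upper())
    # The lookahead makes overlapping matches visible to finditer.
    regex = re.compile("(?=(%s))" % body, re.IGNORECASE)
    hits = [(m.start(), "+", m.group(1)) for m in regex.finditer(seq)]
    for m in regex.finditer(revcomp(seq)):
        # Convert the reverse-strand offset back to forward coordinates.
        start = len(seq) - m.start() - len(pattern)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    pattern = iupac_consensus(COUNTS)
    print(pattern)
    print(find_sites("ACTGGCACCATGCCAAGT", pattern))
```

The printed consensus can be passed to `seqkit locate -d -p` in place of the exact 14-mer. For a whole genome the seqkit command will be much faster than this loop; the Python version is mainly useful for checking which pattern you actually want.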

ADD REPLY • link 6.8 years ago by cpad0112 21k

score 2 · Answer 1 · 2017-06-15

2

6.8 years ago

EagleEye 7.5k

Perl solution:

http://homer.ucsd.edu/homer/motif/index.html

Web/application solution:

http://meme-suite.org/tools/meme
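For a sense of what matrix-based scanning involves, here is a minimal Python sketch using the count matrix from the question. It is not how HOMER or the MEME Suite is implemented; the pseudocount of 0.5, the uniform 0.25 background and the "80% of the maximum score" cutoff are assumptions chosen for illustration, and `build_pwm` and `scan` are hypothetical helper names.

```python
import math

BASES = "ACGT"

# Count matrix copied from the question (rows A, C, G, T; 14 columns).
COUNTS = {
    "A": [0, 0, 0, 0, 14, 2, 2, 7, 4, 0, 0, 0, 16, 14],
    "C": [0, 0, 0, 16, 1, 8, 8, 1, 3, 0, 16, 16, 0, 0],
    "G": [0, 16, 16, 0, 0, 5, 4, 5, 2, 16, 0, 0, 0, 1],
    "T": [16, 0, 0, 0, 1, 1, 2, 3, 7, 0, 0, 0, 0, 1],
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert counts to log2-odds scores against a uniform background."""
    pwm = []
    for i in range(len(counts["A"])):
        total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((counts[b][i] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def revcomp(seq):
    """Reverse complement of an upper-case sequence; N and others unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq, pwm, min_frac=0.8):
    """Score every window on both strands.

    Returns (0-based start, strand, score) for windows scoring at least
    min_frac of the best possible score (an assumed, adjustable cutoff).
    """
    seq = seq.upper()
    width = len(pwm)
    threshold = min_frac * sum(max(col.values()) for col in pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other symbols
        for strand, site in (("+", window), ("-", revcomp(window))):
            score = sum(col[b] for col, b in zip(pwm, site))
            if score >= threshold:
                hits.append((start, strand, round(score, 2)))
    return hits


if __name__ == "__main__":
    pwm = build_pwm(COUNTS)
    for hit in scan("ACTGGCACCATGCCAAGT", pwm):
        print(hit)
```

To run this over a genome you would read each chromosome from the FASTA file and call `scan` on it, but a pure-Python loop is slow at that scale, which is a good reason to prefer the tools linked above for real work.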

ADD COMMENT • link 6.8 years ago by EagleEye 7.5k

1

Yes. It is exactly what I need. Thanks.

ADD REPLY • link 6.8 years ago by Shicheng Guo ★ 9.4k