Prediction of TF binding sites at genome-wide scale
7.8 years ago

I want to see the overall binding pattern of a TF (e.g. ARID3A) across the complete human genome (hg19). This task comprises two steps:

1- Take the human genome (hg19) FASTA and divide it into bins of 500 nucleotides. There will be two files, one containing the bin coordinates (as below) and the other the whole-genome FASTA sequence.

chrom   Start   End
 chr1   1       500
 chr1   500     1000
 chr1   1000    1500

2- Use one or more tools to identify binding sites of the given TF in each of those bins, so that the final result looks like:

chrom   Start   End   ARID3A
 chr1   1       500   binding
 chr1   500     1000  no-binding
 chr1   1000    1500  binding

If anybody has done something similar, kindly guide me: how can I partition the genome into bins of size 500, and which tools can predict binding sites and report the results at the level of each bin? Thank you.

ChIP-Seq TFBS Prediction • 1.7k views

How are you going to handle motifs that span two of your segments?


A possible option could be to use a sliding window with a step of, let's say, 100 for segmenting the genome. In this case the sequences will be 1:500, 100:600, 200:700 and so on. I think in that case I can overcome the issue you mentioned.
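To make the boundary argument concrete, here is a minimal Python sketch (not tied to any particular tool). The function names, the 2000 bp toy chromosome and the example hit at 495-505 are all assumptions for illustration, and coordinates are 0-based, half-open as in BED. It lists which windows fully contain a hit that straddles the 500 bp boundary.

```python
def sliding_windows(chrom_length, size=500, step=100):
    """Yield 0-based, half-open (start, end) windows, clipped at the chromosome end."""
    for start in range(0, chrom_length, step):
        yield start, min(start + size, chrom_length)


def windows_containing(hit_start, hit_end, chrom_length, size=500, step=100):
    """Return every window that fully contains the hit [hit_start, hit_end)."""
    return [
        (s, e)
        for s, e in sliding_windows(chrom_length, size, step)
        if s <= hit_start and hit_end <= e
    ]


# Hypothetical 10 bp motif hit spanning the boundary between the first two bins.
overlapping = windows_containing(495, 505, chrom_length=2000, size=500, step=100)
non_overlapping = windows_containing(495, 505, chrom_length=2000, size=500, step=500)

print(overlapping)      # windows that see the whole motif
print(non_overlapping)  # empty: fixed bins split the motif
```

Away from the chromosome ends, any hit no longer than size minus step (400 bp here) falls entirely inside at least one window, at the cost of roughly five times as many windows to scan.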

7.8 years ago

Well, I found an answer to the first part:

Divide the human genome into windows of 500 nucleotides:

$ bedtools makewindows -g hg19.genome -w 500

Here the genome file contains the length of each chromosome in hg19; it is available here: https://github.com/arq5x/bedtools/blob/master/genomes/human.hg19.genome
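For anyone who wants to see what the windowing does, here is a rough Python sketch of the same idea. It is an illustration only, not the bedtools implementation: `make_windows` and the toy chromosome `chrA` are made-up names, and the input is assumed to be tab-separated lines of chromosome name and length, as in the genome file above. Windows are 0-based, half-open (BED style) and the last window on each chromosome is clipped to the chromosome length.

```python
def make_windows(genome_lines, size=500):
    """Yield (chrom, start, end) fixed-size windows from 'chrom<TAB>length' lines."""
    for line in genome_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, length = line.split()[:2]
        length = int(length)
        for start in range(0, length, size):
            # Clip the final window so it does not run past the chromosome end.
            yield chrom, start, min(start + size, length)


# Toy genome (hypothetical name and length) instead of the real hg19 file.
toy_genome = ["chrA\t1200"]
toy_windows = list(make_windows(toy_genome, size=500))

for chrom, start, end in toy_windows:
    print(f"{chrom}\t{start}\t{end}")
```

For the overlapping windows discussed in the comments, bedtools makewindows also takes a step size via -s (for example -w 500 -s 100), so the sliding version does not need custom code.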
