Dear Thanh Lan Chu,
The first thing I would do is check whether you can actually see the peaks identified by SeqMonk in a data visualization program such as IGV (Integrative Genomics Viewer, which was used for the picture posted in this thread), the UCSC Genome Browser, or something similar.
If you can really see the identified peaks, you can trust them more than before.
The peak-calling programs I used (MACS2, Peakzilla, HOMER) gave me the DNA sequences of the called peaks, which were required for the motif identification. So either you will have to figure out how to extract these sequences from ChIPMonk, or you should use a peak caller that runs on the command line (MACS2, Peakzilla, HOMER). Depending on how comfortable you are with the command line, you can do this on your own, ask a bioinformatician for help, or use Galaxy (https://usegalaxy.org/ ; many programs are preinstalled there and can be used for bioinformatic analysis with almost no bioinformatics skills).
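If you end up with peak coordinates but no sequences, pulling the sequences out yourself is not hard. Here is a minimal Python sketch of the idea, assuming the genome is already loaded as a dictionary of chromosome name to sequence and the peaks use BED-style coordinates (0-based start, end not included). The function name and the toy data are made up for illustration; for a real genome you would rather use an existing tool than load everything into memory.

```python
def extract_peak_sequences(genome, peaks):
    """Return FASTA-formatted text with one record per peak.

    genome: dict mapping chromosome name -> sequence string (assumed preloaded)
    peaks:  iterable of (chrom, start, end), BED-style 0-based half-open
    """
    records = []
    for chrom, start, end in peaks:
        seq = genome[chrom][start:end]
        # Header encodes the peak location so it can be traced back later.
        records.append(f">{chrom}:{start}-{end}\n{seq}")
    return "\n".join(records)


# Toy data, purely illustrative.
genome = {"chr1": "ACGTACGTGGCCTTAA"}
peaks = [("chr1", 4, 8), ("chr1", 8, 12)]
fasta = extract_peak_sequences(genome, peaks)
print(fasta)
```

The resulting FASTA text is the kind of input that motif finders expect.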
You can then go on with the MEME Suite for motif identification. I know three ways of identifying motifs with the MEME Suite. They all have drawbacks for large numbers of peaks.
- MEME itself: If you try to find the motif within 100,000 sequences, it will probably take a few years, since it gets extremely slow for large numbers of sequences.
- DREME: It will find your motif even if you feed it several hundred thousand peak sequences. It gives very clear motifs, which look nice, but you might miss motif variants.
- Top peaks with MEME: Sort the peaks so that the most significant ones come first, take the top 10,000, and use MEME to search for the motif. This will give you the motif of the strongest peaks, but you might miss the overall picture.
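The "take the top peaks" step from the last option can be sketched in a few lines of Python. This assumes a tab-separated, BED-like peak file with a score in the fifth column; which column best reflects peak strength depends on your peak caller, so `score_column` is an assumption you would need to adjust. The function name and the example lines are hypothetical.

```python
def top_peaks(lines, n, score_column=4):
    """Return the n highest-scoring peaks from BED-like lines.

    score_column is the 0-based index of the score field
    (assumed to be the fifth column here; adjust for your peak caller).
    """
    peaks = []
    for line in lines:
        # Skip blank lines and typical header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        peaks.append((float(fields[score_column]), fields))
    # Highest score first.
    peaks.sort(key=lambda item: item[0], reverse=True)
    return [fields for _, fields in peaks[:n]]


# Toy input, purely illustrative.
lines = [
    "chr1\t100\t200\tpeak_a\t50\n",
    "chr1\t300\t400\tpeak_b\t900\n",
    "chr2\t10\t90\tpeak_c\t400\n",
]
best = top_peaks(lines, 2)
print([fields[3] for fields in best])
```

With a real file you would pass the open file handle instead of the list and set `n` to 10,000.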
MEME or DREME will give you the most likely motifs, which does not automatically mean that your protein binds them. You would need to do, for example, EMSAs (electrophoretic mobility shift assays) to find out whether your protein really binds the identified sequence.
If you need to find peaks that are located within annotated promoters, you can use bedtools intersect (http://bedtools.readthedocs.io/en/latest/content/tools/intersect.html), which is once again a command-line tool and checks for overlaps between defined genomic regions (e.g. promoter regions and peak regions). I guess this program, or an alternative to it, is also available on Galaxy.
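To make clear what "checking for overlaps" means, here is a small Python sketch of the underlying logic. It is only an illustration of the idea, not how bedtools is implemented, and it assumes BED-style half-open coordinates; the function names, gene names, and coordinates are invented. It compares every peak with every promoter, so for real data the dedicated tool will be much faster.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def peaks_in_promoters(peaks, promoters):
    """Return (chrom, start, end, gene) for every peak/promoter overlap.

    peaks:     iterable of (chrom, start, end)
    promoters: list of (chrom, start, end, gene)
    """
    hits = []
    for chrom, start, end in peaks:
        for p_chrom, p_start, p_end, gene in promoters:
            if chrom == p_chrom and overlaps(start, end, p_start, p_end):
                hits.append((chrom, start, end, gene))
    return hits


# Toy data, purely illustrative.
promoters = [("chr1", 1000, 2000, "geneA"), ("chr2", 500, 1500, "geneB")]
peaks = [("chr1", 1900, 2100), ("chr1", 2000, 2200), ("chr2", 100, 400)]
hits = peaks_in_promoters(peaks, promoters)
print(hits)
```

Only the first peak is reported here: the second one starts exactly where the geneA promoter ends, which does not count as an overlap with half-open coordinates.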
Before you try programming something yourself in R, check whether someone else has already done the job (R packages, command-line tools, Galaxy, ...).
People might argue that your antibody pulls down sequences unspecifically. If you can, try to do the ChIP in a cell line without this protein (knocked out by CRISPR/Cas9 or knocked down by RNAi) to prove that the antibody specifically pulls down only your protein.
Hope this helps,
Best regards, Alex