Question

ChIP-seq Peak Calling/File Format

5.0 years ago

lkalesin • 0

Hi all! I am trying to get ChIP-seq peaks from ENCODE ChIP-seq data. The particular experiment I am interested in is GSM613815. When I download the .bed files from GEO, however, I get lines that look like this:

chr1 9859 10058 SOLEXA5_123:3:23:15452:1914

Unfortunately, this does not have the scores, names, strands, etc. that I expected from the .bed file format, like so:

chr1 91852645 91853203 SRX005383.05_peak_1 612 . 17.40168 67.74557 61.27857 379

How would I use the information in the first file to get peaks I can use (second line)? Is it a conversion or do I have to do anything else?

ChIP-Seq encode roadmap epigenome • 966 views


score 0 · Answer 1 · 2019-04-19

I think what you have there is simply the sequencing reads in BED format. Note, though, that this is not standard BED, because the strand would need to be in column 6 instead of column 5. To make a proper BED file, do something like:

awk 'BEGIN {OFS="\t"} {print $1, $2, $3, $4, ".", $5}' in.bed > out.bed

You could then use this file to call peaks, e.g. with macs2 callpeak -t out.bed -f BED.
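As a rough sketch of the conversion step, the snippet below builds a tiny stand-in for in.bed and reshapes it into six columns. The sample reads, the name read_2, and both strand values are made up for illustration; it assumes the real file carries the strand in column 5, which the line quoted in the question does not show.

```shell
# Hypothetical two-read sample; assumes the strand sits in column 5.
printf 'chr1\t9859\t10058\tSOLEXA5_123:3:23:15452:1914\t+\n' > in.bed
printf 'chr1\t10100\t10299\tread_2\t-\n' >> in.bed

# Insert a placeholder score in column 5 and move the strand to column 6.
awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $4, ".", $5 }' in.bed > out.bed

# Sanity check: every line should now have six tab-separated fields.
awk -F'\t' 'NF != 6 { bad++ } END { print (bad ? "malformed" : "ok") }' out.bed
```

If the check passes, out.bed can be handed to the peak caller, for example macs2 callpeak -t out.bed -f BED -g hs -n sample, where -g hs (human genome size) and the output prefix sample are assumptions to adjust for your data.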

score 0 · Answer 2 · 2019-04-19

5.0 years ago

Friederike 8.9k

I don't think you downloaded the peaks. As ATpoint mentioned, these are probably BED files of reads ("TagAlign"). The peaks from ENCODE are usually supplied in .narrowPeak files. Maybe try the Roadmap website for downloading the peaks (it's under the subheader "C. peak calling"; make sure to scroll down).
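One quick way to tell the two kinds of file apart is to count columns: a narrowPeak line has ten fields (BED6 plus signalValue, pValue, qValue, and peak summit offset), while a reads file has far fewer. A minimal sketch, using the two lines quoted in the question and the hypothetical file names reads_example.bed and peaks_example.narrowPeak:

```shell
# The two example lines from the question, written to hypothetical files.
printf 'chr1 9859 10058 SOLEXA5_123:3:23:15452:1914\n' > reads_example.bed
printf 'chr1 91852645 91853203 SRX005383.05_peak_1 612 . 17.40168 67.74557 61.27857 379\n' > peaks_example.narrowPeak

# Print each file name followed by the distinct column counts it contains.
for f in reads_example.bed peaks_example.narrowPeak; do
    printf '%s\t' "$f"
    awk '{ print NF }' "$f" | sort -u | paste -sd, -
done
```

A file that reports ten columns throughout is likely already a peak file, whereas one with four to six columns is more likely reads that still need peak calling.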
