ChIP-seq Peak Calling/File Format
5.0 years ago
lkalesin • 0

Hi all! I am trying to get ChIP-seq peaks from ENCODE ChIP-seq data. The experiment I am interested in is GSM613815. When I download the .bed files from GEO, however, I get lines that look like this:

chr1 9859 10058 SOLEXA5_123:3:23:15452:1914

Unfortunately, this does not have the scores, names, strands, etc. that I would expect from the .bed file format, like so:

chr1 91852645 91853203 SRX005383.05_peak_1 612 . 17.40168 67.74557 61.27857 379

How do I get from the first file to peaks I can use, like the second line? Is it just a format conversion, or do I have to do something else?

ChIP-Seq encode roadmap epigenome • 966 views
5.0 years ago
ATpoint 81k

I think what you have there is simply the sequencing reads in BED format. Note, though, that this is not standard BED, because the strand needs to be in column 6 rather than column 5. To make a proper BED file, do something like:

awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $4, ".", $5 }' in.bed > out.bed

You could then use this file to call peaks, e.g. with macs2 callpeak -t out.bed -f BED.
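
A slightly more defensive variant of the awk one-liner above, as a rough sketch only. It assumes the downloaded file has exactly five whitespace-separated columns with the strand (+ or -) in column 5, which the single line quoted in the question does not confirm. The function name reads_to_bed6 is made up for illustration.

```shell
# Hypothetical helper: convert a 5-column read file (strand assumed in column 5)
# into 6-column BED by inserting a placeholder score "." as column 5.
# Lines that do not match the assumed layout are reported on stderr and skipped.
reads_to_bed6() {
  awk 'BEGIN { OFS = "\t" }
       NF == 5 && ($5 == "+" || $5 == "-") { print $1, $2, $3, $4, ".", $5; next }
       { print "skipping line " NR ": unexpected layout" > "/dev/stderr" }' "$1"
}
```

Usage would be along the lines of reads_to_bed6 in.bed > out.bed, followed by macs2 callpeak -t out.bed -f BED. If many lines are skipped, the file probably does not have the layout assumed here and should be inspected first.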

5.0 years ago

I don't think you downloaded the peaks. As ATpoint mentioned, these are probably BED files of reads ("TagAlign"). The peaks from ENCODE are usually supplied as .narrowPeak files. Try the Roadmap website for downloading the peaks (under the subheader "C. peak calling"; make sure to scroll down).
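
For orientation, the ten columns of a narrowPeak line, such as the second line quoted in the question, follow the ENCODE BED6+4 layout: chrom, start, end, name, score, strand, signalValue, pValue (-log10), qValue (-log10), and the summit offset from the start. Below is a minimal sketch for picking out the stronger peaks from such a file; the function name top_peaks and the threshold are made up for illustration, not part of any ENCODE tooling.

```shell
# Hypothetical helper: print peaks whose signalValue (column 7) is at least
# the given threshold, sorted from strongest to weakest signal.
# Usage: top_peaks peaks.narrowPeak 10
top_peaks() {
  awk -v min="$2" '$7 + 0 >= min + 0' "$1" | LC_ALL=C sort -k7,7nr
}
```

The output is still valid narrowPeak, so it can be passed on to other tools that accept that format.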
