How to obtain peak height values from a bed file?
7.7 years ago
morovatunc ▴ 550

Dear all,

I have a basic question about ChIP-seq BED files. I have been working with BED files for a couple of months, but I have only now realised that they were already processed, and I would like to know whether there is a gold-standard method for producing processed BED files. Most ChIP-seq papers describe this step only briefly, for example "we used macs14 to process the ChIP-seq FASTQ files", and nothing more, so I feel a little lost.

This is the head of one of the BED files that I consider to be of OK quality.

chr1    644044  644249  peak1   8.05838
chr1    831530  831681  peak1   3.61544
chr1    900849  901108  peak2   6.77098
chr1    931535  931798  peak3   5.76549
chr1    960454  960776  peak4   7.79623
chr1    967782  967928  peak5   3.42912
chr1    967933  968142  peak1   8.01545
chr1    1015395 1015544 peak2   6.16261
chr1    1062523 1062669 peak6   4.58519
chr1    1114526 1114795 peak3   7.18694
chr1    1133651 1133837 peak7   4.70157
chr1    1133974 1134157 peak8   6.35043
chr1    1225207 1225360 peak4   3.65434
chr1    1233033 1233258 peak9   10.23716
chr1    1240866 1241022 peak5   4.38862
chr1    1269218 1269485 peak10  8.46963
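
To read the values out of a file like the one above, a minimal Python sketch could look like this. It assumes the fifth column is the peak score or height (the file itself does not say what the column means), and both the `min_score` threshold of 5.0 and the function names are hypothetical, chosen only for illustration.

```python
import io

def read_peaks(handle):
    """Parse 5-column BED lines (chrom, start, end, name, score) into tuples."""
    peaks = []
    for line in handle:
        # Skip header lines that some BED files carry.
        if line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 5:
            continue
        chrom, start, end, name, score = fields[:5]
        peaks.append((chrom, int(start), int(end), name, float(score)))
    return peaks

def filter_by_score(peaks, min_score):
    """Keep peaks whose fifth-column value is at least min_score."""
    return [p for p in peaks if p[4] >= min_score]

# First three rows of the example file above, inlined so the sketch is self-contained.
example = io.StringIO(
    "chr1\t644044\t644249\tpeak1\t8.05838\n"
    "chr1\t831530\t831681\tpeak1\t3.61544\n"
    "chr1\t900849\t901108\tpeak2\t6.77098\n"
)
peaks = read_peaks(example)
strong = filter_by_score(peaks, min_score=5.0)  # 5.0 is an arbitrary cutoff
for chrom, start, end, name, score in strong:
    print(chrom, start, end, name, score, sep="\t")
```

For a real file, `open("peaks.bed")` (a placeholder file name) can be passed to `read_peaks` in place of the `io.StringIO` object.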

This is one of the BED files that I want to learn how to process. I guess these are reads, and we can use bedtools merge to count them?

==> sorted_GSM1442789_mock_p300.bed <==
chr1    10175   10225   SN608VA04562315268.70401.50 255 +
chr1    10238   10288   SN608VA04562307401.904854.30    255 +
chr1    17461   17511   SN608VA04552313922.002700.10    255 +
chr1    17470   17520   SN608VA04551207701.404148.50    255 +
chr1    87067   87117   SN608VA045511021679.002497.60   255 +
chr1    100632  100682  SN608VA04551216367.905873.00    255 +
chr1    150554  150604  SN608VA04552305857.108801.20    255 +
chr1    532437  532487  SN608VA04562204347.808268.70    255 +
chr1    533139  533189  SN608VA04552209399.109424.70    255 +
chr1    533139  533189  SN608VA045512071736.301060.60   255 +
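
If these intervals are indeed individual reads, then the "height" at a position is simply the number of reads overlapping it. The sketch below computes the deepest pileup per chromosome with a sweep over interval start and end events. It assumes BED-style half-open coordinates and ignores strand; the function name `max_pileup` is hypothetical. This is a raw read count, not a statistical peak call.

```python
from collections import defaultdict

def max_pileup(reads):
    """Return {chrom: (position, depth)} for the deepest read pileup per chromosome.

    reads: iterable of (chrom, start, end) with half-open BED coordinates.
    """
    events = defaultdict(list)
    for chrom, start, end in reads:
        events[chrom].append((start, 1))   # a read begins: depth goes up
        events[chrom].append((end, -1))    # a read ends: depth goes down
    best = {}
    for chrom, evs in events.items():
        depth = 0
        top = (None, 0)
        # At equal positions, -1 sorts before +1, which matches half-open intervals.
        for pos, delta in sorted(evs):
            depth += delta
            if depth > top[1]:
                top = (pos, depth)
        best[chrom] = top
    return best

# A few intervals taken from the read-level file above.
reads = [
    ("chr1", 10175, 10225),
    ("chr1", 10238, 10288),
    ("chr1", 17461, 17511),
    ("chr1", 17470, 17520),
    ("chr1", 533139, 533189),
    ("chr1", 533139, 533189),
]
print(max_pileup(reads))
```

Ties are resolved in favour of the first position reached, so the duplicate reads at 533139 do not displace the earlier pileup of the same depth.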

I also have one more sample BED file on which I used merge, as suggested in this post. I applied bedtools merge -d (some number of base pairs) to make it more "concentrated", then removed all the peaks with more than 100 counts. I would be more than glad if you could point me to gold standards or statistical methods for processing this data.

chr1    7325    7361    r_1 1   -
chr1    7334    7370    r_2 2   -
chr1    90496   90532   r_3 1   -
chr1    523003  523039  r_4 2   +
chr1    554319  554355  r_5 1   -
chr1    554321  554357  r_6 10  -
chr1    554322  554358  r_7 2   -
chr1    554323  554359  r_8 11  +
chr1    554323  554359  r_9 2   -
chr1    554324  554360  r_10    19  -
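
The merge step can be sketched in plain Python as follows. This is only an approximation of what `bedtools merge -d` does, assuming the fifth column is a count that should be summed across merged intervals and ignoring strand. The function name `merge_with_counts`, the `max_gap` parameter, and the `min_count` cutoff of 3 are hypothetical; the direction and value of the cutoff depend on what the filtering is meant to remove.

```python
def merge_with_counts(intervals, max_gap=0):
    """Merge (chrom, start, end, count) intervals whose gap is <= max_gap, summing counts.

    With max_gap=0, overlapping and book-ended intervals are merged.
    Chromosomes are sorted lexicographically, which is fine for grouping.
    """
    merged = []
    for chrom, start, end, count in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= max_gap:
            # Extend the current region and accumulate its count.
            merged[-1] = (chrom, last[1], max(last[2], end), last[3] + count)
        else:
            merged.append((chrom, start, end, count))
    return merged

# Rows from the example above, with the name and strand columns dropped.
rows = [
    ("chr1", 7325, 7361, 1),
    ("chr1", 7334, 7370, 2),
    ("chr1", 90496, 90532, 1),
    ("chr1", 523003, 523039, 2),
    ("chr1", 554319, 554355, 1),
    ("chr1", 554321, 554357, 10),
    ("chr1", 554322, 554358, 2),
    ("chr1", 554323, 554359, 11),
    ("chr1", 554323, 554359, 2),
    ("chr1", 554324, 554360, 19),
]
merged = merge_with_counts(rows, max_gap=0)
min_count = 3  # arbitrary illustrative cutoff
kept = [region for region in merged if region[3] >= min_count]
for region in kept:
    print(*region, sep="\t")
```

Raising `max_gap` joins nearby regions that do not overlap, which is the effect the `-d` option is used for here.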

Please teach me how to catch a fish ! :)

Best,

Tunc.

bedtools chip-seq • 2.3k views

BED is a pretty generic format. Can you describe what your actual goal is? There's no single way to process stuff like this (and I would never even create the second one you showed).


My aim is to compare the binding of a transcription factor across different tissues. Later on, I will annotate those regions based on their functions.

1) Right now my BED files are very large (~500 MB) because the peaks are narrow (<50 bp) and low. So I think I should merge them with bedtools merge -d and filter out the low peaks. I need some publications that have studied this kind of filtering or enrichment to reduce the noise. (I have to show my PI that I am doing this based on sound logic or a previous publication.)

2) I do not understand why the second type of BED file exists. My best guess is that it just contains the locations of the reads, so I wanted to confirm that.

Thank you for your help,

Tunc.
