NarrowPeak format of ChIP-seq
2.3 years ago
liu4gre • 200
United States

I am just learning to work with ENCODE ChIP-seq data for transcription factor (TF) binding. I looked at the narrowPeak files and found a column named "score". Is this the tag density, indicating the binding affinity of the TF at this site or region? If not, how can I get the tag density (or binding affinity)?

15 months ago
Netherlands

ENCODE narrowPeak: Narrow (or Point-Source) Peaks format

This format is used to provide called peaks of signal enrichment based on pooled, normalized (interpreted) data. It is a BED6+4 format.

  1. chrom - Name of the chromosome (or contig, scaffold, etc.).
  2. chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
  3. chromEnd - The ending position of the feature in the chromosome or scaffold. The _chromEnd_ base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as _chromStart=0, chromEnd=100_ , and span the bases numbered 0-99.
  4. name - Name given to a region (preferably unique). Use '.' if no name is assigned.
  5. score - Indicates how dark the peak will be displayed in the browser (0-1000). If all scores were '0' when the data were submitted to the DCC, the DCC assigned scores 1-1000 based on signal value. Ideally the average signalValue per base spread is between 100-1000.
  6. strand - +/- to denote strand or orientation (whenever applicable). Use '.' if no orientation is assigned.
  7. signalValue - Measurement of overall (usually, average) enrichment for the region.
  8. pValue - Measurement of statistical significance (-log10). Use -1 if no pValue is assigned.
  9. qValue - Measurement of statistical significance using false discovery rate (-log10). Use -1 if no qValue is assigned.
  10. peak - Point-source called for this peak; 0-based offset from chromStart. Use -1 if no point-source called.
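
Since columns 8 and 9 are stored on a -log10 scale, with -1 meaning "not assigned", a small helper can turn them back into plain probabilities. This is only a sketch: the function name `neg_log10_to_p` is made up for illustration, and treating every negative value as "not assigned" is an assumption (the format description only mentions -1).

```python
from typing import Optional


def neg_log10_to_p(value: float) -> Optional[float]:
    """Convert a narrowPeak pValue/qValue entry (-log10 scale) to a probability.

    Returns None when the column holds the -1 sentinel ("not assigned").
    Assumption: any negative value is treated as the sentinel.
    """
    if value < 0:
        return None
    return 10.0 ** (-value)


# A stored value of 2.0 corresponds to p = 0.01; -1 means no value was assigned.
p_assigned = neg_log10_to_p(2.0)
p_missing = neg_log10_to_p(-1)
```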

Here is an example of narrowPeak format:

track type=narrowPeak visibility=3 db=hg19 name="nPk" description="ENCODE narrowPeak Example"
browser position chr1:9356000-9365000
chr1    9356548 9356648 .       0       .       182     5.0945  -1  50
chr1    9358722 9358822 .       0       .       91      4.6052  -1  40
chr1    9361082 9361182 .       0       .       182     9.2103  -1  75

Source : https://genome.ucsc.edu/FAQ/FAQformat.html#format12
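
To pull signalValue (or any other column) out of such a file, a minimal parser along the lines below may help. It is a sketch under a few assumptions: fields are separated by whitespace, every data line has exactly 10 columns, and `track` / `browser` header lines are skipped. The names `NarrowPeak` and `parse_narrowpeak` are made up for this example and do not come from any library.

```python
from typing import Iterable, Iterator, NamedTuple


class NarrowPeak(NamedTuple):
    chrom: str
    chrom_start: int   # 0-based, inclusive
    chrom_end: int     # exclusive
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float     # -log10 scale, -1 if not assigned
    q_value: float     # -log10 scale, -1 if not assigned
    peak: int          # summit offset from chrom_start, -1 if not called


def parse_narrowpeak(lines: Iterable[str]) -> Iterator[NarrowPeak]:
    """Yield one NarrowPeak per data line, skipping header and blank lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        f = line.split()
        if len(f) != 10:
            raise ValueError(f"expected 10 columns, got {len(f)}: {line!r}")
        yield NarrowPeak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                         float(f[6]), float(f[7]), float(f[8]), int(f[9]))


EXAMPLE = """\
track type=narrowPeak visibility=3 db=hg19 name="nPk" description="ENCODE narrowPeak Example"
browser position chr1:9356000-9365000
chr1    9356548 9356648 .       0       .       182     5.0945  -1  50
chr1    9358722 9358822 .       0       .       91      4.6052  -1  40
chr1    9361082 9361182 .       0       .       182     9.2103  -1  75
"""

peaks = list(parse_narrowpeak(EXAMPLE.splitlines()))
# Absolute summit position = chromStart + peak offset (when a summit was called).
summits = [p.chrom_start + p.peak for p in peaks if p.peak != -1]
```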


Thanks. So does that mean signalValue is the tag density? I looked through a few samples and the values are always integers; is that expected?

One more question: how do I merge information from replicates? They do not always have the same regions. Which regions from replicates can be treated as the same region/site?

Thanks.


Hi, I somehow missed this. Yes, signalValue is the tag density.

For merging replicates, you can:
1) Merge the fastq files, if they are technical replicates (not the best option).
2) Analyse them separately and use bedtools intersect (intersectBed) to find the overlapping regions, either on the mapped BED files or on the significant binding sites (this is much better).
3) Calculate the tag density for a specific locus (TSS +/- 3 kb, etc.). Because both samples now share the same locus, you can compare them and merge or average them, but don't forget to normalize by read (sequencing) depth.
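
To make options 2 and 3 a bit more concrete, here is a rough pure-Python sketch. It is not how bedtools works internally, just a naive stand-in for the idea of intersecting two peak lists and scaling a tag count by sequencing depth. The function names, the toy coordinates, and the reads-per-million scaling are all assumptions for illustration.

```python
from typing import List, Tuple

# (chrom, start, end) with 0-based, half-open coordinates, as in BED
Interval = Tuple[str, int, int]


def overlapping_regions(rep1: List[Interval], rep2: List[Interval]) -> List[Interval]:
    """Return the shared part of every pair of overlapping peaks.

    Naive all-against-all comparison; fine for a demo, slow for real peak sets.
    """
    shared = []
    for chrom1, s1, e1 in rep1:
        for chrom2, s2, e2 in rep2:
            if chrom1 != chrom2:
                continue
            start, end = max(s1, s2), min(e1, e2)
            if start < end:  # half-open intervals overlap only if start < end
                shared.append((chrom1, start, end))
    return shared


def per_million(tag_count: float, total_mapped_reads: int) -> float:
    """Scale a tag count to reads per million mapped reads."""
    return tag_count * 1_000_000 / total_mapped_reads


# Toy peak lists for two replicates (made-up coordinates).
rep1 = [("chr1", 100, 200), ("chr1", 500, 600)]
rep2 = [("chr1", 150, 250), ("chr2", 500, 600)]
common = overlapping_regions(rep1, rep2)

# 50 tags at a locus in a library of 10 million mapped reads.
density = per_million(50, 10_000_000)
```

For real data, bedtools on sorted files will be much faster than the nested loop above.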


Thanks for replying. I came back to read your reply again and have another question. Is it reasonable to calculate the binding difference between two TFs at the same position by subtracting the signalValue of one TF from that of the other? Thanks.


Yeah, it's feasible. A better approach is to define a genomic locus, calculate the area under the curve normalized by read depth, and then compare.
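
A minimal sketch of that comparison, assuming you already have per-base coverage over the same locus for each TF plus the total number of mapped reads per library. The coverage values and read totals below are made up, and `normalized_auc` is a hypothetical helper, not a function from any package.

```python
from typing import Sequence


def normalized_auc(coverage: Sequence[float], total_mapped_reads: int) -> float:
    """Area under the coverage curve over a locus, per million mapped reads.

    With one value per base, the area is simply the sum of the coverage values.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return sum(coverage) * 1_000_000 / total_mapped_reads


# Made-up per-base coverage for two TFs over the same 7 bp locus.
tf_a_coverage = [0, 2, 5, 9, 5, 2, 0]
tf_b_coverage = [0, 1, 3, 4, 3, 1, 0]

auc_a = normalized_auc(tf_a_coverage, 20_000_000)  # deeper library
auc_b = normalized_auc(tf_b_coverage, 10_000_000)  # shallower library
difference = auc_a - auc_b
```

In this toy case TF A has the higher raw signal, but after depth normalization TF B comes out slightly higher, which is why comparing raw values across libraries can mislead.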


Dear Moderator: I have a question about the narrowPeak format. In general, a BED file is defined as chromName / chromStart / chromEnd / strand / Name / Score / ..., where the score column refers to the significance value of the peak signal. However, I need to convert the score column to a p-value (the p-value format could be 1-based, 10-based, or 100-based). How can I get the peak p-value in the desired format and add it as new metadata? Could you give me some ideas, please? Thanks a lot :)
