Hi!
Is their an easier and simpler way (if their is ?) to compute the p-value and signal enrichment of the peaks present in broad peak file of macs2.
Thank you
Hi!
Is their an easier and simpler way (if their is ?) to compute the p-value and signal enrichment of the peaks present in broad peak file of macs2.
Thank you
According to Tao;
q-value is analogue to FDR. http://en.wikipedia.org/wiki/False_discovery_rate
Please also pay attention to the column name in MACS1 or 2 output. There are -logqvalue column or FDR(percentage) column.
Also. MACS1 and 2 use different methods to calculate FDR that I have explained in previous emails.
Also,
In MACSv1.4, FDR is calculated by swapping treatment and control. MACSv14 assumes all peaks called under a cutoff in this way are false positives. So it can calculate empirical FDR for each pvalue cutoff. However this method is hugely influenced by unbalanced sequencing depth. For example, if control sample is much larger than treatment, FDR would be overestimated so you would have less 'good' peaks above cutoff.
In MACSv2, I use another approach. First, pvalue are calculated at every basepair in the genome, then I adopt Benjamini-Hochberg to correct multiple comparisons, convert pvalue into qvalue or minimum FDR that the peak is significant. This method is more robust in my experience.
So, the interconversion will not give the same result as they are calculated differently plus these two tools are used for relatively different purposes and results are effected by the default values plus the read density of the samples in question. Best is to run the samples again with Macs14.
Sources
Update :
Just ran Macs14
and Macs2
, with same sample and control file
Macs2 [-q 0.01]
sed 1,24d test_peaks.xls | head
chr start end length abs_summit pileup -log10(pvalue) fold_enrichment -log10(qvalue)
chr1 4770143 4770617 475 4770441 22.00 10.06 5.39 6.92
chr1 4774970 4775332 363 4774970 21.00 4.59 2.67 2.17
chr1 4785399 4785693 295 4785544 39.00 18.79 6.27 15.11
chr1 4858164 4858580 417 4858395 28.00 11.08 4.76 7.86
chr1 5018564 5018786 223 5018683 22.00 12.43 7.12 9.11
chr1 6214341 6214589 249 6214361 22.00 8.33 4.35 5.36
chr1 7088437 7088737 301 7088545 41.00 18.89 5.97 15.20
chr1 7124854 7125079 226 7125006 16.00 6.05 4.08 3.37
chr1 7129919 7130171 253 7130091 21.00 11.95 7.13 8.66
Macs14 [FDR cutoff=1.5]
head ../fdrFilterPeaks_1.5_mock_xx.bed
chr start end length summit tags -10*log10(pvalue) fold_enrichment FDR(%)
chr1 4769731 4771497 1767 711 95 113.29 4.4 0.4
chr1 4773190 4775974 2785 1781 132 144.2 4.78 0.3
chr1 4776909 4778588 1680 1397 65 56.88 3.52 1.4
chr1 4784917 4786379 1463 629 93 115.63 6.72 0.38
chr1 4855079 4859029 3951 2726 257 78.67 3.04 0.71
chr1 4899973 4900844 872 476 32 71.88 6.35 0.88
chr1 4907855 4908681 827 421 29 62.32 4.76 1.16
chr1 5017615 5021374 3760 1071 170 315.75 6.97 0.23
chr1 5022679 5023934 1256 974 53 86.11 4 0.55
Though the cutoff vary a little, but still you can see, the size of peaks are really different implying the number of tags/pileup is different. If one would have to convert the file interchangeably, this would produce so much noise, that I wouldn't trust it in end. The physical location of peaks are also different, which is saying the protein is binding at relatively different places, which might/not be a significant problem.
By the way, why you would want to convert them.
Fisher test can be used to rank the peaks or by comparing two samples, you could tell few peaks appearing in a specific region are sample specific or universal or noise.
What you gonna do with the p-value after applying the fisher test to the tag density of sample over control after the Macs2 run. Also, how you gonna get the tagDensity of control file from Macs2 plus its already kind of calculated in the fold_enrichment
column.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Thank you, Sukhi. But why not apply fisher test, to the broad peaks produced by MACS2 by counting the tags in treatment and control? It is much simpler and straight forward approach, isn't it?
I've just updated my answer!!
The format of broad peak output file of MACS2 is different and that is the problem.