This also answers your previous question: https://www.biostars.org/p/176736/
At the end of the day, my way of assessing the quality of the ChIP and of the peak calling is to open the ChIP and input tracks in IGV and look at a few peaks sampled across the spectrum, from high confidence to low confidence, then check whether they "look right". I know, it's subjective and it's not scalable. It sucks.
I was kind of writing a program (https://github.com/dariober/genomeGraphs) to at least simplify scanning the peaks, since scrolling through IGV can be painfully slow.
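
In the meantime, the spot check described above can be semi-automated with an IGV batch script. What follows is only a rough sketch, not what genomeGraphs does: it assumes a MACS2-style narrowPeak file where column 9 (index 8) holds the -log10 q-value, and the function names, file names and bin sizes are all made up for illustration. It ranks the peaks, splits them into confidence bins, picks a few at random from each bin, and writes `goto`/`snapshot` commands for them.

```python
import random


def read_peaks(lines, score_col=8):
    """Parse BED/narrowPeak lines into (chrom, start, end, score) tuples.

    score_col=8 assumes narrowPeak, where that column is -log10(q-value).
    """
    peaks = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        f = line.rstrip("\n").split("\t")
        peaks.append((f[0], int(f[1]), int(f[2]), float(f[score_col])))
    return peaks


def sample_across_spectrum(peaks, n_bins=5, per_bin=3, seed=0):
    """Pick up to per_bin random peaks from each of n_bins score-ranked bins."""
    rng = random.Random(seed)
    ranked = sorted(peaks, key=lambda p: p[3], reverse=True)
    picked = []
    for i in range(n_bins):
        lo = i * len(ranked) // n_bins
        hi = (i + 1) * len(ranked) // n_bins
        chunk = ranked[lo:hi]
        picked.extend(rng.sample(chunk, min(per_bin, len(chunk))))
    return picked


def igv_batch(peaks, tracks, snapshot_dir, flank=1000):
    """Build an IGV batch script that snapshots each peak with some flank."""
    cmds = ["new"]
    cmds += [f"load {t}" for t in tracks]
    cmds.append(f"snapshotDirectory {snapshot_dir}")
    for chrom, start, end, score in peaks:
        left = max(1, start + 1 - flank)  # BED starts are 0-based, IGV is 1-based
        cmds.append(f"goto {chrom}:{left}-{end + flank}")
        cmds.append(f"snapshot {chrom}_{start}_{end}_score{score:g}.png")
    return "\n".join(cmds) + "\n"


# Hypothetical usage (file names are placeholders):
# with open("chip_peaks.narrowPeak") as fh:
#     chosen = sample_across_spectrum(read_peaks(fh))
# with open("peaks.bat", "w") as out:
#     out.write(igv_batch(chosen, ["chip.bam", "input.bam"], "snapshots"))
```

The resulting file can be run from IGV via Tools > Run Batch Script, with the right genome already selected. You still have to eyeball the images, so it stays subjective, but at least flipping through a folder of PNGs is faster than scrolling.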