Upsampling BAMs, or downsampling by A LOT?
6.1 years ago

I always downsample my ChIP-seq BAM files to the read count of the smallest file before I do any peak calling. My question is: what happens when you want to compare your data to publicly available data with much, much lower coverage? I usually get about 60 million unique reads, and there's a dataset I'd really like to compare my data to (it's from a different cell line, and I want to see whether the distribution of peaks differs), but it only has about 17 million reads. I'm hesitant to downsample my own data by that much, but I imagine "upsampling" their data would only produce a bunch of false positives... Does anyone know what the convention is for this kind of problem?

Thanks in advance!
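For reference, the arithmetic of downsampling the deeper library to match the shallower one can be sketched in Python. This is a minimal sketch that assumes the ~60M and ~17M read counts from the post (in practice you might count reads with `samtools view -c`) and assumes the subsampling itself is done with `samtools view -s SEED.FRACTION`, where the integer part seeds the random number generator and the part after the decimal point is the fraction of templates kept. The helper names `subsample_fraction` and `samtools_subsample_arg` are hypothetical and not part of samtools.

```python
def subsample_fraction(target_reads: int, current_reads: int) -> float:
    """Fraction of reads to keep so a deeper library matches a shallower one."""
    if target_reads <= 0 or current_reads <= 0:
        raise ValueError("read counts must be positive")
    # Downsampling can only remove reads, so cap the fraction at 1.0
    # (a library that is already shallower than the target is kept whole).
    return min(1.0, target_reads / current_reads)


def samtools_subsample_arg(fraction: float, seed: int = 42) -> str:
    """Build a SEED.FRACTION string for `samtools view -s` (hypothetical helper)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    if seed < 0:
        raise ValueError("seed must be a non-negative integer")
    # Assumption: samtools treats the integer part as the random seed and
    # the digits after the decimal point as the fraction of templates to keep.
    formatted = f"{fraction:.6f}"
    digits = formatted.split(".")[1]
    if not formatted.startswith("0.") or set(digits) == {"0"}:
        raise ValueError("fraction rounds to 0 or 1 at 6 decimal places")
    return f"{seed}.{digits}"


if __name__ == "__main__":
    # Read counts quoted in the post: ~60M (own data) vs ~17M (public data).
    frac = subsample_fraction(17_000_000, 60_000_000)
    print(samtools_subsample_arg(frac))  # 42.283333
```

With the numbers from the post this keeps roughly 28% of the 60M-read library. Because the subsampling is random, the output will be close to, not exactly, 17 million reads; for paired-end data the fraction applies to templates (pairs), so the counts fed in should be template counts.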

Comparisons across batches are problematic for a variety of reasons. What is the exact comparison you're trying to make? Hopefully you're not trying to use someone else's published sample as a control for a comparison; that's a recipe for problems.
