Include values of zero in TSS metaplot?
6.0 years ago
goodez ▴ 640

I am writing a custom Python script that computes the average coverage at each position around the TSS. The script takes a bedGraph file and a GTF file as input.

My question is: do you think ignoring regions of zero coverage will adversely affect my TSS plot? For example:

distance from TSS | coverage
-------------------------|--------------
-3000 | 0
-3000 | 2.3
-3000 | 0
-2999 | 0
-2999 | 0
-2888 | 3.1
-2888 | 2.9
-2888 | 2.1
-2888 | 0

It may seem like a strange question, but I need to change my workflow in order to consider the values of zero coverage. So would the overall trend of the metaplot be different if I ignore the regions of zero?

Thanks

ChIP-Seq • 1.5k views
6.0 years ago

The extent to which this matters depends on how you calculate the average coverage, which you haven't described. I also don't immediately see what the numbers above are supposed to mean: is every row a different sample? Have you tried it out, i.e. have you generated plots with and without the zeros?

As a side note: there are a number of established tools for these kinds of calculations, including NGSplot (R-based) and deepTools (Python-based).


It is just the average for each position, so for position -3000 from the TSS, my average would be (0 + 2.3 + 0) / 3. Each row represents a different TSS (a different gene). The coverage values are normalized per million. Sorry, I realize it isn't totally clear, but I did not want to go into the details of my Python script.
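To make the effect concrete, here is a minimal sketch that averages the example table from the question both ways. The function name `mean_by_position` and the `skip_zeros` switch are made up for illustration; this is not the original script.

```python
from collections import defaultdict

# (distance from TSS, normalized coverage), one row per gene.
# Values are copied from the example table in the question.
rows = [
    (-3000, 0.0), (-3000, 2.3), (-3000, 0.0),
    (-2999, 0.0), (-2999, 0.0),
    (-2888, 3.1), (-2888, 2.9), (-2888, 2.1), (-2888, 0.0),
]

def mean_by_position(rows, skip_zeros=False):
    """Average coverage per position; optionally drop zero-coverage rows first."""
    grouped = defaultdict(list)
    for pos, cov in rows:
        if skip_zeros and cov == 0:
            continue  # hypothetical switch: ignore genes with no signal here
        grouped[pos].append(cov)
    return {pos: sum(vals) / len(vals) for pos, vals in grouped.items()}

with_zeros = mean_by_position(rows)
without_zeros = mean_by_position(rows, skip_zeros=True)

for pos in sorted(with_zeros):
    print(pos, round(with_zeros[pos], 3), without_zeros.get(pos))
```

Dropping zeros multiplies each position's mean by (number of genes) / (number of genes with non-zero coverage). That factor differs from position to position, so the shape of the profile can change, not only its scale, and a position where every gene has zero coverage (-2999 here) disappears altogether.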

I have been using the R package "ChIPseeker", which will give me a TSS plot, but it does too much under the hood. I am writing my own script to have more control over the data. I'm going to redesign my workflow so I can use the zeroes.

Sorry for the weird question, I was just hoping to avoid including the zeroes.


> I was just hoping to avoid including the zeroes

If you already decided to exclude zeroes, what is your question about?

> I am doing my own script to have more control over the data

deepTools' computeMatrix lets you tune numerous parameters, including the handling of zeros (--skipZeros) and the type of calculation (mean, median, ..., via --averageTypeBins). It is also fairly fast and optimized, extensively tested, and widely used.
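For reference, a rough sketch of how such a call might be assembled from Python. The helper `compute_matrix_cmd` and the file names are hypothetical, and the snippet only builds and prints the command rather than running it. Note that computeMatrix reads bigWig score files, so a bedGraph would need converting first, and that, as far as I recall from the deepTools documentation, --skipZeros drops regions whose scores are all zero rather than individual zero-valued positions.

```python
import shlex

def compute_matrix_cmd(bigwig, regions, out, flank=3000,
                       skip_zeros=False, average="mean"):
    """Build (but do not run) a computeMatrix reference-point command line."""
    cmd = [
        "computeMatrix", "reference-point",
        "--referencePoint", "TSS",
        "-S", bigwig,               # score file (bigWig)
        "-R", regions,              # regions file (BED or GTF)
        "-b", str(flank),           # bases upstream of the TSS
        "-a", str(flank),           # bases downstream of the TSS
        "--averageTypeBins", average,
        "-o", out,
    ]
    if skip_zeros:
        cmd.append("--skipZeros")
    return cmd

# File names below are placeholders, not taken from the thread.
cmd = compute_matrix_cmd("sample.bw", "genes.gtf", "matrix.gz", skip_zeros=True)
print(shlex.join(cmd))
```

Running the same command with and without `skip_zeros=True` and plotting both matrices would show directly how much the zero handling changes the profile.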


It's not that I decided to exclude them. My question was about the effect of including or excluding zeroes. I have the zeroes now; they were missing before because the tool I used to convert my BAM files into coverage files skipped regions of zero coverage.
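If the coverage file simply omits zero-coverage regions, the zeros can also be restored when the window is read. Below is a minimal sketch, assuming a single chromosome, a plus-strand gene, 0-based half-open bedGraph intervals, and a hypothetical helper named `window_coverage`:

```python
def window_coverage(intervals, tss, flank):
    """Per-base coverage for tss - flank .. tss + flank (inclusive).

    Positions that no interval covers keep the default of 0.0, which
    restores the zero-coverage regions the coverage file left out.
    """
    start = tss - flank
    end = tss + flank + 1          # exclusive end of the window
    cov = [0.0] * (2 * flank + 1)
    for iv_start, iv_end, value in intervals:
        # Clip each interval to the window before filling it in.
        for pos in range(max(iv_start, start), min(iv_end, end)):
            cov[pos - start] = value
    return cov

# Toy intervals (start, end, value) with gaps where coverage was zero.
intervals = [(95, 98, 2.0), (101, 103, 4.0)]
profile = window_coverage(intervals, tss=100, flank=5)
print(profile)
```

Every gene then contributes a value at every position, so the per-position mean is taken over the same number of genes throughout the window.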

I'll admit I was just being a bit lazy with not wanting to go back and change my workflow. Thanks for telling me about computeMatrix! I have used deepTools but not tried the computeMatrix function.
