Biostar Beta. Not for public use.
Bismark .cov coverage underrepresentation
0
Entering edit mode
20 months ago
Canada

Hello,

I have been using bismark for a while and I noticed that the .cov file which is produced by their "methylation extractor" add-ons, has underrepresented counts of methylated/unmethylated bases for many regions.

The two last columns of this file correponds to count of methylated, and count of unmethylated C. The software claims that the sum of these two columns is the coverage of the position.

I used bedtools genomecov (taking care that overlapping paired end reads are only counted once) and I found the while the bismark coverage highly correlates with bedtools', there are many instances bedtools having a higher coverage than bismark's.

I looked at many of these regions myself and found that the bedtools coverage is in fact correct.

In some of the cases, if I count only one of the strands, I get the same coverage as bismark but for many other examples I cannot understand what is the cause of this.

Anybody else has experienced a similar problem and might know where I might be wrong?

Thanks

Mehran

Answer:

Thanks to Devon Ryan, I found the reason for this. The C and MethylC counts must be specific to only one of the strands at each genomic position. For example if at positive strand is a C, then the minus strand is a G and only the C coverage should be counted. Bedtools gives the total coverage.

However it seems that there are some additional filterinngs by Bismark that I was not able to figure out.

ADD COMMENTlink
0
Entering edit mode

If you can post a small reproducible example then this can be confirmed at a bug. You might want to email Feli Krueger, the author. He's quite responsive.

ADD REPLYlink
0
Entering edit mode

I actually emailed Felix but he is on vacation for a while. I thought maybe other people had experienced something similar.

ADD REPLYlink
1
Entering edit mode

Too bad. If you can post a small example dataset that reproduces this then I can have a look. Having said that, I can also recommend just using PileOMeth, which I wrote along with https://www.biostars.org/u/36/. That won't give the same coverage metric as bedtools, but it's doing some additional quality filtering, so that's expected.

ADD REPLYlink
0
Entering edit mode

That is very kind of you. I am not able to find your email though. Could you please send me an email:

<removed>
ADD REPLYlink
1
Entering edit mode
20 months ago
Canada

Thanks to Devon Ryan, I found the reason for this. The C and MethylC counts must be specific to only one of the strands at each genomic position. For example if at positive strand is a C, then the minus strand is a G and only the C coverage should be counted. Bedtools gives the total coverage.

However it seems that there are some additional filterinngs by Bismark that I was not able to figure out.

ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.3.1