How Much Is Too Much for 5' End Methylation Rate in RRBS Data?
10.1 years ago
samsara ▴ 630

I have RRBS FASTQ files and used Bismark to call methylation. After methylation calling I got the M-bias plot shown below. The methylation rate at the first three bases of the 5' end is quite high. The actual methylation counts and rates for the first four positions are shown below.

My questions are:

  1. Is the observed high methylation rate due to end-repair bias?
  2. The literature mentions that a high methylation rate at the 5' end is common, but how much is too much?
  3. The first three bases of RRBS reads are either CGG or TGG, depending on their methylation state. Is it a good idea to chop off the first 3 bases? If so, doesn't removing the C (which retains the original genomic methylation state) affect downstream analysis?

CpG context
===========
position    count methylated    count unmethylated    % methylation    coverage
1           5000734             2489532               66.76            7490266
2           430                 206                   67.61            636
3           190                 131                   59.19            321
4           34174               79253                 30.13            113427
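
For anyone checking the numbers, here is a minimal Python sketch that recomputes the coverage and % methylation columns from the two count columns, assuming % methylation is simply methylated / (methylated + unmethylated). The function name `methylation_percent` is just illustrative and is not part of Bismark.

```python
# Counts copied from the CpG-context table above:
# (position, count methylated, count unmethylated)
rows = [
    (1, 5000734, 2489532),
    (2, 430, 206),
    (3, 190, 131),
    (4, 34174, 79253),
]


def methylation_percent(meth, unmeth):
    """Percent methylation, assumed to be methylated / coverage * 100."""
    coverage = meth + unmeth
    return 100.0 * meth / coverage if coverage else float("nan")


# (position, coverage, % methylation rounded to two decimals)
summary = [(pos, m + u, round(methylation_percent(m, u), 2)) for pos, m, u in rows]

for pos, cov, pct in summary:
    print(f"position {pos}: coverage={cov}, methylation={pct:.2f}%")
```

The recomputed values agree with the table. Note that positions 2 and 3 are covered by only a few hundred calls, versus about 7.5 million at position 1, so their percentages rest on far less data.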

[Image: M-bias plot]

next-gen • 3.8k views
10.1 years ago
  1. Yes, the bias in the first 3 bases or so is likely due to end-repair. Alternatively, it could be due to incorrect trimming (trim_galore is good for this, and this case is mentioned in the Bismark user guide).
  2. There's no objective answer to this. With Bison, the methylation bias tools will suggest ignoring regions according to a p-value derived from the likelihood of observing a deviation at least that extreme from the methylation profile of the middle of the reads (with a minimum percentage difference, which I have set to 1% by default). That's similar to what the BSeqQC package does.
  3. Yes, any time you have a skewed graph like this you should ignore (or remove, depending on the tools) those regions. Unfortunately, in RRBS this may remove a large portion of the methylation calls.
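
To make the idea in point 2 concrete, here is a rough Python sketch of flagging positions whose methylation differs from a baseline by at least a minimum percentage and with a small p-value. This is not Bison's or BSeqQC's actual test: it swaps in a plain two-proportion z-test, the `alpha=0.05` cutoff is an assumption, and position 4 from the question's table stands in for the mid-read baseline only because no mid-read counts were posted.

```python
import math


def two_proportion_p_value(m1, n1, m2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    p1, p2 = m1 / n1, m2 / n2
    pooled = (m1 + m2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both proportions are exactly 0 or 1, so no difference
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def flag_positions(positions, baseline, min_diff_pct=1.0, alpha=0.05):
    """Return positions that deviate from the baseline by >= min_diff_pct
    percentage points and whose p-value is below alpha (assumed cutoff)."""
    base_m, base_n = baseline
    flagged = []
    for pos, (m, n) in positions.items():
        diff_pct = abs(100 * m / n - 100 * base_m / base_n)
        p_value = two_proportion_p_value(m, n, base_m, base_n)
        if diff_pct >= min_diff_pct and p_value < alpha:
            flagged.append(pos)
    return flagged


# (methylated, coverage) per position, taken from the question's table.
positions = {1: (5000734, 7490266), 2: (430, 636), 3: (190, 321)}
# Assumption: position 4 is used as a stand-in for the mid-read profile.
baseline = (34174, 113427)

flagged = flag_positions(positions, baseline)
print("positions suggested for ignoring:", flagged)
```

With these counts all three leading positions get flagged, which matches the visual impression from the M-bias plot. A real run would use the mid-read positions as the baseline.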

For RRBS, the majority of reads start with CGG or TGG at the 5' end, which is what is left over from the MspI cut site. The M-bias plot shows the methylation % at each base, and the first base has a higher probability of being methylated; other positions may not even contain a C, hence the low methylation %. Does it make sense to trim the first three bases in this case?
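
As a small illustration of the CGG/TGG point, here is a Python sketch that tallies how reads begin. It assumes bisulfite-converted reads where a methylated first C stays C (CGG) and an unmethylated one reads as T (TGG); the function name and the toy reads are made up for illustration and are not real data.

```python
from collections import Counter


def classify_read_starts(reads):
    """Tally reads by their first three bases.

    CGG -> first C retained (methylated), TGG -> first C converted (unmethylated).
    """
    counts = Counter()
    for seq in reads:
        prefix = seq[:3].upper()
        if prefix == "CGG":
            counts["methylated_start"] += 1
        elif prefix == "TGG":
            counts["unmethylated_start"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical toy reads, invented purely for this example.
toy_reads = ["CGGATTTAGG", "TGGTTAGGAT", "CGGTTTTGAA", "ATGGTTAGGT"]
counts = classify_read_starts(toy_reads)
print(dict(counts))
```

Since positions 2 and 3 are normally G, almost all cytosine calls in the first three bases come from position 1, which fits the very low coverage at positions 2 and 3 in the table above.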

Trim_galore with the --rrbs option trims another 2 bp from the 3' end to remove the filled-in Cs (unmethylated, introduced by end-repair).

I read this here: http://www.bioinformatics.babraham.ac.uk/projects/bismark/RRBS_Guide.pdf

Thank you, Ming


I realize this is a very delayed followup, but I was hoping you might clarify #1. Shouldn't the end-repair impact the 2 bases at the end of the read, not the first 3 bases?


One would think so, yes, but for some reason the third base seems to be affected at least sometimes too. No clue why.


But still, shouldn't it be the 2 or 3 bases at the end, not the beginning of the read?

For example, end-repair is causing problems at the end of the molecule and thus, the beginning of R2 for WGBS is wrong. Or is that a separate issue?
