Illumina reads clipped at the expected Cas9 double-strand break site – how to interpret?
0
0
Entering edit mode
22 months ago
francois ▴ 80

I have mutated a gene using CRISPR-Cas9, amplified a ~ 150-bp PCR product and sequenced using Illumina MiSeq. I am now looking at the reads aligned to the amplicon reference sequence in IGV.

I do this routinely but I think I have never seen an example like this. The sample I am looking at has reads clipped at the expected double-strand break site. See IGV screenshot below. In the reference sequence (below), note the AGG in uppercase, this is the NGG PAM. We are expecting Cas9 to create a double-strand break a few bp upstream. The 11-bp deletion in the first two reads matches this. However, the following reads stop matching exactly at this position. The soft-clipped (non-matching) portion is my Forward primer binding site (ACAGCGTTCTGTTTGTAGCGA) followed by the first 9 bp of my amplicon (i.e. we would expect these 9 bp a good 70 bp upstream!). There is no strand bias, both Forward and Reverse reads look like this.

I do not know how to interpret this. Any ideas?

enter image description here

illumina cas9 sequencing crispr miseq • 1.0k views
ADD COMMENT
0
Entering edit mode

Your experiment mostly failed with just a couple of molecules squeaking though?

ADD REPLY
0
Entering edit mode

Actually, isn't that exactly the reads you would expect if there was a genuine 70 bp deletion? I just don't know why bwa mem creates a soft-clip instead of opening a 70-bp gap. What do you think?

My situation seems similar to this question: Bwa Alignment: Gap Vs Soft-Clipped Sequences, I think?

ADD REPLY
1
Entering edit mode

We are expecting Cas9 to create a double-strand break a few bp upstream. The 11-bp deletion in the first two reads matches this.

My comment was based on this. So you think that there is actually a 70 bp gap and these two reads on top are an anomaly? IGV will downsample reads by default so you may not be seeing all alignments.

Can you tell us which aligner you used to do these alignments? If your aligner only did ungapped alignments then you would not see reads that will span that 70 bp gap?

ADD REPLY
0
Entering edit mode

Ah I see. These are zebrafish embryos injected with Cas9 at the single-cell stage. What you need to know is: I expect them to have a bunch of different variants, like somatic mutations if you wish. I typically get anywhere from 1 to ~ 40 different mutations in each embryo. This is caused by the embryo developing (genomes replicating/cells dividing) while Cas9 is still around to create double-strand breaks.

So I trust the 11-bp deletions, this is very usual for me. But that does not mean there isn't any other mutations. Actually, it's relatively infrequent that I get one unique mutation.

Aligner is bwa mem. Command is

bwa mem -M reference.fa forwardReads.fastq.gz reverseReads.fastq.gz`

What do you mean by ungapped alignments? It's clearly capable of opening gaps otherwise I would never see deletions, like the 11-bp deletions (which I trust).

I've noticed this parameter in the bwa mem help-page (http://bio-bwa.sourceforge.net/bwa.shtml)

-w INT Band width.

Essentially, gaps longer than INT will not be found. Note that the maximum gap length is also affected by the scoring matrix and the hit length, not solely determined by this option. [100]

That vaguely sounds like what I'm after, but this probable deletion is 60–70 bp, so not above the limit (I have never changed this setting, so should be at default i.e. 100). I could try changing the scoring but sounds a bit risky... For example decreasing gap open or gap extension penalty or increasing clipping penalty.

Is this clearer?

Edit: I have tried setting -w higher and setting -O (penalty for opening a gap) lower or -E (penalty for extending a gap) lower, nothing seems to change these reads. Only creates a big mess in the good reads.

ADD REPLY
0
Entering edit mode

Having said this, I now noticed a bunch of similar reads, except that the soft-clipped portion is larger, messier and can't be easily interpreted as a big deletion. I still thinks that the fact that these reads are clipped at the predicted double-strand breaks probably means something but probably hard to interpret from short reads alone. I'll probably let it go for now, except if anyone has any new suggestions.

ADD REPLY
1
Entering edit mode

I don't know how long your reads are. If there is indeed a 70+ bp deletion then bwa may not be willing to put in that big a gap.

Can you try using a splice aware aligner like bbmap or STAR to see if you can find reads that align across that gap? If you reads are ending at that break (going into the adapter) then result will not change.

ADD REPLY

Login before adding your answer.

Traffic: 1872 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6