Detecting an expected heterozygous variant from reads
9 months ago
pablo ▴ 300

Hello,

I have a diploid yeast genome. A gene cassette is inserted in a heterozygous state in this sample, as confirmed by PCR.

The sample was sequenced with PacBio CCS technology. I built a diploid assembly of this sample with hifiasm, which gave me a contiguous assembly (almost one contig per chromosome).

What I want to verify is whether this cassette is found on only one of the two haplotypes. I have the cassette sequence (about 3 kb), which I aligned to this assembly. I found it on both haplotypes, whereas I expected it on only one (heterozygous), so hifiasm seems to have collapsed the heterozygous variation.

Is there a way to detect this heterozygous variant from the reads? For example, using the S288C reference genome (which doesn't have the cassette), I could:

  • BLAST or align the cassette sequence against the PacBio reads, and align the PacBio reads to the S288C genome.
  • Then check in IGV, at the corresponding S288C position, where the reads carrying the cassette fall, and estimate the proportion of reads with the insertion relative to S288C versus reads without it (expected to be about 50/50).
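The counting step above can be sketched without IGV by walking each read's CIGAR string from the alignment to S288C. This is only a minimal sketch: the function names, the 1,000 bp insertion threshold and the 500 bp window are illustrative assumptions, and it assumes the aligner reports the ~3 kb cassette as a single `I` operation (depending on the aligner and its settings, it may instead appear as soft-clipping or a supplementary alignment, which this sketch does not handle).

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = "MDN=X"  # CIGAR operations that advance the reference position


def large_insertions(cigar, ref_start, min_len=1000):
    """Return (ref_pos, length) for each insertion of at least min_len bases."""
    hits = []
    ref_pos = ref_start
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "I" and length >= min_len:
            hits.append((ref_pos, length))
        if op in REF_CONSUMING:
            ref_pos += length
    return hits


def classify_read(cigar, ref_start, site, window=500, min_len=1000):
    """Classify one alignment relative to a candidate insertion site.

    Returns "insertion", "reference", or "not_spanning" (read does not
    cover site +/- window, so it says nothing about either allele).
    window and min_len are arbitrary illustrative defaults.
    """
    ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    ref_end = ref_start + ref_len
    if ref_start > site - window or ref_end < site + window:
        return "not_spanning"
    for pos, _ in large_insertions(cigar, ref_start, min_len):
        if abs(pos - site) <= window:
            return "insertion"
    return "reference"
```

Counting the "insertion" and "reference" labels over all reads at the locus would give the proportion to compare with the expected 50/50; reads labelled "not_spanning" are best left out of that ratio.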

Any help? Best

bam align igv • 537 views

Were you able to assemble the two haplotypes completely and separately because the genome is highly heterozygous? If so, you would definitely expect the cassette insertion to be phased as well... Can you verify your phasing in some way? Can you try another phased assembly method?

Otherwise, if you are not really concerned about the phasing, just align the reads to your assembly and look in the cassette region for reads that align to either side of the cassette. You should also see a relative drop in coverage. You could also extract all the reads that align in the region and perform a local assembly... but it is still strange that the cassette was not phased in the first place. In any case, there is no real need to use the reference.
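The expected relative drop in coverage can be put into numbers with a rough sketch like the one below. It assumes the aligned blocks of each read have already been extracted as half-open `(start, end)` intervals on the contig, with large deletions split into separate blocks; the function names and the 3,000 bp flank size are hypothetical choices for illustration.

```python
def mean_depth(intervals, start, end):
    """Mean per-base depth over [start, end) from half-open aligned blocks."""
    covered = sum(max(0, min(e, end) - max(s, start)) for s, e in intervals)
    return covered / (end - start)


def cassette_depth_ratio(intervals, cas_start, cas_end, flank=3000):
    """Depth inside the cassette divided by the mean depth of both flanks.

    flank is an arbitrary illustrative size. A ratio near 0.5 would be
    consistent with a heterozygous insertion, a ratio near 1.0 with a
    homozygous one.
    """
    inside = mean_depth(intervals, cas_start, cas_end)
    left = mean_depth(intervals, cas_start - flank, cas_start)
    right = mean_depth(intervals, cas_end, cas_end + flank)
    outside = (left + right) / 2
    return inside / outside if outside else float("nan")
```

With long reads, a read from the haplotype lacking the cassette may be aligned across it with one large deletion rather than being clipped, which is why the blocks on each side of such a deletion should be passed as separate intervals.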


Sorry for my late answer. I meant that my phased assembly looked contiguous because it has few contigs and a total size close to the expected genome size. I wanted to verify the phasing using the cassette, which should be found on only one of the two haplotypes, but it was found on both.

I also tried a phased assembly with Flye + HapDup: same result, and less contiguous.

Then, here is what I did:

  • BLAST the cassette against the reads and extract the IDs of the reads that match the cassette (177 reads)
  • align the cassette to the haploid genome to get its position (724,861 on chromosome_XIII), and align the reads to this whole genome
  • extract the IDs of the reads that aligned at this position (+/- 5,000 bp) (142 reads)
  • compare the two sets: 97 reads are common to the BLAST result and the whole-genome alignment at this position, and 45 reads are not in the BLAST result
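The comparison in the last step amounts to a few set operations on the two lists of read IDs. A minimal sketch, assuming the IDs have already been loaded into Python lists (the function name and dictionary keys are hypothetical):

```python
def compare_read_sets(blast_ids, locus_ids):
    """Split read IDs by whether they hit the cassette and/or map to the locus."""
    blast_ids, locus_ids = set(blast_ids), set(locus_ids)
    return {
        "both": blast_ids & locus_ids,        # cassette hit and aligned at the locus
        "locus_only": locus_ids - blast_ids,  # aligned at the locus, no cassette hit
        "blast_only": blast_ids - locus_ids,  # cassette hit, not aligned at the locus
    }
```

With the counts reported above (177 and 142 reads, 97 in common), the "blast_only" set would hold 177 - 97 = 80 reads, which may be worth inspecting to see where they align.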

I looked at these 45 reads in the alignment, and they all fall within positions 707,479 to 729,984. I BLASTed the cassette against them and there was no hit. Can we consider that these 45 reads confirm the heterozygous insertion? 97 reads with the cassette, 45 reads without.
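One way to put a number on the 97 versus 45 split is an exact two-sided binomial test against the 50/50 expectation. The sketch below uses only the standard library. It assumes that every one of the 142 reads actually spans the insertion site at 724,861; a read that falls inside the +/- 5,000 bp window without reaching the site carries no information about either allele and would bias the count of reads "without" the cassette. The function name is hypothetical.

```python
from math import comb


def binom_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))


with_cassette, without_cassette = 97, 45  # counts reported in this thread
n = with_cassette + without_cassette
fraction = with_cassette / n
p_value = binom_two_sided(with_cassette, n)
print(f"{fraction:.1%} of {n} reads carry the cassette, p = {p_value:.2g}")
```

A small p-value would only say that the split differs from 50/50, not why; the presence of reads that span the site without the cassette is the more direct evidence of a second allele.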

I can share an IGV screenshot at this position. We can spot some heterozygous variants (some SNPs, or the deletion on the right), but no drop in coverage.

[IGV screenshot: alignment]

I also did a local assembly with all the reads around position 724,000, but hifiasm gave me exactly the same haplotypes.
