Question

Reference based assembly with transgene

0

5.7 years ago

shashwat36 • 0

Hello, I am trying to sequence the genome of a Pichia strain carrying a transgene insertion. I have paired-end reads from an Illumina MiniSeq.

I tried using BWA with the wild-type genome as reference, followed by samtools mpileup and vcfutils, to get a consensus sequence. However, with that approach my insertion vector was not part of the final assembly. I aligned the reads against my vector sequence and saw over 40,000 reads aligning to it, so I am sure that my insertion vector is present in the genome.

I then used ABySS to do a de novo assembly and aligned the resulting scaffolds against the wild-type genome, and again I ended up with a final assembly with no vector sequence.

Is there a way to get a final assembly with four chromosomes (Pichia pastoris) and my vector sequences present?

Thanks, Shash

next-gen Assembly • 1.6k views

ADD COMMENT • link updated 5.7 years ago by colindaven 6.4k • written 5.7 years ago by shashwat36 • 0

0

I then used ABySS to do a de novo assembly and aligned the resulting scaffolds against the wild-type genome, and again I ended up with a final assembly with no vector sequence.

Seems odd to me. Given the coverage you said your transgene had when you mapped using it as reference, it should have been assembled. Did you try blasting the transgene against the abyss assembly?

ADD REPLY • link 5.7 years ago by h.mon 35k

0

Yes, I filtered the reads that aligned to my vector sequence and then aligned those reads against the contigs generated by ABySS, and they all align. My issue is that when I take those contig sequences and align them against the wild-type genome as reference using BWA, my vector disappears from the final assembly.

ADD REPLY • link 5.7 years ago by shashwat36 • 0

0

Of course the vector "disappears", the reference genome doesn't contain it and my guess is it is soft clipped.

You are not explaining in depth what you are doing, nor are you providing the commands used - details matter a lot here.
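One way to check the soft-clipping guess is to look at the CIGAR strings of the contig alignments. Below is a minimal sketch, assuming the alignments are available as SAM text (for example from `samtools view -h output.bam`). The function name `soft_clip_junctions` and the `min_clip` threshold are made up for illustration. A contig that carries the vector should show a long clip, and the reference position where the clip starts is a candidate integration site.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_junctions(sam_lines, min_clip=20):
    """Yield (query, ref, ref_pos, side, clip_len) for clipped alignments.

    ref_pos is the 1-based reference coordinate of the aligned base
    next to the clip, i.e. a candidate breakpoint.
    """
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":    # unmapped, nothing to inspect
            continue
        ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
        if not ops:
            continue
        # Operations that consume reference bases: M, D, N, =, X
        ref_len = sum(n for n, op in ops if op in "MDN=X")
        # S = soft clip, H = hard clip (used on supplementary alignments)
        if ops[0][1] in "SH" and ops[0][0] >= min_clip:
            yield (qname, rname, pos, "left", ops[0][0])
        if ops[-1][1] in "SH" and ops[-1][0] >= min_clip:
            yield (qname, rname, pos + ref_len - 1, "right", ops[-1][0])
```

Several contigs clipped at the same reference coordinate, with the clipped part matching the vector, would point to the integration locus.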

ADD REPLY • link 5.7 years ago by h.mon 35k

0

Okay, here is what I am trying to do. I am trying to get a fully annotated genome of my strain, and since Pichia is already fully sequenced and annotated, I was hoping to leverage that data instead of doing it from scratch. Here is what I have done so far:

#assemble reads with abyss

abyss-pe name=pp1 k=64 in='reads1.fq reads2.fq'

This yields ~200 contigs. Now, instead of finding ORFs and annotating them, I figured that I could use BWA to align these contigs to the Pichia pastoris genome and then manually annotate my vector sequences. This would also allow me to find the locations and copy number of my gene.

#use bwa to align to GS115 strain (fully sequenced from NCBI)

bwa index reference.fa

bwa mem reference.fa contigs.fa | samtools sort -o output.bam

samtools index output.bam

Then I aligned my reads to my insertion vector using bwa and extracted the mapped reads using samtools. Then I aligned those mapped reads against the assembled genome using bwa, but only very few reads aligned. I am guessing these are the soft-clipped reads, and therefore it seems like my vector is not part of the final assembly.

My question is: is there a better way to get the final genome sequence containing the vector?

Thanks

ADD REPLY • link 5.7 years ago by shashwat36 • 0

0

You could map your reads against the available genome (which doesn't contain the transgene), then just assemble the non-mapping reads de novo.
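A minimal sketch of the read-selection step, assuming the read-to-reference alignments are available as SAM text. The function name `unmapped_pair_names` is made up for illustration. It keeps every pair in which at least one mate is unmapped, so pairs with one mate in the genome and one in the vector (which mark the junction) are retained as well.

```python
def unmapped_pair_names(sam_lines):
    """Return the names of read pairs with at least one unmapped mate."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):      # skip SAM header lines
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        flag = int(fields[1])
        # 0x4: this read is unmapped; 0x8: its mate is unmapped
        if flag & 0x4 or flag & 0x8:
            names.add(fields[0])
    return names
```

The resulting names can then be used to pull both mates out of the original FASTQ files before running the de novo assembler on that subset.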

ADD REPLY • link 5.7 years ago by cschu181 ★ 2.8k

score 0 · Answer 1 · 2018-07-29

0

5.7 years ago

h.mon 35k

Create a blast database with your abyss-assembled genome, and search the transgene against this database.

ADD COMMENT • link 5.7 years ago by h.mon 35k

0

I don't see how that is different from aligning the transgene against the abyss-assembled genome using BWA. It wouldn't give the integration locus on the genome and neither would it give me the final annotated consensus sequence. Any thoughts?

ADD REPLY • link 5.7 years ago by shashwat36 • 0

1

If your transgene has been assembled, it will be part of a contig. Blasting the transgene against the abyss-assembled genome would return the contig and the position in the contig. Then you could examine this position using IGV.
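As a rough illustration of reading off the contig and position, here is a sketch that parses BLAST tabular output. It assumes the search was run with `-outfmt 6` (default 12 columns), with the transgene as query and the ABySS contigs as database. The function name `vector_hits` and both thresholds are made up for illustration.

```python
def vector_hits(blast_tab_lines, min_identity=95.0, min_length=100):
    """Return transgene hits as dicts, sorted by contig and start position.

    Expects the default 12 columns of BLAST -outfmt 6:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 12:
            continue
        pident, length = float(cols[2]), int(cols[3])
        if pident < min_identity or length < min_length:
            continue
        sstart, send = int(cols[8]), int(cols[9])
        hits.append({
            "contig": cols[1],
            "start": min(sstart, send),
            "end": max(sstart, send),
            # BLAST reports sstart > send for minus-strand hits
            "strand": "+" if sstart <= send else "-",
            "q_start": int(cols[6]),
            "q_end": int(cols[7]),
            "identity": pident,
        })
    hits.sort(key=lambda h: (h["contig"], h["start"]))
    return hits
```

Several hits on different contigs, or several non-overlapping hits on one contig, may hint at multiple copies, though a repeat collapsed by the assembler would hide that.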

ADD REPLY • link 5.7 years ago by h.mon 35k

1

BLAST will be a lot more sensitive than BWA: it allows more mismatches and reports partial alignments.

ADD REPLY • link 5.7 years ago by colindaven 6.4k

score 0 · Answer 2 · 2018-07-31

Really difficult problem. It might make sense to search the raw reads for your insertion sequence - use python or grep.
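A minimal Python sketch of such a raw-read search, using exact k-mer matching on both strands. All names here (`vector_kmers`, `classify_read`, `fastq_sequences`) and the default `k=25` are made up for illustration. A read whose one end matches the vector while the other end does not is flagged as a junction candidate. This is only a heuristic: a sequencing error near a read end can produce the same pattern.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def vector_kmers(vector, k=25):
    """Collect all k-mers of the vector, on both strands."""
    vector = vector.upper()
    kmers = set()
    for i in range(len(vector) - k + 1):
        kmer = vector[i:i + k]
        kmers.add(kmer)
        kmers.add(reverse_complement(kmer))
    return kmers


def classify_read(seq, kmers, k=25):
    """Label a read as 'vector', 'junction' (candidate) or 'none'."""
    seq = seq.upper()
    n = len(seq) - k + 1
    if n <= 0:
        return "none"
    hits = [seq[i:i + k] in kmers for i in range(n)]
    if not any(hits):
        return "none"
    # Only one end of the read matches the vector: possible junction read
    if hits[0] != hits[-1]:
        return "junction"
    return "vector"


def fastq_sequences(lines):
    """Yield (name, sequence) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue
        seq = next(it, "").strip()
        next(it, None)   # '+' separator line
        next(it, None)   # quality line
        yield header[1:].split()[0], seq
```

The non-vector part of each junction candidate can then be aligned or BLASTed against the wild-type genome to see where it lands.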

In my experience, neither de novo nor alignment strategies work well for this problem. A long-read assembly would probably nail it, but no one seems to have the money for doing those with insertion experiments...

Some companies offer more advanced wet-lab approaches, which seem promising.