Identify insertions with IGV
15 months ago
Germany, Mannheim, UMM

Dear all,

is it possible to visualize insertions in a sequence?

I have prepared a simulated sequence of the mitochondrial genome from release hg38 by placing non-human sequence right in the middle of it (position 8284). I then aligned reads from the simulated genome to the mitochondrial index and visualized the alignment with the Integrative Genomics Viewer (IGV). However, I don't see any sign of the insertion in the figure.

[Figure: IGV view of the alignment]

Is there a way to highlight the insertion point? Maybe by showing only clipped reads or the reads that map only on one mate?

Thank you.


IGV is meant to visualize your alignment. It is not a variant caller. Appropriate tools for structural variant (SV) identification exist, e.g. lumpy.


In IGV, insertions are represented with I. I can see a bunch of purple I in your snapshot. Please refer to: http://software.broadinstitute.org/software/igv/AlignmentData for more info.

Insertions
In a gapped alignment, IGV indicates insertions with respect to the reference with a purple I, or a red I for insertions greater than a user-activated and user-specified cutoff. Hover over the insertion symbol to view the inserted bases.

Sorry, I forgot to mention that (in order to differentiate the simulated sequence from the original) I also generated random mutations in the sequence. The Is might simply show that. One might also argue that the coloured reads mark the insertion point, but there are other regions with such colouring (not shown in the figure), so it is not a specific marker.


Sorry, I forgot to mention that (in order to differentiate the simulated sequence from the original) I also generated random mutations in the sequence. The Is might simply show that.

So is this real data salted with simulated reads or just plain simulated reads?


The procedure was this: I split the mitochondrial FASTA file from GRCh38 into two pieces and merged the non-human sequence in between. Then I used EMBOSS to introduce random mutations and ART to generate paired-end FASTQ reads. Finally, I used BWA MEM to align the reads to the mitochondrial index (prepared with BWA index from the original GRCh38 mitochondrial FASTA).


merged the non human sequence in between.

What was the length of this sequence? When you refer to insertions, do you mean single-bp events or something longer, like the full non-human sequence you inserted?


I placed a stretch of 4000 bases from Parvovirus B19 after base 8284 of the mitochondrion, then introduced 500 mutations with msbar.
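For anyone wanting to reproduce the splicing step, here is a minimal sketch of placing an insert after a given 1-based position of a sequence. The function name `splice_insert` and the toy sequences are made up for illustration; this is not the poster's actual script, and reading or writing the FASTA files is left out.

```python
def splice_insert(reference: str, insert: str, after_base: int) -> str:
    """Return `reference` with `insert` placed after 1-based position `after_base`.

    `splice_insert` is a hypothetical helper, not part of any tool in this thread.
    """
    if not 0 <= after_base <= len(reference):
        raise ValueError("after_base is outside the reference")
    # Python slices are 0-based and end-exclusive, so reference[:after_base]
    # holds exactly the first `after_base` bases.
    return reference[:after_base] + insert + reference[after_base:]


# Toy example: a 4-base insert after base 4 of an 8-base reference.
print(splice_insert("ACGTACGT", "TTTT", 4))  # ACGTTTTTACGT
```

With the real data this would be called with the chrM sequence, the 4000-base viral stretch and `after_base=8284`.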


Take a look at the "Detecting structural variants" section on this IGV help page.


The figure I get after colouring by INSERT SIZE (and by INSERT SIZE AND PAIR ORIENTATION) is this: [Figure: IGV view coloured by insert size]

With a bit of imagination, one could argue that there is a purple blob in the centre of the genome, where the insertion point should be. This is the enlargement: [Figure: enlarged IGV view around the insertion point] Would this be enough to say that IGV suggests a large insertion event?


Depends on your context; if it's a "somatic" insertion, it could be... By the way, your alignment is full of insertions (first picture). Is it still simulated reads?


Yes. Since there are 3 types of mutations in msbar (insertions, deletions, substitutions), there should in theory be about 500/3 ≈ 167 insertion points.


Your artificial insertion is too big to be shown as an insertion by IGV, and also too big to affect the insert size, as it is probably larger than the simulated insert size. In this scenario, what you would see is an increase in pairs with one mate mapped and the other unmapped close to the insertion point. You could argue there is an insertion larger than your sequencing insert size, but without further data you can't say how much larger.
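The "one mate mapped, the other unmapped" signal can be checked directly from the alignment. Below is a rough sketch that reads SAM text lines (for example the output of `samtools view -h`) and counts, per window, mapped reads whose mate is unmapped. The function name `orphan_mate_bins` and the 100 bp window are arbitrary choices for illustration; the flag bits come from the SAM specification.

```python
from collections import Counter

# SAM flag bits (from the SAM specification)
PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def orphan_mate_bins(sam_lines, bin_size=100):
    """Count mapped reads with an unmapped mate, keyed by 1-based window start.

    `orphan_mate_bins` is a hypothetical helper written for this example.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, pos = int(fields[1]), int(fields[3])
        is_orphan = (
            flag & PAIRED
            and not flag & READ_UNMAPPED
            and flag & MATE_UNMAPPED
        )
        if is_orphan:
            window_start = (pos - 1) // bin_size * bin_size + 1
            counts[window_start] += 1
    return counts
```

A window around the insertion point with clearly more of these reads than the rest of the genome would support an insertion longer than the library insert size; as noted above, it would not tell you how long the insertion is.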


Do you mean highlighting the insertion point in the coverage bar? Why not use another way to find the insertion point (based on coverage or insertion rate with IGVtools, or a variant caller) and then check it in IGV?


I thought IGV might show reads that have peculiar behaviour such as those with soft clips or a single mate mapped. If there are other tools, I will be happy to use them...


You have to enable "show soft clipped bases" in IGV preferences.


Yes, I did. The figure includes the clipped reads.

17 months ago
h.mon 25k
Brazil

I see evidence of the "transgene" insertion: all those identical soft-clipped bases centered on the position where you inserted the non-human sequence. Pay attention: 1) all reads are soft-clipped at the same reference position, 2) as far as I can tell, the soft-clipped bases are identical between different reads.

Look at the picture below. The big red arrow indicates the insertion point, and the darkened rectangles indicate the inserted sequence (which I was able to determine as parvovirus by blasting them, even before you told us it was parvovirus).

[Figure: annotated IGV screenshot with the insertion point and inserted sequence marked]

However, keep in mind that this visual inspection works well because you have a simple, small reference genome with no duplications, and a simple, small insertion with no other copies of it elsewhere in the reference genome. As WouterDeCoster pointed out above, there are better methods to identify structural variation events in more complex scenarios.
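The soft-clip pattern described in this answer can also be tallied outside IGV. The following is only a sketch under simple assumptions: it reads SAM text lines, uses the CIGAR string to find the aligned base next to each soft clip, and counts how often each reference position occurs. The helper names `clip_breakpoints` and `tally_clip_breakpoints` are made up for this example.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_breakpoints(pos, cigar):
    """Return 1-based reference positions of aligned bases adjacent to soft clips."""
    # Hard clips (H) sit outside soft clips, so drop them before checking the ends.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return []
    points = []
    if ops[0][1] == "S":
        points.append(pos)  # leading clip: first aligned base
    if len(ops) > 1 and ops[-1][1] == "S":
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        points.append(pos + ref_len - 1)  # trailing clip: last aligned base
    return points


def tally_clip_breakpoints(sam_lines):
    """Count soft-clip breakpoints per reference position over SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        flag, pos, cigar = int(fields[1]), int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":  # unmapped read or no alignment
            continue
        counts.update(clip_breakpoints(pos, cigar))
    return counts
```

For an insertion after base 8284, one would expect trailing clips to pile up at 8284 and leading clips at 8285, although the aligner may extend a clip boundary by a base or two when the inserted sequence happens to match the reference. `tally_clip_breakpoints(lines).most_common(5)` lists the most frequent positions.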


OK then, IGV is not the tool for checking insertion sites. I will use other tools. If there are other suggestions besides lumpy, I will be happy to check them. Thank you.
