Biostar Beta. Not for public use.
Sequence identity between sequences with different lengths
18 months ago

Hello,

A simple question: what is the sequence identity between two sequences when one is much longer than the other?

Example:

seq1:  -------------------AGTGTGAAAAAGGT----------------
seq2:  ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT


The shorter sequence matches a substring of the longer one exactly. Do they have 100% identity? Or rather something like 30%, since seq1 covers only about 30% of seq2?

The reason I ask is that I am filtering an alignment of two assemblies of the same genome (with nucmer/mumer), and I can filter out aligned contigs based on identity.

Thank you,
Ricardo


I would say that it is a matter of point of view: from the perspective of seq1, it has 100% identity over 100% of its length; from the perspective of seq2, it has 100% identity over only about 30% of its length.
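To make the two points of view concrete, here is a minimal sketch that computes identity over the aligned columns and, separately, the fraction of each sequence that is covered by the alignment. The function name `identity_and_coverage` is made up for illustration, and it assumes the two strings are already aligned (same length, `-` for gaps), as in the example from the question.

```python
def identity_and_coverage(aln_a: str, aln_b: str, gap: str = "-"):
    """Return (identity %, coverage of a %, coverage of b %) for two aligned strings.

    Identity is counted only over columns where both sequences have a base.
    Coverage is the share of each ungapped sequence that falls in those columns.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")

    # Columns where neither sequence has a gap
    aligned = [(a, b) for a, b in zip(aln_a, aln_b) if a != gap and b != gap]
    matches = sum(a == b for a, b in aligned)

    # Ungapped lengths of the original sequences
    len_a = len(aln_a) - aln_a.count(gap)
    len_b = len(aln_b) - aln_b.count(gap)

    identity = 100.0 * matches / len(aligned) if aligned else 0.0
    cov_a = 100.0 * len(aligned) / len_a if len_a else 0.0
    cov_b = 100.0 * len(aligned) / len_b if len_b else 0.0
    return identity, cov_a, cov_b


seq1 = "-------------------AGTGTGAAAAAGGT----------------"
seq2 = "ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT"

identity, cov1, cov2 = identity_and_coverage(seq1, seq2)
print(f"identity over aligned columns: {identity:.1f}%")  # 100.0%
print(f"coverage of seq1: {cov1:.1f}%")                   # 100.0%
print(f"coverage of seq2: {cov2:.1f}%")                   # 28.6%
```

For the example in the question, seq1 is 14 bases and seq2 is 49 bases, so the "30%" is really 14/49, about 28.6%. Keeping identity and coverage as two separate numbers avoids having to pick one point of view.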


I would say seq1 is 100% identical to seq2, while seq2 is only about 30% identical to seq1.

Unfortunately, the answer depends heavily on how you look at it.


Great, that's it, thanks! It depends on which sequence is the query and which is the reference. (If you write it as an answer instead of a comment, I'll accept it.)
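For the filtering use case, one way to apply this is to filter on identity and on query coverage together, so that a short perfect match inside a much longer contig is not treated the same as a full-length match. The sketch below is only an illustration: the `Hit` record, its field names, the thresholds, and the example values are all made up, and are not the output format of nucmer or any other tool.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical alignment record; field names are illustrative only."""
    query_id: str
    identity: float   # percent identity over the aligned region
    aligned_len: int  # number of query bases inside the alignment
    query_len: int    # total length of the query contig

    @property
    def query_coverage(self) -> float:
        # Share of the query contig that is covered by the alignment
        return 100.0 * self.aligned_len / self.query_len


def filter_hits(hits, min_identity=95.0, min_query_cov=90.0):
    """Keep hits that pass both the identity and the query-coverage threshold."""
    return [
        h for h in hits
        if h.identity >= min_identity and h.query_coverage >= min_query_cov
    ]


# Made-up example values
hits = [
    Hit("contig_a", identity=100.0, aligned_len=14, query_len=14),  # full-length match
    Hit("contig_b", identity=100.0, aligned_len=14, query_len=49),  # perfect but partial
    Hit("contig_c", identity=80.0, aligned_len=49, query_len=49),   # full-length, low identity
]

kept = filter_hits(hits)
print([h.query_id for h in kept])  # ['contig_a']
```

Swapping which assembly is the query changes the coverage value for the same alignment, which is exactly the query/reference dependence discussed above.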