Sequence identity between sequences with different lengths
18 months ago

Hello,

A simple question: what is the sequence identity between two sequences when one is much longer than the other?

Example:

seq1:  -------------------AGTGTGAAAAAGGT----------------
seq2:  ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT

The shorter sequence matches a region of the longer one exactly. Do they have 100% identity? Or something closer to 30%, since seq1 covers only about 30% of seq2?
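The two readings in the question come from dividing the same number of identical columns by different totals. A minimal sketch using the alignment rows above; the helper names are illustrative, not from any library. Counting only columns where both rows have a base gives 100%, while counting every column (so the terminal gaps count against identity) gives 14/49.

```python
# Alignment rows copied from the question; "-" marks a gap.
ROW1 = "-------------------AGTGTGAAAAAGGT----------------"
ROW2 = "ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT"


def count_matches(top, bottom):
    """Columns where both rows carry the same (non-gap) base."""
    return sum(1 for x, y in zip(top, bottom) if x == y and x != "-")


def identity_over_aligned_columns(top, bottom):
    """Matches divided by the columns where neither row has a gap."""
    aligned = sum(1 for x, y in zip(top, bottom) if x != "-" and y != "-")
    return count_matches(top, bottom) / aligned


def identity_over_alignment_length(top, bottom):
    """Matches divided by all columns, so terminal gaps lower the identity."""
    return count_matches(top, bottom) / len(top)


print(f"{identity_over_aligned_columns(ROW1, ROW2):.1%}")   # 100.0%
print(f"{identity_over_alignment_length(ROW1, ROW2):.1%}")  # 28.6%
```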

I ask because I am filtering an alignment of two assemblies of the same genome (made with nucmer from MUMmer), and I can filter out aligned contigs based on identity.

Thank you,
Ricardo
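For the nucmer use case, a hedged sketch: as far as I know, the percent identity MUMmer reports (the `[% IDY]` column of `show-coords`) is measured over the aligned region only, so a short contig fully contained in a longer one can still score 100%. Filtering on identity alone keeps such hits; also requiring a minimum coverage of the contig removes them. MUMmer's `delta-filter` has `-i` (minimum identity) and `-l` (minimum alignment length) options, and `show-coords -c` adds coverage columns, but check the help of your MUMmer version. The `Hit` record, its fields, the thresholds and the three contigs below are hypothetical and do not mirror MUMmer's output format; coverage is also computed per alignment, which ignores contigs split across several hits.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical alignment record; field names are illustrative, not MUMmer's."""
    query: str
    query_len: int          # full contig length
    aligned_query_len: int  # contig bases covered by this alignment
    pct_identity: float     # identity over the aligned region, in percent

    @property
    def query_coverage(self) -> float:
        """Percentage of the contig covered by this alignment."""
        return 100.0 * self.aligned_query_len / self.query_len


def keep(hit: Hit, min_identity: float = 95.0, min_coverage: float = 90.0) -> bool:
    """Keep a hit only if it is both highly identical and covers most of the contig."""
    return hit.pct_identity >= min_identity and hit.query_coverage >= min_coverage


# Made-up example records.
hits = [
    Hit("contig_a", 10_000, 9_800, 99.5),   # near-complete, near-identical
    Hit("contig_b", 10_000, 1_500, 100.0),  # identical, but only 15% covered
    Hit("contig_c", 10_000, 9_900, 88.0),   # well covered, low identity
]
print([h.query for h in hits if keep(h)])  # ['contig_a']
```

Here `contig_b` is exactly the situation in the question: 100% identity, yet it would be dropped by the coverage threshold.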

I would say that, looking at seq1, it has 100% identity over 100% of its length; looking at seq2, it has 100% identity over 30% of its length. It's a matter of point of view.

I would say seq1 is 100% identical to seq2, while seq2 is only 30% identical to seq1.

Unfortunately, it depends heavily on how you look at it.
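Both comments divide the same count of identical positions by the length of a different sequence, which is why the answer changes with the direction. A minimal sketch on the rows from the question (helper names are hypothetical); it also shows that the "30%" is more precisely 14/49, about 28.6%.

```python
# Alignment rows from the question; "-" marks a gap.
ROW1 = "-------------------AGTGTGAAAAAGGT----------------"
ROW2 = "ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT"


def identical_positions(top, bottom):
    """Number of columns where both rows carry the same base."""
    return sum(1 for x, y in zip(top, bottom) if x == y and x != "-")


def identity_relative_to(row, top, bottom):
    """Identical positions as a fraction of one sequence's own (ungapped) length."""
    return identical_positions(top, bottom) / len(row.replace("-", ""))


print(f"seq1: {identity_relative_to(ROW1, ROW1, ROW2):.1%}")  # seq1: 100.0%
print(f"seq2: {identity_relative_to(ROW2, ROW1, ROW2):.1%}")  # seq2: 28.6%
```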

Great, that's it! It depends on which sequence is the query and which is the reference. Thanks! (If you post it as an answer instead of a comment, I'll accept it.)

It also depends on whether you use global or local alignment.
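To illustrate the global/local point, here is a small textbook sketch: Needleman-Wunsch (global) and Smith-Waterman (local) with a linear gap penalty, run on the two sequences from the question. The +1/-1/-1 scoring scheme is an assumption chosen for simplicity; nucmer uses its own seed-and-extend strategy and scoring, so this only shows the principle. A global alignment must span both sequences end to end, so the unmatched flanks of seq2 become gap columns and pull identity down; a local alignment reports only the matching region.

```python
# The two sequences from the question, without gaps.
SEQ1 = "AGTGTGAAAAAGGT"
SEQ2 = "ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT"

# Toy linear scoring scheme (an assumption, not what nucmer uses).
MATCH, MISMATCH, GAP = 1, -1, -1


def sub(x, y):
    """Substitution score for one aligned pair of bases."""
    return MATCH if x == y else MISMATCH


def align(a, b, local):
    """Needleman-Wunsch (local=False) or Smith-Waterman (local=True), linear gaps.

    Returns the two gapped rows of one optimal alignment.
    """
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global: leading gaps are penalised like any other gap.
        for i in range(1, n + 1):
            s[i][0] = i * GAP
        for j in range(1, m + 1):
            s[0][j] = j * GAP
    best, end = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            v = max(s[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                    s[i - 1][j] + GAP,
                    s[i][j - 1] + GAP)
            if local:
                # Local: scores never drop below 0, so an alignment can start anywhere.
                v = max(v, 0)
                if v > best:
                    best, end = v, (i, j)
            s[i][j] = v
    # Global traceback starts at the corner; local starts at the best-scoring cell.
    i, j = end if local else (n, m)
    top, bot = [], []
    while i > 0 or j > 0:
        if local and s[i][j] == 0:
            break  # the local alignment ends where the score reaches 0
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bot.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + GAP:
            top.append(a[i - 1])
            bot.append("-")
            i -= 1
        else:
            top.append("-")
            bot.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bot))


def identity(top, bot):
    """Identical columns divided by alignment length (gap columns included)."""
    return sum(x == y and x != "-" for x, y in zip(top, bot)) / len(top)


for mode in ("global", "local"):
    t, b = align(SEQ1, SEQ2, local=(mode == "local"))
    print(f"{mode}: {identity(t, b):.1%} over {len(t)} columns")
```

With this scoring, the global run reports 28.6% over 49 columns and the local run 100.0% over 14 columns, matching the two viewpoints discussed in the thread.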
