Entering edit mode

Hi,

There is a sequence set **s={ACGT, CGGT, CGTC}** and the SCS of those 3 is **ACGGTC** and LCS is **CGT**. I got this example from a research paper. They did not mention the way they used to get SCS and LCS. So can anyone explain the algorithm to obtain SCS and LCS.

And also i'm confiusing that SCS(Shortest Common Super-sequence) obtained the long sequence though it calls shortest and LCS(Longest Common Super-sequence ) obtained shortest sequence though it calls longest.Why is that?

Thank you.

Entering edit mode

The shortest common super-sequence (SCS) is the shortest sequence that contains the others as subsequences. In your example, concatenating your three sequences gives ACGTCGGTCGTC which is a super-sequences of your set s. This is however not the shortest super-sequence which is ACGGTC.

LCS means longest common subsequence and it is the longest sequence that is a subsequence of the others. In your example, CG anf GT are two common subsequences but they are not the longest which is CGT.

The LCS problem is usually solved by dynamic programming. For more, see the wikipedia article.

For two sequences, the SCS is derived from the LCS: you simply add to the LCS all non-LCS nucleotides in the order they appear in the original sequences. For more sequences, the problem is NP-complete, i.e. one has to rely on approximations and/or heuristics to solve it e.g. see this paper.

Loading Similar Posts