Given two strings of letters how can you quantify the similarity between them? Answering this question has far reaching consequences across a multitude of fields - from correcting the typo in your text message to figuring out what parts of the DNA sequence encode. One way, well known to both theoreticians and practitioners, is through the so-called longest common subsequence. It is a string of characters that are included in both words you are trying to compare. The length of the this string can be thought of a similarity score. The first task would be to find this longest common subsequence - a problem that can be solved via dynamic programming. Second, one can study theoretical properties - if you fix an alphabet and generate the two words at random using letters from the alphabet, what is the typical similarity score you get? How does it change with the length of the generated strings?
Supervisor: George Kerchev
Difficulty level: Introductory
Tools: Any programming language