Find the Longest Common Prefix in a Trie

problem-solving
Published

December 26, 2024

Efficiently finding the longest common prefix (LCP) among a set of strings is a common task in various applications, from autocomplete suggestions to data compression. While there are many approaches, using a Trie data structure offers an elegant and efficient solution. This post explores how to implement this efficiently in Python.

Understanding Tries

A Trie (also known as a prefix tree) is a tree-like data structure used for storing a dynamic set or associative array where the keys are usually strings. Each node in a Trie represents a prefix of a string, and paths from the root to a leaf node represent complete strings. This structure makes searching and finding common prefixes incredibly efficient.

Python Implementation

Let’s build a Trie and implement the LCP function. We’ll represent each node as a dictionary:

class TrieNode:
    def __init__(self):
        self.children = {}  # Dictionary to store child nodes
        self.is_end_of_word = False # Flag to indicate end of a word


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end_of_word = True

    def longest_common_prefix(self):
        lcp = ""
        node = self.root
        while len(node.children) == 1 and not node.is_end_of_word:
            for char in node.children:
                lcp += char
                node = node.children[char]
        return lcp

Usage Example

Here’s how to use the Trie class to find the LCP:

trie = Trie()
words = ["flower", "flow", "flight"]

for word in words:
    trie.insert(word)

lcp = trie.longest_common_prefix()
print(f"The longest common prefix is: {lcp}")  # Output: fl

words2 = ["dog", "cat", "bird"]
trie2 = Trie()
for word in words2:
    trie2.insert(word)
lcp2 = trie2.longest_common_prefix()
print(f"The longest common prefix is: {lcp2}") # Output: ""

words3 = ["apple", "app", "appetizer"]
trie3 = Trie()
for word in words3:
    trie3.insert(word)
lcp3 = trie3.longest_common_prefix()
print(f"The longest common prefix is: {lcp3}") # Output: ap

This code first inserts the words into the Trie. Then, the longest_common_prefix method traverses the Trie from the root, following the single path representing the common prefix until it encounters a node with multiple children or a word ending marker. The function efficiently finds the LCP without explicitly comparing strings pairwise.

Handling Empty Input

The current longest_common_prefix function assumes at least one word is inserted. You could add error handling for the case of an empty Trie to make it more robust. For instance, you might return an empty string or raise an exception if the Trie is empty.

Further Optimizations

For extremely large datasets, further optimizations might be considered, such as using more memory-efficient data structures for the Trie nodes or employing parallel processing techniques. However, for many common use cases, the implementation presented here provides a good balance between simplicity and efficiency.