Efficiently finding the longest common prefix (LCP) among a set of strings is a common task in various applications, from autocomplete suggestions to data compression. While there are many approaches, using a Trie data structure offers an elegant and efficient solution. This post explores how to implement this efficiently in Python.
Understanding Tries
A Trie (also known as a prefix tree) is a tree-like data structure used for storing a dynamic set or associative array where the keys are usually strings. Each node in a Trie represents a prefix of a string, and paths from the root to a leaf node represent complete strings. This structure makes searching and finding common prefixes incredibly efficient.
Python Implementation
Let’s build a Trie and implement the LCP function. We’ll represent each node as a dictionary:
class TrieNode:
def __init__(self):
self.children = {} # Dictionary to store child nodes
self.is_end_of_word = False # Flag to indicate end of a word
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word):
= self.root
node for char in word:
if char not in node.children:
= TrieNode()
node.children[char] = node.children[char]
node = True
node.is_end_of_word
def longest_common_prefix(self):
= ""
lcp = self.root
node while len(node.children) == 1 and not node.is_end_of_word:
for char in node.children:
+= char
lcp = node.children[char]
node return lcp
Usage Example
Here’s how to use the Trie
class to find the LCP:
= Trie()
trie = ["flower", "flow", "flight"]
words
for word in words:
trie.insert(word)
= trie.longest_common_prefix()
lcp print(f"The longest common prefix is: {lcp}") # Output: fl
= ["dog", "cat", "bird"]
words2 = Trie()
trie2 for word in words2:
trie2.insert(word)= trie2.longest_common_prefix()
lcp2 print(f"The longest common prefix is: {lcp2}") # Output: ""
= ["apple", "app", "appetizer"]
words3 = Trie()
trie3 for word in words3:
trie3.insert(word)= trie3.longest_common_prefix()
lcp3 print(f"The longest common prefix is: {lcp3}") # Output: ap
This code first inserts the words into the Trie. Then, the longest_common_prefix
method traverses the Trie from the root, following the single path representing the common prefix until it encounters a node with multiple children or a word ending marker. The function efficiently finds the LCP without explicitly comparing strings pairwise.
Handling Empty Input
The current longest_common_prefix
function assumes at least one word is inserted. You could add error handling for the case of an empty Trie to make it more robust. For instance, you might return an empty string or raise an exception if the Trie is empty.
Further Optimizations
For extremely large datasets, further optimizations might be considered, such as using more memory-efficient data structures for the Trie nodes or employing parallel processing techniques. However, for many common use cases, the implementation presented here provides a good balance between simplicity and efficiency.