Finding the longest common substring between two strings is a classic computer science problem with applications in various fields, from bioinformatics (comparing DNA sequences) to plagiarism detection. This blog post will look at efficient ways to solve this problem in Python, providing clear explanations and code examples.
Understanding the Problem
Before diving into the code, let’s clearly define the problem. Given two strings, we need to find the longest substring that is present in both strings. For example:
- String 1: “abcdefg”
- String 2: “bcdefgh”
The longest common substring is “bcdefg”. Note that it’s a substring, meaning it’s a contiguous sequence of characters. “bcd” is a common substring, but it’s shorter than “bcdefg”.
Approach 1: Brute Force
A straightforward, though inefficient, approach is brute force. We can iterate through all possible substrings of the shorter string and check if each substring is present in the longer string.
def longest_common_substring_brute_force(str1, str2):
"""
Finds the longest common substring using brute force.
Args:
str1: The first string.
str2: The second string.
Returns:
The longest common substring. Returns an empty string if no common substring is found.
"""
= len(str1)
m = len(str2)
n = ""
longest_substring
for i in range(m):
for j in range(i, m):
= str1[i:j+1]
substring if substring in str2 and len(substring) > len(longest_substring):
= substring
longest_substring return longest_substring
= "abcdefg"
string1 = "bcdefgh"
string2 = longest_common_substring_brute_force(string1, string2)
result print(f"Longest common substring (brute force): {result}") # Output: bcdefg
= "applepie"
string3 = "pineapple"
string4 = longest_common_substring_brute_force(string3,string4)
result print(f"Longest common substring (brute force): {result}") # Output: apple
This brute-force method has a time complexity of O(mnk), where ‘m’ and ‘n’ are the lengths of the strings and ‘k’ is the maximum length of a common substring. It’s computationally expensive for longer strings.
Approach 2: Dynamic Programming
A more efficient approach uses dynamic programming. We create a table (matrix) where each cell (i, j) represents the length of the longest common substring ending at str1[i] and str2[j].
def longest_common_substring_dp(str1, str2):
"""
Finds the longest common substring using dynamic programming.
Args:
str1: The first string.
str2: The second string.
Returns:
The longest common substring. Returns an empty string if no common substring is found.
"""
= len(str1)
m = len(str2)
n = [[0 for _ in range(n + 1)] for _ in range(m + 1)]
dp = ""
longest_substring = 0
max_length
for i in range(1, m + 1):
for j in range(1, n + 1):
if str1[i - 1] == str2[j - 1]:
= dp[i - 1][j - 1] + 1
dp[i][j] if dp[i][j] > max_length:
= dp[i][j]
max_length = str1[i - max_length:i]
longest_substring
return longest_substring
= "abcdefg"
string1 = "bcdefgh"
string2 = longest_common_substring_dp(string1, string2)
result print(f"Longest common substring (dynamic programming): {result}") # Output: bcdefg
= "applepie"
string3 = "pineapple"
string4 = longest_common_substring_dp(string3,string4)
result print(f"Longest common substring (dynamic programming): {result}") # Output: apple
Dynamic programming offers a significant performance improvement with a time complexity of O(m*n), making it much more suitable for longer strings.
Choosing the Right Approach
For short strings, the brute-force approach might suffice. However, for longer strings, the dynamic programming approach is strongly recommended due to its better time complexity. The dynamic programming solution provides a more scalable and efficient way to solve the longest common substring problem.