The Hamming distance is a fundamental concept in computer science and information theory. It quantifies the similarity between two strings of equal length. Specifically, it measures the number of positions at which the corresponding symbols are different. This is particularly useful when working with binary strings (strings containing only 0s and 1s), where the Hamming distance represents the number of bits that differ between two binary strings.
This blog post will look at how to efficiently calculate the Hamming distance between two binary strings in Python. We’ll cover many approaches, ranging from a basic iterative method to more concise techniques leveraging Python’s built-in functionalities.
Understanding Hamming Distance
Before diving into the code, let’s solidify our understanding with an example. Consider two binary strings:
string1 = "101101"
string2 = "110100"
To find the Hamming distance, we compare each bit at the same position:
- Position 0: 1 and 1 are the same (no difference).
- Position 1: 0 and 1 are different (one difference).
- Position 2: 1 and 0 are different (one difference).
- Position 3: 1 and 1 are the same (no difference).
- Position 4: 0 and 0 are the same (no difference).
- Position 5: 1 and 0 are different (one difference).
Therefore, the Hamming distance between “101101” and “110100” is 3.
Python Code Examples
Here are many ways to compute the Hamming distance in Python:
Method 1: Iterative Approach
This is a straightforward method that iterates through the strings and counts the differences.
def hamming_distance_iterative(str1, str2):
"""Calculates the Hamming distance using iteration."""
if len(str1) != len(str2):
raise ValueError("Strings must be of equal length")
= 0
distance for i in range(len(str1)):
if str1[i] != str2[i]:
+= 1
distance return distance
= "101101"
string1 = "110100"
string2 = hamming_distance_iterative(string1, string2)
distance print(f"Hamming distance (iterative): {distance}") # Output: 3
Method 2: Using zip
and sum
Python’s zip
function allows for a more concise solution. We can use it to iterate through both strings simultaneously, and sum
to count the differences.
def hamming_distance_zip(str1, str2):
"""Calculates the Hamming distance using zip and sum."""
if len(str1) != len(str2):
raise ValueError("Strings must be of equal length")
return sum(c1 != c2 for c1, c2 in zip(str1, str2))
= "101101"
string1 = "110100"
string2 = hamming_distance_zip(string1, string2)
distance print(f"Hamming distance (zip): {distance}") # Output: 3
Method 3: Using XOR and bin()
(For Integer Inputs)
If you are working with integers represented as binary strings, you can use the bitwise XOR operator (^
) and the bin()
function for a potentially faster approach. This method converts integers to binary strings, performs XOR, and then counts the number of ’1’s in the result.
def hamming_distance_xor(int1, int2):
"""Calculates the Hamming distance using XOR (for integers)."""
= int1 ^ int2
xor_result = bin(xor_result)[2:] # [2:] removes "0b" prefix
binary_representation return binary_representation.count('1')
= int("101101", 2) # Convert binary string to integer
int1 = int("110100", 2)
int2 = hamming_distance_xor(int1, int2)
distance print(f"Hamming distance (XOR): {distance}") #Output: 3
These examples demonstrate various approaches to calculating the Hamming distance, each with its own advantages depending on the context and your preference for code style. Remember to handle potential errors, such as strings of unequal length. Choosing the most efficient method will depend on the specific application and the size of the input strings.