NumPy Array Dtype

numpy
Published

December 21, 2023

What are NumPy Dtypes?

Every element within a NumPy array must be of the same data type. This is enforced by the dtype attribute, which specifies the type of data stored in the array. Unlike Python lists, which can hold mixed data types, NumPy arrays maintain type homogeneity for efficiency. This homogeneity enables NumPy to perform vectorized operations—applying operations to the entire array at once, significantly speeding up computations.

Common NumPy Dtypes

NumPy supports a wide variety of dtypes, catering to diverse data needs. Here are some of the most frequently used:

  • int8, int16, int32, int64: Signed integers of 8, 16, 32, and 64 bits respectively. The larger the number of bits, the larger the range of integers that can be stored.

  • uint8, uint16, uint32, uint64: Unsigned integers (non-negative) with similar bit sizes.

  • float16, float32, float64: Floating-point numbers (numbers with decimal points) of varying precision. float64 (often referred to as double) is the default.

  • complex64, complex128: Complex numbers (numbers with real and imaginary parts).

  • bool: Boolean values (True or False).

  • str: Strings (text). Note that all strings within a single array must have the same length.

  • object: Can hold arbitrary Python objects. This is generally less efficient than other dtypes.

Specifying Dtypes During Array Creation

You can explicitly specify the dtype when creating a NumPy array:

import numpy as np

arr_int = np.array([1, 2, 3, 4], dtype=np.int32)
arr_float = np.array([1.1, 2.2, 3.3], dtype=np.float64)
arr_bool = np.array([True, False, True], dtype=bool)
print(arr_int.dtype)  # Output: int32
print(arr_float.dtype) # Output: float64
print(arr_bool.dtype) # Output: bool

#Inferring dtype from input.
arr_auto = np.array([1, 2, 3, 4])
print(arr_auto.dtype) #Output: int64 (or int32 depending on system)

Changing Dtypes

You can change the dtype of an existing array using the astype() method:

arr = np.array([1, 2, 3], dtype=np.int32)
arr_float = arr.astype(np.float64)
print(arr_float.dtype)  # Output: float64
print(arr_float) # Output: [1. 2. 3.]

#Casting to smaller dtypes can lead to truncation.
arr_int8 = arr.astype(np.int8)
print(arr_int8) # Output: [1 2 3] (assuming no overflow)

#Casting to strings
arr_str = arr.astype(str)
print(arr_str)

Understanding Type Promotion

When performing operations between arrays of different dtypes, NumPy promotes the data types to ensure compatibility. Generally, the type with higher precision will be chosen. For example, an operation between an int32 and float64 array will result in a float64 array.

Structured Arrays

NumPy also allows creating arrays with structured dtypes, enabling storage of different data types within a single array:

data_type = np.dtype([('name', 'U10'), ('age', 'i4'), ('height', 'f4')])
people = np.array([('Alice', 30, 1.75), ('Bob', 25, 1.80)], dtype=data_type)
print(people['name'])
print(people['age'])

This provides a way to represent more complex data structures within the efficient NumPy framework. Understanding these more advanced features is crucial for working with more complex datasets in scientific computing and data analysis.