What does .info() tell you?
The .info() method delivers a summary containing:
- Number of rows and columns: A fundamental overview of your dataset’s size.
- Column names and data types: Identifies each column and the type of data it holds (e.g., int64, float64, object, datetime64). Understanding data types is critical for choosing appropriate analysis techniques.
- Non-null counts: Displays the number of non-missing values in each column. This highlights potential data quality issues, since missing data often requires handling before analysis.
- Memory usage: Shows the memory occupied by the DataFrame. This is particularly useful for large datasets where memory management is crucial.
Practical Examples
Let’s illustrate with examples. First, import Pandas:
import pandas as pd
Now, create a sample DataFrame:
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', None],
        'Age': [25, 30, 22, 28, 35],
        'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney'],
        'Salary': [50000, 60000, 45000, 70000, None]}
df = pd.DataFrame(data)
Now let’s use .info():
df.info()
This will output something similar to:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    4 non-null      object
 1   Age     5 non-null      int64
 2   City    5 non-null      object
 3   Salary  4 non-null      float64
dtypes: float64(1), int64(1), object(2)
memory usage: 288.0+ bytes
Observe how .info() clearly shows:
- We have 5 rows and 4 columns.
- Name and City are of type object (likely strings).
- Age is an integer (int64).
- Salary is a floating-point number (float64).
- Name and Salary have missing values (null).
- The DataFrame’s memory usage.
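The same facts can also be read programmatically, which helps when you want to act on them in code rather than read a printout. The following is a minimal sketch that rebuilds the sample DataFrame from above. It also requests `memory_usage='deep'`: on object columns, a trailing `+` in the memory line means the figure is only a lower bound, and the deep option asks pandas to measure the stored strings as well. The variable names (`non_null`, `missing`, `deep_report`) are chosen here for illustration.

```python
import io

import pandas as pd

# Same sample data as in the example above.
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', None],
        'Age': [25, 30, 22, 28, 35],
        'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney'],
        'Salary': [50000, 60000, 45000, 70000, None]}
df = pd.DataFrame(data)

n_rows, n_cols = df.shape      # number of rows and columns
non_null = df.count()          # non-null values per column
missing = df.isna().sum()      # missing values per column

# .info() prints to stdout by default; passing buf captures the text instead.
buf = io.StringIO()
df.info(memory_usage='deep', buf=buf)
deep_report = buf.getvalue()

print(n_rows, n_cols)
print(missing)
print(deep_report)
```

Capturing the report in a string is handy if you want to write it to a log file instead of the console.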
Beyond the Basics: Working with Different Data Types
Let’s create a DataFrame with a datetime column:
data2 = {'Date': pd.to_datetime(['2024-01-15', '2024-02-20', '2024-03-25']),
         'Value': [10, 20, 30]}
df2 = pd.DataFrame(data2)
df2.info()
The output will show the Date column’s data type as datetime64[ns], demonstrating .info()’s ability to handle various data types effectively. This information is vital for time series analysis or any operation involving dates.
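A common situation where this check pays off is when dates arrive as plain strings, for example from a CSV file. The sketch below uses a hypothetical DataFrame named `raw` to show the column's dtype before and after conversion with `pd.to_datetime`. The exact datetime resolution reported may vary with your pandas version.

```python
import pandas as pd

# Hypothetical raw data in which the dates are still plain strings.
raw = pd.DataFrame({'Date': ['2024-01-15', '2024-02-20', '2024-03-25'],
                    'Value': [10, 20, 30]})

before = raw['Date'].dtype                  # a string/object dtype, not a datetime
raw['Date'] = pd.to_datetime(raw['Date'])   # parse the strings into datetimes
after = raw['Date'].dtype                   # now a datetime64 dtype

print(before, '->', after)
raw.info()

# Once the column is a real datetime, the .dt accessor becomes available.
months = raw['Date'].dt.month.tolist()
print(months)
```

If .info() still reports a date column as object, that is usually a sign the conversion step is missing.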
Leveraging .info() for Data Cleaning
The non-null counts revealed by .info() are invaluable for identifying missing data. This is a critical first step in data cleaning. Before performing any analysis, you’ll want to address missing data appropriately (e.g., imputation or removal). The .info() method helps you pinpoint which columns require attention.
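As a rough illustration of that workflow, the sketch below reuses the sample data from earlier, lists the columns with gaps, and then applies one possible treatment to each. Filling Salary with the median and dropping rows without a Name are assumptions made for the example, not a general recommendation. The right strategy depends on your data.

```python
import pandas as pd

# Same sample data as in the earlier example.
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', None],
        'Age': [25, 30, 22, 28, 35],
        'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney'],
        'Salary': [50000, 60000, 45000, 70000, None]}
df = pd.DataFrame(data)

# Columns that .info() would flag with a non-null count below the row count.
missing = df.isna().sum()
cols_with_gaps = missing[missing > 0].index.tolist()
print(cols_with_gaps)

# Imputation: fill the missing Salary with the column median.
imputed = df.copy()
imputed['Salary'] = imputed['Salary'].fillna(imputed['Salary'].median())

# Removal: drop rows that have no Name.
cleaned = imputed.dropna(subset=['Name'])

# Every column should now report the same non-null count.
cleaned.info()
```

Running .info() again after cleaning is a quick way to confirm that no gaps remain.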