DataFrame Info

pandas
Published

January 5, 2025

What does .info() tell you?

The .info() method delivers a summary containing:

  • Number of rows and columns: A fundamental overview of your dataset’s size.
  • Column names and data types: Identifies each column and the type of data it holds (e.g., int64, float64, object, datetime64). Understanding data types is critical for choosing appropriate analysis techniques.
  • Non-null counts: Displays the number of non-missing values in each column. This highlights potential data quality issues – missing data often requires handling before analysis.
  • Memory usage: Shows the memory occupied by the DataFrame. This is particularly useful for large datasets where memory management is crucial.

Practical Examples

Let’s illustrate with examples. First, import Pandas:

import pandas as pd

Now, create a sample DataFrame:

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', None],
        'Age': [25, 30, 22, 28, 35],
        'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney'],
        'Salary': [50000, 60000, 45000, 70000, None]}
df = pd.DataFrame(data)

Now let’s use .info():

df.info()

This will output something similar to:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    4 non-null      object
 1   Age     5 non-null      int64 
 2   City    5 non-null      object
 3   Salary  4 non-null      float64
dtypes: float64(1), int64(1), object(2)
memory usage: 288.0+ bytes

Observe how .info() clearly shows:

  • We have 5 rows and 4 columns.
  • Name and City are of type object (likely strings).
  • Age is an integer (int64).
  • Salary is a floating-point number (float64).
  • Name and Salary have missing values (Null).
  • The DataFrame’s memory usage.

Beyond the Basics: Working with Different Data Types

Let’s create a DataFrame with a datetime column:

data2 = {'Date': pd.to_datetime(['2024-01-15', '2024-02-20', '2024-03-25']),
         'Value': [10, 20, 30]}
df2 = pd.DataFrame(data2)
df2.info()

The output will show the Date column’s data type as datetime64[ns], demonstrating .info()’s ability to handle various data types effectively. This information is vital for time series analysis or any operation involving dates.

Leveraging .info() for Data Cleaning

The non-null counts revealed by .info() are invaluable for identifying missing data. This is a critical first step in data cleaning. Before performing any analysis, you’ll want to address missing data appropriately (e.g., imputation or removal). The .info() method helps you pinpoint which columns require attention.