Exporting Data from Pandas (CSV, Excel, etc.)

pandas

Published November 23, 2023

Exporting to CSV (Comma Separated Values)

CSV is a simple, widely compatible format, making it an excellent choice for sharing data with others or importing into other applications. Pandas provides a straightforward to_csv() method for this purpose.

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 28],
        'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)

df.to_csv('exported_data.csv', index=False) # index=False prevents the DataFrame index from being written

The index=False argument is crucial; otherwise, Pandas will include the DataFrame’s index as a column in your exported CSV. You can also specify the delimiter (e.g., tab-separated values) using the sep argument:

df.to_csv('exported_data_tab.tsv', sep='\t', index=False)

Exporting to Excel (XLS and XLSX)

For more complex data or when you need multiple sheets and formatting options, Excel is a popular choice. You’ll need the openpyxl library for .xlsx files (Excel 2007 and later). Writing the older .xls format relied on xlwt, which pandas deprecated and then removed in version 2.0, so it only works with older pandas releases. Install openpyxl using pip: pip install openpyxl

import pandas as pd

# df is the DataFrame created in the CSV example above
# Writing .xlsx requires openpyxl
df.to_excel('exported_data.xlsx', sheet_name='Sheet1', index=False)

# For older .xls files (requires xlwt and pandas < 2.0)
# df.to_excel('exported_data.xls', sheet_name='Sheet1', index=False)

The sheet_name argument lets you specify the name of the sheet in your Excel file.
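If you need more than one sheet in the same workbook, one option is to pass a pd.ExcelWriter to to_excel(). Here is a minimal sketch, assuming openpyxl is installed; the summary DataFrame and the sheet names are only illustrative:

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 28],
        'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)

# Illustrative second table, just to have something for a second sheet
summary = df.groupby('City')['Age'].mean().reset_index()

# ExcelWriter lets several DataFrames share one workbook (requires openpyxl)
with pd.ExcelWriter('exported_data_multi.xlsx') as writer:
    df.to_excel(writer, sheet_name='People', index=False)
    summary.to_excel(writer, sheet_name='Summary', index=False)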

Exporting to JSON (JavaScript Object Notation)

JSON is a lightweight, text-based format ideal for web applications and APIs. Pandas provides to_json() for exporting data in various JSON formats.

import pandas as pd

# df is the DataFrame created in the CSV example above
df.to_json('exported_data.json', orient='records')

# Other orient options include 'split', 'index', 'columns', and 'values'.
# Experiment to find the best format for your needs.

The orient argument controls the structure of the JSON output. ‘records’ creates a list of dictionaries, one for each row. Explore the other options in the Pandas documentation for different JSON structures.
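To compare the orient options quickly, you can skip the file path: to_json() then returns the JSON string, which you can print. A small sketch:

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# With no path, to_json() returns the JSON string instead of writing a file
print(df.to_json(orient='records'))   # [{"Name":"Alice","Age":25},{"Name":"Bob","Age":30}]
print(df.to_json(orient='split'))     # {"columns":[...],"index":[...],"data":[...]}
print(df.to_json(orient='columns'))   # {"Name":{"0":"Alice",...},"Age":{"0":25,...}}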

Handling Large Datasets

For very large datasets, consider using the chunksize argument in to_csv() to write the data in batches of rows rather than formatting the whole DataFrame at once. This keeps the memory used by the intermediate write buffer small and can speed up the export. Note that to_excel() does not accept chunksize; for Excel, splitting the data across sheets or files yourself is the usual workaround.

df.to_csv('large_file.csv', index=False, chunksize=10000)  # write in chunks of 10,000 rows

Remember to adapt the chunksize to your system’s resources and the size of your dataset; efficiently exporting large datasets can significantly reduce processing time. Compressing the output is another way to cut file size and I/O time.
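For example, to_csv() accepts a compression argument (and will infer the compression from the file extension), so a gzipped export is a one-liner. A minimal sketch:

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 28]}
df = pd.DataFrame(data)

# Compression is inferred from the '.gz' extension; compression='gzip' makes it explicit
df.to_csv('large_file.csv.gz', index=False, compression='gzip')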