Mastering Reindexing in Pandas: Enhancing DataFrame Flexibility

Reindexing is a fundamental Pandas technique for conforming data to a new set of index labels. This article explains what reindexing does, describes its parameters, and demonstrates it on a small example.

Reindexing in Pandas

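Reindexing in Pandas conforms a DataFrame to a new set of row labels, column labels, or both. Existing labels keep their data, labels absent from the new set are dropped, and labels that are new receive missing values (NaN) unless a fill is specified. This makes reindexing useful for reordering existing data, aligning data from different sources, and making gaps in the data explicit.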
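As a minimal sketch of that behavior, the snippet below reindexes a two-row DataFrame along each axis. The data is a cut-down version of this article's sample, and the "Country" column is a hypothetical label added only to show what happens when a requested label does not exist.

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [32, 29], "City": ["Mumbai", "Bangalore"]},
    index=["Sachin", "Manju"],
)

# Reorder the rows and request one label ("Ram") that is not in df.
rows = df.reindex(index=["Manju", "Sachin", "Ram"])

# Reorder the columns and request a hypothetical "Country" column.
cols = df.reindex(columns=["City", "Age", "Country"])

print(rows)  # the "Ram" row holds NaN in every column
print(cols)  # the "Country" column holds NaN in every row
```

Note that reindex returns a new object; df itself is left unchanged.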

Key Parameters of Reindexing

The DataFrame.reindex method accepts several parameters, each serving a specific purpose:

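  1. labels: New labels for the axis being reindexed (rows by default).
  2. index / columns: Keyword alternatives to labels for specifying the new row or column labels directly.
  3. method: Method for filling the holes created by reindexing (‘ffill’, ‘bfill’, ‘nearest’). It applies only when the existing index is monotonically increasing or decreasing.
  4. fill_value: Value to use for missing values introduced by the reindexing (NaN by default).
  5. limit: Maximum number of consecutive elements to forward or backward fill when method is set.
  6. tolerance: Maximum distance between original and new labels for inexact matches when method is set.
  7. level: Broadcast across a level, matching Index values on the passed MultiIndex level.
  8. copy: Whether to return a new object even if the new labels are equivalent to the old labels.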
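The fill-related parameters are easiest to compare side by side. The sketch below uses a small numeric Series invented for illustration (it is not part of this article's sample data), because method, limit, and tolerance require a monotonic index.

```python
import pandas as pd

# Hypothetical series with a sorted (monotonic) integer index.
s = pd.Series([10.0, 20.0, 30.0], index=[0, 5, 10])
new_labels = [0, 1, 2, 3, 5, 7, 10]

# Default: labels 1, 2, 3 and 7 are new, so they become NaN.
plain = s.reindex(new_labels)

# fill_value: new labels receive a constant instead of NaN.
constant = s.reindex(new_labels, fill_value=0.0)

# method="ffill": new labels take the value of the closest earlier label.
ffilled = s.reindex(new_labels, method="ffill")

# limit=2: fill at most two consecutive new labels, so label 3 stays NaN.
limited = s.reindex(new_labels, method="ffill", limit=2)

# tolerance=1: fill only when the source label is at most 1 away.
within = s.reindex(new_labels, method="ffill", tolerance=1)

print(pd.DataFrame({
    "plain": plain,
    "constant": constant,
    "ffilled": ffilled,
    "limited": limited,
    "within": within,
}))
```

fill_value and method address different needs: the first inserts a fixed placeholder, while the second propagates neighboring observations, which usually only makes sense for ordered data such as time series.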

Implementing Reindexing in Pandas

Creating a Sample DataFrame

Let’s begin with a DataFrame representing different individuals and their attributes.

import pandas as pd
# Sample data and row labels
data = {'Age': [32, 29, 35, 40, 28, 33],
        'City': ['Mumbai', 'Bangalore', 'Chennai', 'Delhi', 'New York', 'San Francisco']}
index = ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson']
df = pd.DataFrame(data, index=index)
print(df)
Output
        Age           City
Sachin   32         Mumbai
Manju    29      Bangalore
Ram      35        Chennai
Raju     40          Delhi
David    28       New York
Wilson   33  San Francisco

Applying Reindexing

Now, we’ll reindex the DataFrame to introduce new individuals, reorder existing ones, and see how missing values are handled.

# New index
new_index = ['Manju', 'Ram', 'Sachin', 'Raju', 'David', 'Wilson', 'Michael']
# Reindexing
df_reindexed = df.reindex(new_index)
# Handling missing values: fillna with a dict returns a new DataFrame,
# which avoids chained in-place assignment on a column
df_reindexed = df_reindexed.fillna({'Age': 0, 'City': 'Unknown'})
# Display the updated DataFrame
print(df_reindexed)
Output
          Age           City
Manju    29.0      Bangalore
Ram      35.0        Chennai
Sachin   32.0         Mumbai
Raju     40.0          Delhi
David    28.0       New York
Wilson   33.0  San Francisco
Michael   0.0        Unknown

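In this example, ‘Michael’ is a new label not present in the original DataFrame, so reindexing gives that row NaN in every column, which we then replace with default values. The Age column is displayed as floats (29.0 rather than 29) because NaN cannot be stored in an integer column, so Pandas converts the column to float64 during reindexing.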
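If the float conversion is unwanted, there are at least two ways around it. The sketch below uses a cut-down, Age-only version of the sample data: one option casts back to integers after filling, the other passes fill_value so that NaN is never introduced.

```python
import pandas as pd

df = pd.DataFrame({"Age": [32, 29]}, index=["Sachin", "Manju"])
labels = ["Manju", "Sachin", "Michael"]

# Plain reindex introduces NaN, which forces Age to float64.
reindexed = df.reindex(labels)
print(reindexed["Age"].dtype)

# Option 1: fill the gap, then cast the column back to integers.
filled = reindexed.fillna({"Age": 0}).astype({"Age": "int64"})
print(filled["Age"].dtype)

# Option 2: supply fill_value up front so no NaN is ever created.
direct = df.reindex(labels, fill_value=0)
print(direct["Age"].dtype)
```

Option 2 applies the same fill value to every column, so it suits frames whose columns can share one default. For mixed columns such as Age and City, filling per column after reindexing is the more flexible route.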

Benefits of Reindexing

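  • Flexibility: Reorder, add, or drop rows and columns in a single call while every remaining value stays attached to its label.
  • Data Alignment: Conform different DataFrames to a common set of indices so they can be compared or combined label by label.
  • Missing Data Handling: Control how missing values introduced by reindexing are treated, through fill_value, method, or a later fillna.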
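The data alignment point can be illustrated with a short sketch. The two Series below, their names, and their values are hypothetical; the idea is to reindex both onto the union of their labels before combining them.

```python
import pandas as pd

# Hypothetical per-city figures from two different sources.
sales = pd.Series([100, 150], index=["Mumbai", "Delhi"], name="sales")
returns = pd.Series([5, 8], index=["Delhi", "Chennai"], name="returns")

# Build a common index containing every label seen in either source.
common = sales.index.union(returns.index)

# Conform both series to it, treating absent cities as zero.
sales_aligned = sales.reindex(common, fill_value=0)
returns_aligned = returns.reindex(common, fill_value=0)

net = sales_aligned - returns_aligned
print(net)
```

Pandas already aligns on labels during arithmetic, but subtracting the unaligned Series would yield NaN for any city missing from either side. Reindexing first makes the choice of default explicit.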
Author: user