Reindexing is a fundamental Pandas technique for conforming data to a new set of index labels. This article explains how reindexing works, describes its parameters, and demonstrates it with a worked example.
Reindexing in Pandas
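Reindexing in Pandas conforms a DataFrame to a new set of row labels, column labels, or both. Data for labels that already exist is kept and reordered, labels left out of the new set are dropped, and labels that are new receive missing values (NaN). This makes reindexing useful for reordering existing data, aligning data from different sources, and handling missing values.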
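As a minimal sketch of this behaviour, the snippet below reindexes rows and columns in one call. The `scores` frame and its labels are made up purely for illustration.

```python
import pandas as pd

# Hypothetical data, used only for illustration
scores = pd.DataFrame(
    {"math": [90, 75, 60], "art": [70, 85, 95]},
    index=["a", "b", "c"],
)

# Rows: reorder "c" and "a", drop "b", add a new label "d".
# Columns: keep "art", drop "math", add a new label "music".
result = scores.reindex(index=["c", "a", "d"], columns=["art", "music"])
print(result)

# reindex returns a new object; the original frame keeps its shape
print(scores.shape)
```

Every cell whose row or column label did not exist before (row "d", column "music") comes back as NaN.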
Key Parameters of Reindexing
DataFrame.reindex accepts several parameters, each serving a specific purpose:
- labels: New labels to index along the axis given by axis.
- index / columns: Alternatives to labels for reindexing rows or columns directly.
- method: Method used to fill missing values ('ffill', 'bfill', 'nearest'); it only applies when the existing index is monotonically increasing or decreasing.
- fill_value: Value to use for missing values introduced by the reindexing (NaN by default).
- limit: Maximum number of consecutive missing values to fill when method is set.
- tolerance: Maximum distance between original and new label for filling with an inexact match.
- level: Broadcast across a level, matching Index values against the given MultiIndex level.
- copy: Return a new object even if the new labels are equivalent to the old labels.
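The filling parameters are easiest to compare side by side. The following is a small sketch, assuming a made-up `readings` frame with a monotonically increasing integer index (a requirement for `method`); the names and values are illustrative only.

```python
import pandas as pd

# Hypothetical readings on a monotonically increasing integer index
readings = pd.DataFrame({"reading": [1.0, 2.0, 3.0]}, index=[0, 5, 10])
new_labels = [0, 1, 2, 3, 5, 6, 10, 12]

# fill_value: one constant for every label that had no exact match
constant = readings.reindex(new_labels, fill_value=-1.0)

# method="ffill": carry the last earlier observation forward
forward = readings.reindex(new_labels, method="ffill")

# limit=1: fill at most one consecutive new label; 2 and 3 stay NaN
limited = readings.reindex(new_labels, method="ffill", limit=1)

# tolerance=1: fill only when the source label is at most 1 away;
# 2, 3 and 12 stay NaN
tolerant = readings.reindex(new_labels, method="ffill", tolerance=1)

print(pd.concat(
    {"constant": constant, "ffill": forward,
     "limit=1": limited, "tolerance=1": tolerant},
    axis=1,
))
```

Note that `fill_value` and `method` answer different questions: the first substitutes a fixed placeholder, while the second borrows a value from a neighbouring label.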
Implementing Reindexing in Pandas
Creating a Sample DataFrame
Let’s begin with a DataFrame representing different individuals and their attributes.
import pandas as pd
# Original data and index labels
data = {'Age': [32, 29, 35, 40, 28, 33],
'City': ['Mumbai', 'Bangalore', 'Chennai', 'Delhi', 'New York', 'San Francisco']}
index = ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson']
df = pd.DataFrame(data, index=index)
print(df)
        Age           City
Sachin   32         Mumbai
Manju    29      Bangalore
Ram      35        Chennai
Raju     40          Delhi
David    28       New York
Wilson   33  San Francisco
Applying Reindexing
Now, we’ll reindex the DataFrame to introduce new individuals, reorder existing ones, and see how missing values are handled.
# New index
new_index = ['Manju', 'Ram', 'Sachin', 'Raju', 'David', 'Wilson', 'Michael']
# Reindexing
df_reindexed = df.reindex(new_index)
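# Handling missing values: assign the result back, because a chained
# fillna(..., inplace=True) does not reliably update the DataFrame in recent pandas versions
df_reindexed['Age'] = df_reindexed['Age'].fillna(0)
df_reindexed['City'] = df_reindexed['City'].fillna('Unknown')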
# Display the updated DataFrame
print(df_reindexed)
          Age           City
Manju    29.0      Bangalore
Ram      35.0        Chennai
Sachin   32.0         Mumbai
Raju     40.0          Delhi
David    28.0       New York
Wilson   33.0  San Francisco
Michael   0.0        Unknown
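In this example, ‘Michael’ is a new individual not present in the original DataFrame, so reindexing first creates his row with NaN in every column, and we then replace those missing values with defaults. The temporary NaN is also why Age is displayed as a float (29.0 rather than 29): an integer column cannot hold NaN, so Pandas upcasts it.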
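One possible variation, sketched here on a shortened copy of the data, is to pass a dict to `fillna` so each column gets its own default in a single call, and then cast Age back to an integer type once no missing values remain. The variable names `reindexed` and `filled` are illustrative.

```python
import pandas as pd

# Shortened copy of the article's data
df = pd.DataFrame(
    {"Age": [32, 29], "City": ["Mumbai", "Bangalore"]},
    index=["Sachin", "Manju"],
)

# "Michael" has no row in df, so reindex introduces NaN for him
reindexed = df.reindex(["Manju", "Sachin", "Michael"])

# A dict passed to fillna sets a different default per column
filled = reindexed.fillna({"Age": 0, "City": "Unknown"})

# NaN forced Age to float; cast back now that nothing is missing
filled["Age"] = filled["Age"].astype(int)
print(filled)
```

Using `fill_value` directly in `reindex` is not a drop-in replacement here, because it would put the same value into both the Age and City columns.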
Benefits of Reindexing
- Flexibility: Reorder, add, or drop rows and columns in a single call; reindex returns a new object and leaves the original DataFrame untouched.
- Data Alignment: Align different DataFrames to have a common set of indices.
- Missing Data Handling: Control how missing values introduced by reindexing are filled, through fill_value, method, limit, and tolerance.
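The data alignment point can be sketched as follows, assuming two hypothetical tables (`ages` and `cities`) whose row labels only partly overlap. Reindexing one frame with the other's index gives both the same labels in the same order.

```python
import pandas as pd

# Two hypothetical tables keyed by name, with partly overlapping labels
ages = pd.DataFrame({"Age": [32, 29, 35]}, index=["Sachin", "Manju", "Ram"])
cities = pd.DataFrame(
    {"City": ["Chennai", "Mumbai", "Delhi"]},
    index=["Ram", "Sachin", "Raju"],
)

# Conform `cities` to the row labels (and order) of `ages`:
# "Raju" is dropped, "Manju" is added with NaN
cities_aligned = cities.reindex(ages.index)

# Both frames now share one index, so they combine column-wise
combined = pd.concat([ages, cities_aligned], axis=1)
print(combined)
```

Here `ages` acts as the reference table; which frame plays that role is a design choice that depends on which set of labels the analysis should keep.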