Working with DataFrames in Python’s Pandas library often involves selecting and manipulating several columns at once. This article explains how to return multiple columns, a fundamental skill for data analysis in Pandas.
Techniques for Returning Multiple Columns
Using column names
The simplest way to select multiple columns is to pass a list of column names inside the indexing brackets, which is why the syntax uses double brackets. This returns a new DataFrame with just the selected columns.
Example:
Let’s create a DataFrame with names and ages for our demonstration:
import pandas as pd
# Sample data
data = {
    'Name': ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson'],
    'Age': [30, 25, 40, 35, 28, 32],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Bangalore', 'Hyderabad']
}
df = pd.DataFrame(data)
# Selecting multiple columns
selected_columns = df[['Name', 'Age']]
print(selected_columns)
Output
     Name  Age
0  Sachin   30
1   Manju   25
2     Ram   40
3    Raju   35
4   David   28
5  Wilson   32
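The double brackets matter because the inner pair is an ordinary Python list of column names. The following is a minimal sketch, using a smaller stand-in for the sample data so that it runs on its own; the variable names single, double, and reordered are illustrative only.

```python
import pandas as pd

# Smaller stand-in for the sample data above, so this snippet runs on its own
df = pd.DataFrame({
    'Name': ['Sachin', 'Manju', 'Ram'],
    'Age': [30, 25, 40],
    'City': ['Delhi', 'Mumbai', 'Chennai'],
})

single = df['Name']                # one label: returns a Series
double = df[['Name']]              # list with one label: returns a one-column DataFrame
reordered = df[['City', 'Name']]   # columns come back in the order listed

print(type(single).__name__)       # Series
print(type(double).__name__)       # DataFrame
print(list(reordered.columns))     # ['City', 'Name']
```

Keeping the list form even for a single column is a reasonable choice when later code expects a DataFrame rather than a Series.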
Using loc[] and iloc[]
The loc[] and iloc[] indexers provide more flexibility because they take a row selector and a column selector together. loc[] is label-based, meaning you use the column names, while iloc[] is based on integer positions.
Example:
# Using loc[]
selected_columns_loc = df.loc[:, ['Name', 'City']]
# Using iloc[]
selected_columns_iloc = df.iloc[:, [0, 2]]
print(selected_columns_loc)
print(selected_columns_iloc)
Output
     Name       City
0  Sachin      Delhi
1   Manju     Mumbai
2     Ram    Chennai
3    Raju    Kolkata
4   David  Bangalore
5  Wilson  Hyderabad
     Name       City
0  Sachin      Delhi
1   Manju     Mumbai
2     Ram    Chennai
3    Raju    Kolkata
4   David  Bangalore
5  Wilson  Hyderabad
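Both indexers also accept slices, which helps when the columns you want are adjacent. The sketch below again uses a smaller stand-in DataFrame, with illustrative variable names, and shows one difference worth remembering: a label slice with loc[] includes its end label, while a position slice with iloc[] excludes its stop position.

```python
import pandas as pd

# Smaller stand-in for the sample data above, so this snippet runs on its own
df = pd.DataFrame({
    'Name': ['Sachin', 'Manju', 'Ram'],
    'Age': [30, 25, 40],
    'City': ['Delhi', 'Mumbai', 'Chennai'],
})

by_label = df.loc[:, 'Name':'Age']   # label slice: both endpoints are included
by_position = df.iloc[:, 0:2]        # position slice: stop position 2 is excluded

print(list(by_label.columns))        # ['Name', 'Age']
print(list(by_position.columns))     # ['Name', 'Age']
```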
Advanced techniques
For more complex scenarios, you can use boolean indexing or query expressions to filter rows by a condition and then select the columns you want from the result.
Boolean indexing
Example:
# Selecting people older than 30
older_than_30 = df[df['Age'] > 30][['Name', 'Age']]
print(older_than_30)
Output
     Name  Age
2     Ram   40
3    Raju   35
5  Wilson   32
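The chained form above filters rows first and then selects columns in a second step. As an alternative sketch, loc[] accepts a boolean mask for the rows and a list of column names together, so the same result comes from a single indexing operation. The variable name mask is illustrative only.

```python
import pandas as pd

# Same sample data as in the first example
df = pd.DataFrame({
    'Name': ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson'],
    'Age': [30, 25, 40, 35, 28, 32],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Bangalore', 'Hyderabad'],
})

mask = df['Age'] > 30                           # boolean Series, one value per row
older_than_30 = df.loc[mask, ['Name', 'Age']]   # rows by mask, columns by name

print(older_than_30)
```

The single-step form is also the safer pattern if you later assign values to the selection, since assigning through a chained selection may not modify the original DataFrame.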
Using query()
Example:
# Using query to select specific names
specific_names = df.query("Name in ['Sachin', 'David']")[['Name', 'City']]
print(specific_names)
Output
     Name       City
0  Sachin      Delhi
4   David  Bangalore
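When the values to match live in a Python variable rather than in the query string itself, query() can reference that variable with an @ prefix. The following sketch assumes a hypothetical list named wanted; everything else mirrors the example above.

```python
import pandas as pd

# Same sample data as in the first example
df = pd.DataFrame({
    'Name': ['Sachin', 'Manju', 'Ram', 'Raju', 'David', 'Wilson'],
    'Age': [30, 25, 40, 35, 28, 32],
    'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Bangalore', 'Hyderabad'],
})

wanted = ['Sachin', 'David']   # hypothetical list of names to keep

# @wanted refers to the local Python variable, not to a column
specific_names = df.query("Name in @wanted")[['Name', 'City']]

print(specific_names)
```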
Returning multiple columns in Pandas can be done in several ways, depending on the complexity and requirements of your task. You can select columns directly by name, use the label-based loc[] or the position-based iloc[] indexer, or filter rows with boolean indexing or query() and then pick the columns you need.