One frequent operation when working with DataFrames is determining if a specific column exists. This article guides you through multiple methods to achieve this.
Creating a sample DataFrame for demonstration:
import pandas as pd
data = {
    'Name': ['Sachin', 'Ramu', 'Arun'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}
df = pd.DataFrame(data)
print(df)
Output
     Name  Age           City
0  Sachin   25       New York
1    Ramu   30  San Francisco
2    Arun   35    Los Angeles
Checking if a Column Exists
Method 1: Using the in keyword
The simplest way to check if a column exists is by using the in keyword:
column_to_check = 'Age'
if column_to_check in df.columns:
    print(f"'{column_to_check}' exists in DataFrame.")
else:
    print(f"'{column_to_check}' does not exist in DataFrame.")
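The same membership test extends to several columns at once. The following is a minimal sketch, assuming a hypothetical list named required_columns (it is not part of the original example) that includes one column the sample DataFrame does not have:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sachin', 'Ramu', 'Arun'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
})

# Hypothetical list of columns that later code depends on.
required_columns = ['Age', 'City', 'Salary']

# Collect every required column that is absent from the DataFrame.
missing = [col for col in required_columns if col not in df.columns]

# True only when every required column is present.
all_present = set(required_columns).issubset(df.columns)

print(f"Missing columns: {missing}")
print(f"All required columns present: {all_present}")
```

Building the list of missing names, rather than stopping at the first failure, makes it easier to report everything that is wrong with an incoming dataset in one message.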
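Method 2: Using df.columns.contains() (pandas versions before 1.0 only)
Older pandas versions provided Index.contains() as an equivalent of the in keyword. It was deprecated in pandas 0.25 and removed in pandas 1.0, so on current versions the call below raises an AttributeError. Prefer the in keyword from Method 1 unless you are maintaining code pinned to an old release.
if df.columns.contains(column_to_check):  # works only on pandas < 1.0
    print(f"'{column_to_check}' exists in DataFrame.")
else:
    print(f"'{column_to_check}' does not exist in DataFrame.")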
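For DataFrames whose columns are a MultiIndex, the in keyword can still be used, but the result depends on what is passed as the key. The sketch below uses a hypothetical two-level column layout (the level labels 'personal' and 'location' are invented for illustration) to show the difference between a full tuple key, a top-level label, and an inner-level label:

```python
import pandas as pd

# Hypothetical two-level column labels, invented for this illustration.
columns = pd.MultiIndex.from_tuples([
    ('personal', 'Name'),
    ('personal', 'Age'),
    ('location', 'City'),
])
df_multi = pd.DataFrame(
    [['Sachin', 25, 'New York'], ['Ramu', 30, 'San Francisco']],
    columns=columns,
)

# A full tuple key identifies exactly one column.
full_key_exists = ('personal', 'Age') in df_multi.columns

# A bare label is matched against the first (outermost) level only.
top_level_exists = 'personal' in df_multi.columns
bare_inner_label = 'Age' in df_multi.columns

# To test an inner level, look at that level's values explicitly.
inner_label_exists = 'Age' in df_multi.columns.get_level_values(1)

print(full_key_exists, top_level_exists, bare_inner_label, inner_label_exists)
```

Note that a bare inner-level label such as 'Age' is not found by a plain membership test, which is why the last check goes through get_level_values.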
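Method 3: Using hasnans inside try/except
The hasnans attribute belongs to a Series (a single column), not to the DataFrame, and reports whether that column contains NaNs. It does not test for existence by itself: what detects a missing column here is the KeyError raised by df[column_to_check] before hasnans is ever evaluated.
try:
    # Selecting the column raises KeyError if it is missing;
    # hasnans only reports whether the existing column contains NaN values.
    contains_nans = df[column_to_check].hasnans
    print(f"'{column_to_check}' exists in DataFrame (contains NaNs: {contains_nans}).")
except KeyError:
    print(f"'{column_to_check}' does not exist in DataFrame.")
This approach is unconventional, since hasnans is meant for detecting missing values. It is mainly worth using when you need both pieces of information at once; otherwise the in keyword is clearer.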
Handling a non-existent column
When you try accessing a non-existent column directly, pandas will raise a KeyError. Thus, it’s crucial to check if a column exists before performing operations on it. This ensures your code’s robustness, especially when dealing with dynamic or evolving datasets.
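As an illustration of the behaviour described here, the sketch below first triggers the KeyError and then uses DataFrame.get, which returns a default value (None unless another default is given) instead of raising. The column name 'Salary' is a hypothetical example of a column that is absent from the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sachin', 'Ramu', 'Arun'],
    'Age': [25, 30, 35],
})

# Direct access to a missing column raises KeyError.
try:
    df['Salary']
    raised_key_error = False
except KeyError:
    raised_key_error = True

# DataFrame.get returns None for a missing column instead of raising.
salary = df.get('Salary')

# For an existing column, get returns the column as a Series.
age = df.get('Age')

print(raised_key_error)
print(salary is None)
print(age.tolist())
```

Which style fits best depends on the situation: an explicit in check reads well when a missing column should change the control flow, while get is convenient when a fallback value is acceptable.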