Efficiently iterating over rows in a Pandas dataframe: Methods and examples

user October 27, 2023

This article covers several ways to loop over the rows of a pandas DataFrame, when each one is appropriate, and some best practices.

Sample dataframe:

import pandas as pd
# Sample data: Age and Score for Sachin, Ram, Abhilash, Mike, and Elaine
df = pd.DataFrame({
    'Name': ['Sachin', 'Ram', 'Abhilash', 'Mike', 'Elaine'],
    'Age': [25, 30, 29, 24, 27],
    'Score': [85, 88, 76, 90, 82]
})
print(df)

Iterating over rows:

Using iterrows():

The iterrows() method returns an iterator that yields (index, row) pairs, where each row is a pandas Series.

for index, row in df.iterrows():
    print(f"Name: {row['Name']}, Age: {row['Age']}, Score: {row['Score']}")

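While iterrows() is popular, it is usually the slowest option because it builds a new Series for every row, and the cost becomes noticeable on large DataFrames. Because each row is packed into a single Series, column dtypes are not necessarily preserved across the row.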
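The dtype behaviour of iterrows() is easiest to see on a frame with only numeric columns. The following is a minimal sketch, assuming a hypothetical two-column frame named mixed that is not part of the sample data above; it shows an integer column coming back as a float once the row is packed into a Series.

```python
import pandas as pd

# Hypothetical numeric-only frame: one integer column, one float column
mixed = pd.DataFrame({'Age': [25, 30], 'Score': [85.5, 88.0]})

# Column dtypes before iterating: Age is an integer dtype, Score is a float dtype
print(mixed.dtypes)

# Each row is packed into one Series, so the integer Age is upcast to float
_, first_row = next(mixed.iterrows())
print(first_row.dtype)
print(first_row['Age'])
```

If the original dtypes matter, itertuples() is usually the safer choice, since it does not merge a row into a single Series.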

Using itertuples():

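The itertuples() method returns an iterator that yields one namedtuple per row, with each column available as an attribute. It is generally faster than iterrows() because it does not build a Series for each row. Passing index=False leaves the index out of each tuple.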

for row in df.itertuples(index=False):
    print(f"Name: {row.Name}, Age: {row.Age}, Score: {row.Score}")
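As a small extension, the sketch below shows the default call, where the index is included as row.Index, and the name parameter, which sets the namedtuple's class name. The name 'Student' and the score threshold of 85 are arbitrary choices made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sachin', 'Ram', 'Abhilash', 'Mike', 'Elaine'],
    'Age': [25, 30, 29, 24, 27],
    'Score': [85, 88, 76, 90, 82]
})

# Default call: the index is the first field and is exposed as row.Index
first = next(df.itertuples())
print(first.Index, first.Name)

# name= sets the namedtuple class name ('Student' is a hypothetical choice)
high_scorers = [
    row.Name
    for row in df.itertuples(index=False, name='Student')
    if row.Score >= 85  # arbitrary threshold for illustration
]
print(high_scorers)
```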

Using apply() with a lambda function:

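You can call apply() with axis=1 to pass each row, as a Series, to a function. The example below uses a lambda that prints the row; since print() returns None, the call itself returns a Series of None values.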

df.apply(lambda row: print(f"Name: {row['Name']}, Age: {row['Age']}, Score: {row['Score']}"), axis=1)
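apply() tends to be more useful when the function returns a value rather than printing. A minimal sketch, assuming a new column named Summary (a hypothetical name, not part of the sample data), could look like this:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sachin', 'Ram', 'Abhilash', 'Mike', 'Elaine'],
    'Age': [25, 30, 29, 24, 27],
    'Score': [85, 88, 76, 90, 82]
})

# The lambda returns one string per row; apply() collects them into a Series
df['Summary'] = df.apply(
    lambda row: f"{row['Name']} ({row['Age']}): {row['Score']}",
    axis=1
)
print(df['Summary'])
```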

Best practices and recommendations:

Avoid Iteration When Possible: Always look for vectorized alternatives before resorting to iteration for performance reasons.
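To illustrate what a vectorized alternative can look like, the sketch below derives a column and an aggregate without any explicit loop. The column name Passed and the thresholds are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sachin', 'Ram', 'Abhilash', 'Mike', 'Elaine'],
    'Age': [25, 30, 29, 24, 27],
    'Score': [85, 88, 76, 90, 82]
})

# Compare the whole Score column at once instead of row by row
df['Passed'] = df['Score'] >= 80  # 80 is an arbitrary pass mark

# Boolean mask plus column selection, then one aggregate over the result
mean_score_over_25 = df.loc[df['Age'] > 25, 'Score'].mean()

print(df)
print(mean_score_over_25)
```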

Choose itertuples() for Speed: If you have to iterate, itertuples() is generally faster than iterrows().
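One way to check the speed difference on your own machine is a rough timing with timeit. This is only a sketch: the frame big is a hypothetical enlargement of the sample values, the helper function names are made up for the example, and the timings will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical larger frame built by repeating the sample values
big = pd.DataFrame({
    'Age': [25, 30, 29, 24, 27] * 1000,
    'Score': [85, 88, 76, 90, 82] * 1000,
})

def total_iterrows(frame):
    # Builds a Series for every row
    return sum(row['Score'] for _, row in frame.iterrows())

def total_itertuples(frame):
    # Builds a lightweight namedtuple for every row
    return sum(row.Score for row in frame.itertuples(index=False))

def total_vectorized(frame):
    # No Python-level loop at all
    return int(frame['Score'].sum())

for fn in (total_iterrows, total_itertuples, total_vectorized):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

All three functions compute the same total, so the comparison isolates the cost of the iteration method itself.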

Limitations of apply(): apply() with axis=1 is convenient, but it still calls a Python function once per row, so it is typically much slower than a vectorized operation. It is better suited to returning a value per row than to side effects such as printing.
