This article covers different techniques for looping over the rows of a pandas DataFrame, when to use each one, and some best practices.
Sample DataFrame:
```python
import pandas as pd

# Sample data: Age and Score for Sachin, Ram, Abhilash, Mike, and Elaine
df = pd.DataFrame({
    'Name': ['Sachin', 'Ram', 'Abhilash', 'Mike', 'Elaine'],
    'Age': [25, 30, 29, 24, 27],
    'Score': [85, 88, 76, 90, 82]
})
print(df)
```
Iterating over rows:
Using iterrows():
The iterrows() method returns an iterator that yields (index, row) pairs. Each row is a pandas Series indexed by the column names.
```python
for index, row in df.iterrows():
    print(f"Name: {row['Name']}, Age: {row['Age']}, Score: {row['Score']}")
```
iterrows() is popular, but it constructs a new Series object for every row, which makes it slow on large DataFrames.
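A second caveat is documented in the pandas API reference for iterrows(). Each row is packed into a single Series, so values from columns with different dtypes are upcast to a common dtype, and the original dtypes are not preserved. The minimal sketch below shows this with a small hypothetical DataFrame (`mixed`, with columns `int` and `float`). It contrasts the result with itertuples(), which keeps each value's type.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column
mixed = pd.DataFrame([[1, 1.5]], columns=["int", "float"])

# iterrows() yields (index, Series); a Series holds a single dtype,
# so the integer value is upcast to float64 in the row
_, first_row = next(mixed.iterrows())
print(first_row["int"])    # 1.0
print(first_row.dtype)     # float64

# The column itself is unchanged; only the per-row Series is upcast
print(mixed["int"].dtype)

# itertuples() returns the values without forcing a common dtype
first_tuple = next(mixed.itertuples(index=False))
print(first_tuple.int)     # 1
```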
Using itertuples():
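The itertuples() method returns an iterator that yields one namedtuple per row. It is generally faster than iterrows() and preserves each column's dtype. Passing index=False leaves the index out of each tuple.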
```python
for row in df.itertuples(index=False):
    print(f"Name: {row.Name}, Age: {row.Age}, Score: {row.Score}")
```
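With the default index=True, each namedtuple also carries the row label in a field called `Index`. pandas renames any column whose name is not a valid Python identifier (for example, a name containing a space) to a positional name such as `_1`, because it cannot become a namedtuple attribute. The sketch below shows both behaviors. The `Full Name` column is hypothetical and exists only to trigger the renaming.

```python
import pandas as pd

people = pd.DataFrame({
    "Full Name": ["Sachin", "Ram"],  # hypothetical column name containing a space
    "Age": [25, 30],
})

for row in people.itertuples():  # index=True by default
    # Field 0 is the index label; "Full Name" sits at position 1, so it becomes "_1"
    print(row.Index, row._1, row.Age)

# Inspect the generated field names
print(next(people.itertuples())._fields)  # ('Index', '_1', 'Age')
```

Renaming columns to valid identifiers before iterating avoids the positional names.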
Using apply() with a lambda function:
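Calling apply() with axis=1 passes each row to a function (here, a lambda) as a Series indexed by the column names.
```python
# print() returns None, so apply() returns a Series of None values;
# assigning it to _ discards that unused result
_ = df.apply(lambda row: print(f"Name: {row['Name']}, Age: {row['Age']}, Score: {row['Score']}"), axis=1)
```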
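apply() is more useful when the function returns a value instead of printing. pandas collects the return values into a new Series aligned with the DataFrame's index, and you can store that Series as a column. The minimal sketch below rebuilds the sample DataFrame and uses a hypothetical `Label` column for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sachin", "Ram", "Abhilash", "Mike", "Elaine"],
    "Age": [25, 30, 29, 24, 27],
    "Score": [85, 88, 76, 90, 82],
})

# The lambda runs once per row; its return values form a new Series
df["Label"] = df.apply(lambda row: f"{row['Name']} ({row['Age']})", axis=1)
print(df["Label"].tolist())
# ['Sachin (25)', 'Ram (30)', 'Abhilash (29)', 'Mike (24)', 'Elaine (27)']
```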
Best practices and recommendations:
Avoid Iteration When Possible: Look for vectorized alternatives before resorting to iteration. Operations applied to whole columns run in optimized, compiled code and are usually much faster than a Python loop over rows.
Choose itertuples() for Speed: If you have to iterate, itertuples() is generally faster than iterrows().
Limitations of apply(): apply() with a lambda is convenient, but with axis=1 it still calls the Python function once per row. It is therefore not vectorized, and for simple row-wise calculations a column-wise expression is usually both clearer and faster.
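To make these recommendations concrete, the sketch below computes the same value three ways: with a vectorized comparison, with an itertuples() loop, and with apply(axis=1). The value is a hypothetical "passed" flag, defined here as a Score of at least 80. The threshold is an assumption chosen only for illustration. All three give the same result, and the vectorized form is the idiomatic one.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sachin", "Ram", "Abhilash", "Mike", "Elaine"],
    "Age": [25, 30, 29, 24, 27],
    "Score": [85, 88, 76, 90, 82],
})

THRESHOLD = 80  # hypothetical pass mark, for illustration only

# 1. Vectorized: a single comparison over the whole column, no Python loop
passed_vectorized = df["Score"] >= THRESHOLD

# 2. itertuples(): an explicit Python loop, one namedtuple per row
passed_loop = [row.Score >= THRESHOLD for row in df.itertuples(index=False)]

# 3. apply(axis=1): still calls the lambda once per row under the hood
passed_apply = df.apply(lambda row: row["Score"] >= THRESHOLD, axis=1)

print(passed_vectorized.tolist())  # [True, True, False, True, True]
```

On DataFrames with many rows, timing the three versions (for example, with Python's timeit module) is a simple way to check the performance difference on your own data.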