How to implement Outer Joins in Pandas.

user October 27, 2023

Join operations are fundamental to data manipulation and analysis. Among these, the outer join stands out for its ability to merge data based on shared columns, while retaining unmatched values from both DataFrames. In this guide, we’ll dive deep into the concept of outer joins in Pandas using comprehensive examples.

To understand the outer join operation, let’s consider two sample DataFrames. One DataFrame contains the scores of students in Mathematics, and the other contains scores in English.

import pandas as pd
# Sample data for students' scores in Mathematics
math_scores = pd.DataFrame({
    'Student_ID': [101, 102, 103, 104],
    'Math_Score': [85, 92, 78, 88]
})
# Sample data for students' scores in English
english_scores = pd.DataFrame({
    'Student_ID': [102, 103, 104, 105],
    'English_Score': [87, 81, 92, 78]
})
print(math_scores)
print(english_scores)

Performing an Outer Join:

An outer join merges both DataFrames, keeping all records from both frames and placing NaNs where data is absent in one of them.

merged_data = pd.merge(math_scores, english_scores, on='Student_ID', how='outer')
print(merged_data)

The output represents a combination of both DataFrames. Students with ID 101 and 105 have NaN values because they are missing in one of the original frames.

Output

   Student_ID  Math_Score  English_Score
0         101        85.0            NaN
1         102        92.0           87.0
2         103        78.0           81.0
3         104        88.0           92.0
4         105         NaN           78.0

The outer join operation provides a comprehensive view of students’ scores in both subjects. Notice that:

Student 101 has a score in Mathematics but not in English, resulting in a NaN for English_Score.

Conversely, Student 105 has an English score but not a Mathematics score, resulting in a NaN for Math_Score.

All other students have scores recorded in both subjects.

Refer more on python here : Python