Merging dataFrames on multiple columns with Python Pandas

user October 26, 2023

In this article, we’ll explore how to merge two DataFrames based on multiple columns, ensuring seamless integration of datasets. Merging DataFrames on multiple columns using Pandas is a foundational skill in data analysis.

Sample data for hands-on learning

import pandas as pd
# First DataFrame
data1 = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Sachin', 'Ram', 'Abhilash', 'Mike', 'Elaine'],
    'Age': [30, 25, 28, 40, 35]
}
df1 = pd.DataFrame(data1)

# Second DataFrame
data2 = {
    'ID': [3, 4, 5, 6, 7],
    'Name': ['Abhilash', 'Mike', 'Elaine', 'Jasmine', 'Robert'],
    'Score': [85, 90, 78, 88, 79]
}
df2 = pd.DataFrame(data2)

print(df1)
print(df2)

Output

   ID      Name  Age
0   1    Sachin   30
1   2       Ram   25
2   3  Abhilash   28
3   4      Mike   40
4   5    Elaine   35

   ID      Name  Score
0   3  Abhilash     85
1   4      Mike     90
2   5    Elaine     78
3   6   Jasmine     88
4   7    Robert     79

Merging DataFrames on multiple columns

The primary method to merge DataFrames is the merge function. When merging on multiple columns, provide the column names as a list.

merged_df = pd.merge(df1, df2, on=['ID', 'Name'], how='inner')
print(merged_df)

Output

   ID      Name  Age  Score
0   3  Abhilash   28     85
1   4      Mike   40     90
2   5    Elaine   35     78

In the above code:

on specifies the columns on which the merge operation is based.

how defines the type of merge. The ‘inner’ merge retains only rows with matching values in both DataFrames.

Understanding different types of merges

While ‘inner’ merge is common, there are other types you should be aware of:

Inner: Only rows with matching values in both DataFrames are retained.

Left: All rows from the left DataFrame and matching rows from the right DataFrame are retained. NaN for non-matching rows.

Right: All rows from the right DataFrame and matching rows from the left DataFrame are retained. NaN for non-matching rows.

Outer: All rows from both DataFrames are retained, with NaN in places where no match is found.

# Example of a left merge
left_merged_df = pd.merge(df1, df2, on=['ID', 'Name'], how='left')
print(left_merged_df)

Output

   ID      Name  Age  Score
0   1    Sachin   30    NaN
1   2       Ram   25    NaN
2   3  Abhilash   28   85.0
3   4      Mike   40   90.0
4   5    Elaine   35   78.0

Potential pitfalls and solutions

Suffixes: When columns, other than the ones you’re merging on, have the same name, Pandas appends suffixes. Use the suffixes parameter to customize these.

custom_merged_df = pd.merge(df1, df2, on=['ID', 'Name'], how='inner', suffixes=('_Left', '_Right'))
print(custom_merged_df)

Output

   ID      Name  Age  Score
0   3  Abhilash   28     85
1   4      Mike   40     90
2   5    Elaine   35     78

Mismatched data types: Ensure columns you’re merging on have consistent data types across both DataFrames.

Missing values: Rows with NaN in the merge columns will not be matched. Ensure data consistency before merging.

Refer more on python here : Python
Refer more on python here : PySpark

Post Views: 8

Author: user

Merging dataFrames on multiple columns with Python Pandas

Merging DataFrames on multiple columns

Understanding different types of merges

Potential pitfalls and solutions

Trending

Recent Posts

Featured Posts – Slider Widget

How PARTITION BY Works in Snowflake, and SQL in general

Stash a specific file using Git

Prevent your computer from locking : Python to simulate mouse movements

AWS EC2 vs Azure Virtual Machines

Production and Industrial Engineering

Engineering Technical campus placement question and answers

JavaScript’s reduceRight() method to iterate over an array from right to left

Merging Multiple Images into a Single PDF File Using Python

Nanotechnology

Electronics and Instrumentation

Most Viewed Posts

Merging DataFrames on multiple columns

Understanding different types of merges

Potential pitfalls and solutions

Related Articles

Trending

Recent Posts

Featured Posts – Slider Widget