Creating a DataFrame from a dictionary of Series is a flexible way to combine Series of different lengths and data types into a single table. pandas aligns the Series on their index labels and fills any gaps with NaN.
Guide to Creating a DataFrame from a Dictionary of Series
Preparing the Data
First, let’s create some Series with sample data. Each Series describes one person: the index labels are attribute names such as ‘Age’ and ‘City’, and the values hold that person’s details.
Example Series:
import pandas as pd
# Create one Series per person; index labels are attribute names (Freshers.in example)
sachin_series = pd.Series(data={'Age': 32, 'City': 'Mumbai', 'Occupation': 'Engineer'})
manju_series = pd.Series(data={'Age': 29, 'City': 'Bangalore'})
ram_series = pd.Series(data={'Age': 35, 'City': 'Chennai', 'Occupation': 'Doctor', 'Salary': 150000})
raju_series = pd.Series(data={'Age': 40, 'City': 'Delhi'})
david_series = pd.Series(data={'Age': 28, 'City': 'New York', 'Salary': 85000})
wilson_series = pd.Series(data={'Age': 33, 'City': 'San Francisco', 'Occupation': 'Architect'})
Creating the DataFrame
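Next, we collect these Series in a dictionary keyed by name and pass it to pd.DataFrame. Each dictionary key becomes a column, so we transpose the result to get one row per person.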
Example of Creating DataFrame:
# Dictionary of series
data_dict = {'Sachin': sachin_series, 'Manju': manju_series, 'Ram': ram_series,
             'Raju': raju_series, 'David': david_series, 'Wilson': wilson_series}
# Creating DataFrame
df = pd.DataFrame(data_dict)
# Transposing to get names as rows
df = df.T
df
Output
        Age           City Occupation  Salary
Sachin   32         Mumbai   Engineer     NaN
Manju    29      Bangalore        NaN     NaN
Ram      35        Chennai     Doctor  150000
Raju     40          Delhi        NaN     NaN
David    28       New York        NaN   85000
Wilson   33  San Francisco  Architect     NaN
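To see why the transpose is needed, it can help to look at the frame before calling df.T. The sketch below uses two of the Series from this guide; the variable name wide is an illustrative assumption and does not appear in the original code. The dictionary keys become columns, and the index is the union of the labels found in the Series.

```python
import pandas as pd

sachin_series = pd.Series({'Age': 32, 'City': 'Mumbai', 'Occupation': 'Engineer'})
manju_series = pd.Series({'Age': 29, 'City': 'Bangalore'})

# `wide` is a hypothetical name for the frame before transposing.
wide = pd.DataFrame({'Sachin': sachin_series, 'Manju': manju_series})

print(wide.columns.tolist())   # dictionary keys become column labels
print(wide.index.tolist())     # union of the labels found in the Series
print(wide.loc['Occupation'])  # Manju has no 'Occupation', so that cell is NaN
```

Transposing this frame swaps the two axes, which gives one row per person.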
Understanding the Resulting DataFrame
The DataFrame df has the names as row indices and the Series keys (such as ‘Age’ and ‘City’) as columns. Where a Series lacks a key, pandas fills the cell with NaN.
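One consequence worth checking: because each person’s Series mixes numbers and strings, the columns of the transposed frame end up with the generic object dtype. Below is a minimal sketch of converting the numeric columns, assuming a two-person subset of the data and using pd.to_numeric, a conversion step that is not part of the original walkthrough.

```python
import pandas as pd

people = {
    'Ram': pd.Series({'Age': 35, 'City': 'Chennai', 'Salary': 150000}),
    'Manju': pd.Series({'Age': 29, 'City': 'Bangalore'}),
}
df = pd.DataFrame(people).T

# Mixed-type data keeps the generic object dtype after the transpose.
dtypes_before = df.dtypes.copy()
print(dtypes_before)

# Convert the columns that should be numeric (an added step, not in the original).
df['Age'] = pd.to_numeric(df['Age'])
df['Salary'] = pd.to_numeric(df['Salary'])  # the missing salary stays NaN
print(df.dtypes)
```

After the conversion, operations such as df['Age'].mean() work on proper numeric columns instead of Python objects.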
Benefits of This Approach
- Flexibility in Data Structure: Handles series with different lengths and missing values gracefully.
- Ease of Manipulation: Adding or removing a record only requires adding or deleting a dictionary entry before building the DataFrame.
- Simplifies Complex Data Aggregation: Ideal for combining series representing different aspects of data into a single structure.
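The flexibility described above can be sketched as follows. The person 'Asha' and her values are hypothetical, added only for illustration. The example adds one entry to the dictionary, removes another, and rebuilds the DataFrame.

```python
import pandas as pd

data_dict = {
    'Sachin': pd.Series({'Age': 32, 'City': 'Mumbai'}),
    'Manju': pd.Series({'Age': 29, 'City': 'Bangalore'}),
}

# Add a new person (hypothetical data) with an extra 'Salary' key.
data_dict['Asha'] = pd.Series({'Age': 31, 'City': 'Pune', 'Salary': 90000})

# Remove a person by deleting the dictionary entry.
del data_dict['Manju']

# Rebuild the frame; the new 'Salary' column appears, with NaN for Sachin.
df = pd.DataFrame(data_dict).T
print(df)

# Columns can also be dropped after the frame is built.
df_no_salary = df.drop(columns=['Salary'])
print(df_no_salary)
```

Rebuilding from the dictionary keeps the source of truth in one place; for large datasets, editing the DataFrame directly may be the better choice.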