Ways to get the distribution of a column in Python, depending on the type of data

There are several ways to get the distribution of a column in Python, depending on the type of data and the desired output. Here are a few common methods:

  1. Using the pandas library: If the data is in a DataFrame, the pandas library offers several methods to quickly get the distribution of a column, such as value_counts(), describe(), and hist(). For example:
import pandas as pd
df = pd.read_csv('freshers_data.csv')
# To get the frequency count of each unique value in the column 'column_name'
df['column_name'].value_counts()
# To get summary statistics of the column
df['column_name'].describe()
# To plot a histogram of the column
df['column_name'].hist()
  2. Using the numpy library: If the data is in a numpy array, numpy provides functions such as unique(), count_nonzero(), mean(), and std() to summarize a column. For example:
import numpy as np
# skip_header=1 skips the CSV header row, which would otherwise be read as NaN
data = np.genfromtxt('freshers_data.csv', delimiter=',', skip_header=1)
# Position of the column to analyze (0 is only an example value)
column_index = 0
# To get the unique values in the column
np.unique(data[:, column_index])
# To get the count of each unique value in the column
values, counts = np.unique(data[:, column_index], return_counts=True)
# To get the mean of the column
np.mean(data[:, column_index])
# To get the standard deviation of the column (population formula, ddof=0, by default)
np.std(data[:, column_index])
  3. Using the matplotlib library: matplotlib can plot a histogram of the column. For example:
import matplotlib.pyplot as plt
# data and column_index are the numpy array and column position from the numpy example
plt.hist(data[:, column_index])
plt.show()
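  4. Using the seaborn library: seaborn can plot the distribution of the column as a histogram with an optional kernel density estimate. For example:
import matplotlib.pyplot as plt
import seaborn as sns
# histplot() is the current replacement for the deprecated distplot();
# kde=True overlays a density curve on the histogram
sns.histplot(data[:, column_index], kde=True)
plt.show()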
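Which method fits best depends on whether the column is categorical or numeric. The following is a minimal sketch of that choice, not a definitive recipe. It assumes a small made-up DataFrame in place of freshers_data.csv; the column names department and salary and all values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for freshers_data.csv: one categorical, one numeric column.
df = pd.DataFrame({
    "department": ["IT", "HR", "IT", "Sales", "IT", "HR"],
    "salary": [30, 40, 35, 50, 45, 40],
})

# Categorical column: frequency and relative share of each unique value.
dept_counts = df["department"].value_counts()
dept_share = df["department"].value_counts(normalize=True)

# Numeric column: summary statistics, then counts per equal-width bin.
salary_summary = df["salary"].describe()
bin_counts, bin_edges = np.histogram(df["salary"].to_numpy(), bins=4)

# NumPy counterpart of value_counts() for a plain array.
values, counts = np.unique(df["department"].to_numpy(), return_counts=True)

print(dept_counts)
print(dept_share.round(2))
print(salary_summary)
print(bin_counts, bin_edges)
print(dict(zip(values, counts)))
```

For a categorical column, value_counts() (or np.unique() with return_counts=True) usually says more than a histogram. For a numeric column, describe() together with binned counts from np.histogram() gives the same information a histogram plot would show, without needing a plotting library.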
Author: user
