There are several ways to get the distribution of a column in Python, depending on the type of data and the desired output. Here are a few common methods:
- Using the pandas library: If the data is in a DataFrame, pandas offers several methods for quickly inspecting the distribution of a column, such as value_counts(), describe(), and hist(). For example:
import pandas as pd
df = pd.read_csv('freshers_data.csv')
# To get the frequency count of each unique value in the column 'column_name'
df['column_name'].value_counts()
# To get summary statistics of the column
df['column_name'].describe()
# To plot a histogram of the column
df['column_name'].hist()
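As a minimal sketch of what value_counts() returns, the snippet below builds a small in-memory DataFrame instead of reading a file. The column name "department" and its values are hypothetical and chosen only for illustration. Passing normalize=True gives relative frequencies rather than raw counts.

```python
import pandas as pd

# Hypothetical sample data standing in for one column of the CSV file
df = pd.DataFrame({"department": ["CS", "EE", "CS", "ME", "CS", "EE"]})

# Absolute frequency of each unique value, sorted from most to least common
counts = df["department"].value_counts()

# Relative frequency (proportions that sum to 1)
proportions = df["department"].value_counts(normalize=True)

print(counts)
print(proportions)
```

Relative frequencies are often easier to compare across columns or datasets of different sizes.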
- Using the numpy library: If the data is in a numpy array, numpy offers several functions for describing the distribution of a column, such as unique(), count_nonzero(), mean(), and std(). For example:
import numpy as np
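column_index = 0  # index of the column to analyze (adjust as needed)
# skip_header=1 assumes the first row of the file holds the column names
data = np.genfromtxt('freshers_data.csv', delimiter=',', skip_header=1)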
# To get the unique values in the column
np.unique(data[:, column_index])
# To get the count of each unique value in the column
values, counts = np.unique(data[:, column_index], return_counts=True)
# To get the mean of the column
np.mean(data[:, column_index])
# To get the standard deviation of the column
np.std(data[:, column_index])
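The numpy calls above can be tried on a small array without any file. The sketch below uses a hypothetical numeric column and also shows np.histogram, which is not mentioned above but is a common way to get binned counts for continuous data. The choice of 3 bins is arbitrary.

```python
import numpy as np

# Hypothetical numeric column used only for illustration
column = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 10.0])

# Count of each unique value, collected into a plain dict
values, counts = np.unique(column, return_counts=True)
frequency = dict(zip(values.tolist(), counts.tolist()))

# Binned counts: 3 equal-width bins spanning the min and max of the data
bin_counts, bin_edges = np.histogram(column, bins=3)

print(frequency)
print(bin_counts, bin_edges)
```

Counting unique values suits categorical or discrete data, while binning is usually more informative when most values are distinct.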
- Using the matplotlib library: matplotlib can be used to plot a histogram of the column. For example:
import matplotlib.pyplot as plt
plt.hist(data[:, column_index])
plt.show()
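A slightly fuller version of the histogram, sketched under a few assumptions: the data is a hypothetical array, the Agg backend is selected so the script runs without a display, and the output file name column_histogram.png is made up for this example. In an interactive session, plt.show() can be used in place of savefig.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical numeric column used only for illustration
column = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 10.0])

fig, ax = plt.subplots()
# hist returns the per-bin counts, the bin edges, and the drawn bar patches
bin_counts, bin_edges, _ = ax.hist(column, bins=3, edgecolor="black")
ax.set_xlabel("value")
ax.set_ylabel("frequency")
ax.set_title("Distribution of the column")

# Hypothetical output file name
fig.savefig("column_histogram.png")
```

The bins argument controls how finely the values are grouped, so it is worth trying a few settings before drawing conclusions from the shape.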
- Using the seaborn library: seaborn can be used to plot the distribution of the column. For example:
import seaborn as sns
# Histogram with a kernel density estimate overlaid (requires seaborn >= 0.11)
sns.histplot(data[:, column_index], kde=True)
plt.show()