How to split a Pandas DataFrame in Python

Python Pandas @ Freshers.in

You can split a Pandas DataFrame with the DataFrame.iloc indexer. iloc is an indexer rather than a function: it selects rows or columns by their integer position, so a slice such as df.iloc[:n] returns the first n rows as a smaller DataFrame.

Here is an example that uses iloc to split a DataFrame into two smaller DataFrames:

import pandas as pd
# Load the DataFrame
df = pd.read_csv('data.csv')
# Split the DataFrame into two smaller DataFrames
df1 = df.iloc[:len(df)//2]
df2 = df.iloc[len(df)//2:]

This splits the DataFrame df into two smaller DataFrames, df1 and df2. df1 contains the first half of the rows of df and df2 contains the second half. If df has an odd number of rows, df2 gets the extra row, because len(df)//2 rounds down.
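The same positional slicing extends to more than two pieces. The sketch below is only an illustration: it builds a small in-memory DataFrame in place of data.csv, and the column names and the chunk_size value are made up for the example.

```python
import pandas as pd

# Hypothetical sample data standing in for data.csv (7 rows, so the halves are uneven)
df = pd.DataFrame({"id": range(1, 8), "score": [10, 20, 30, 40, 50, 60, 70]})

# Two halves: integer division rounds down, so the second half gets the extra row
mid = len(df) // 2
first_half = df.iloc[:mid]
second_half = df.iloc[mid:]

# Consecutive chunks of at most chunk_size rows (chunk_size is an arbitrary choice here)
chunk_size = 3
chunks = [df.iloc[start:start + chunk_size] for start in range(0, len(df), chunk_size)]

print(len(first_half), len(second_half))
print([len(chunk) for chunk in chunks])
```

Because the slices are contiguous and do not overlap, concatenating the chunks with pd.concat(chunks) gives back the original rows in their original order.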

You can also use the DataFrame.loc indexer to split a DataFrame by row labels or with a boolean mask. For example:

import pandas as pd
# Load the DataFrame
df = pd.read_csv('data.csv')
# Split the DataFrame into two smaller DataFrames
df1 = df.loc[df['column_name'] == 'value']
df2 = df.loc[df['column_name'] != 'value']

This splits the DataFrame df into two smaller DataFrames, df1 and df2. df1 contains the rows of df where the value in the column_name column is 'value', and df2 contains the remaining rows. Rows with a missing value in column_name do not equal 'value', so they end up in df2.
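A variation on the condition-based split is to build the boolean mask once and reuse its negation, or to use groupby when you want one DataFrame per distinct value. The sketch below assumes made-up data: the city and sales columns and their values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; one row has a missing city
df = pd.DataFrame({
    "city": ["Paris", "Rome", "Paris", None, "Oslo"],
    "sales": [5, 3, 8, 2, 7],
})

# Build the mask once, then split on it and on its negation
mask = df["city"] == "Paris"
matching = df.loc[mask]
remaining = df.loc[~mask]   # includes the row with the missing city

# One DataFrame per distinct city; by default groupby skips rows whose key is missing
parts = {city: group for city, group in df.groupby("city")}

print(len(matching), len(remaining))
print(sorted(parts))
```

Reusing a single mask keeps the two conditions from drifting apart if the filter is edited later. The groupby version suits cases where the number of distinct values is not known in advance.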

Author: user
