Pandas API on Spark
Input/Output
- Data Generator
- Spark Metastore Table
- Delta Lake
- Parquet
- ORC
- Generic Spark I/O
- Flat File / CSV
- Clipboard
- Excel
- JSON
- HTML
- SQL
- General functions
- Working with options
- Data manipulations and SQL
- Unpivot a DataFrame from wide format to long format : melt
- Merging DataFrame objects with a database-style join operation : merge
- Perform an asof merge between two DataFrames : merge_asof
- Convert categorical variable into dummy/indicator variables : get_dummies
- Concatenate pandas-on-Spark objects : concat
- Execute SQL queries on Spark DataFrames using the Pandas API : sql
- Optimize Spark DataFrame joins with a broadcast hint using the Pandas API : broadcast
- Top-level missing data
- Missing Value Detection with Pandas API on Spark : isna()
- Detect missing values in Spark DataFrames using the Pandas API : isnull()
- Detect existing (non-missing) values in Spark DataFrames using Pandas API : notna()
- Detect existing (non-missing) values in Spark DataFrames using Pandas API : notnull()
- Top-level dealing with numeric data
- Top-level dealing with datetimelike data
- Series
- Creation of data series with customizable parameters : Series
- Managing axis labels : Series.index
- Return the data type of the underlying data : Series.dtype
- Data types within Spark Series objects : Series.dtypes
- Get the number of array dimensions as an int : Series.ndim
- How to reveal the underlying data’s dimensions : Series.shape
- How to get the number of elements within an object : Series.size
- Determining whether the current object holds any data : Series.empty
- Transposition of data : Series.T
- Detect the presence of missing values within a Series : Series.hasnans
- Return a NumPy representation of the Series : Series.values
- Conversion
- Indexing, iteration
- Binary operator functions
- Series.add(other[, fill_value])
- Series.div(other)
- Series.mul(other)
- Series.radd(other[, fill_value])
- Series.rdiv(other)
- Series.rmul(other)
- Series.rsub(other)
- Series.rtruediv(other)
- Series.sub(other)
- Series.truediv(other)
- Series.pow(other)
- Series.rpow(other)
- Series.mod(other)
- Series.rmod(other)
- Series.floordiv(other)
- Series.rfloordiv(other)
- Series.divmod(other)
- Series.rdivmod(other)
- Series.combine_first(other)
- Series.lt
- Series.gt
- Series.le
- Series.ge
- Series.ne
- Series.eq
- Series.product
- Series.dot