Pandas API on Spark

Input/Output

  1. Data Generator
  2. Spark Metastore Table
  3. Delta Lake
  4. Parquet : Pandas API on Spark
    1. Input/Output with Parquet Files
    2. Pandas API on Spark: Writing DataFrames to Parquet Files : to_parquet
  5. ORC
    1. Exploring Pandas API on Spark: Load an ORC object from the file path : read_orc
    2. Writing DataFrames to ORC Format with Pandas API on Spark : to_orc
  6. Generic Spark I/O
    1. Loading DataFrames from Spark Data Sources with Pandas API : read_spark_io
    2. Pandas API on Spark for Efficient Output Operations : to_spark_io
  7. Flat File / CSV
    1. Pandas API on Spark for CSV Input : read_csv
    2. Pandas API on Spark for CSV Output Operations : to_csv
  8. Clipboard
    1. Spark’s Clipboard Integration : read_clipboard
    2. Spark’s DataFrame.to_clipboard Function
  9. Excel
    1. Leveraging Pandas API on Spark to Read Excel Files : read_excel
    2. Spark’s DataFrame.to_excel Function
  10. JSON
    1. Spark for JSON to DataFrame Conversion : read_json()
    2. Spark for JSON Conversion : to_json
  11. HTML
    1. Spark for HTML Table Extraction
    2. Spark DataFrame to HTML Tables with Pandas API : to_html()
  12. SQL
    1. Spark for Reading SQL Database Tables : read_sql_table()
    2. SQL query execution into DataFrames : read_sql_query()
    3. Read SQL queries or database tables into DataFrames : read_sql()
  13. General functions
    1. Working with options
      1. Managing Options with reset_option()
      2. Harnessing get_option() for Fine-Tuning
      3. Mastering set_option() for Enhanced Workflows
      4. Exploring option_context()
    2. Data manipulations and SQL
      1. Unpivot a DataFrame from wide format to long format : melt
      2. Merging DataFrame objects with a database-style join operation : merge
      3. Unraveling the ‘merge_asof’ Function : asof merge between two DataFrames
      4. get_dummies : Convert categorical variable into dummy/indicator variables
      5. Concatenate Pandas-on-Spark objects effortlessly
      6. Execute SQL queries seamlessly on Spark DataFrames using the Pandas API
      7. Optimize Spark DataFrame joins by leveraging the broadcast functionality with Pandas API
    3. Top-level missing data
      1. Missing Value Detection with Pandas API on Spark : isna()
      2. Detect missing values in Spark DataFrames using the Pandas API : isnull()
      3. Detect existing (non-missing) values in Spark DataFrames using Pandas API : notna()
      4. Detect existing (non-missing) values in Spark DataFrames using Pandas API : notnull()
    4. Top-level dealing with numeric data
      1. Converting arguments to numeric types
    5. Top-level dealing with datetimelike data
      1. Pandas API on Spark to convert data to datetime format
      2. How to generate a fixed-frequency DatetimeIndex : date_range()
      3. Converting argument into a timedelta object
      4. Generate fixed frequency TimedeltaIndex
    6. Series
      1. Creation of data series with customizable parameters : Series
      2. Unraveling the pivotal role of Series.index in managing axis labels
      3. How Spark facilitates data type management : Series.dtype
      4. Data types within Spark Series objects : Series.dtypes
      5. Getting the number of array dimensions as an int : Series.ndim
      6. How to reveal the underlying data’s dimensions : Series.shape
      7. How to get the number of elements within an object : Series.size
      8. Determining whether the current object holds any data : Series.empty
      9. Transposition of data : Series.T
      10. Detect the presence of missing values within a Series : Series.hasnans
      11. Return a NumPy representation of the Series : Series.values
    7. Conversion
      1. Casting the data type of a Series to a specified type : Series.astype
      2. PySpark : Series.copy() and Series.bool()
    8. Indexing, iteration : Pandas API on Spark
      1. Series.at
      2. Series.iat
      3. Series.loc
      4. Series.iloc
      5. Series.keys()
      6. Series.pop(item)
      7. Series.items()
      8. Series.iteritems()
      9. Series.item()
      10. Series.xs(key[, level])
      11. Series.get(key[, default])
    9. Binary operator functions
      1. Series.add(other[, fill_value])
      2. Series.div(other)
      3. Series.mul(other)
      4. Series.radd(other[, fill_value])
      5. Series.rdiv(other)
      6. Series.rmul(other)
      7. Series.rsub(other)
      8. Series.rtruediv(other)
      9. Series.sub(other)
      10. Series.truediv(other)
      11. Series.pow(other)
      12. Series.rpow(other)
      13. Series.mod(other)
      14. Series.rmod(other)
      15. Series.floordiv(other)
      16. Series.rfloordiv(other)
      17. Series.divmod(other)
      18. Series.rdivmod(other)
      19. Series.combine_first(other)
      20. Series.lt
      21. Series.gt
      22. Series.le
      23. Series.ge
      24. Series.ne
      25. Series.eq
      26. Series.product
      27. Series.dot
Author: user