Tag: Big Data
Hive : Understanding Array Aggregation in Apache Hive
Apache Hive offers many built-in functions for processing data, and collect_list() and collect_set() are two of the most commonly used for array aggregation…
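The difference between the two aggregates is easiest to see on a small example. The sketch below is plain Python, not Hive. It uses hypothetical stand-in helpers named after collect_list() and collect_set() to show that the list version keeps duplicate values per group while the set version drops them. The sample rows are made up. Note that in a distributed Hive job the order of elements in the resulting arrays is not guaranteed.

```python
from collections import defaultdict

# Hypothetical sample rows: (dept, employee) pairs
rows = [("sales", "amy"), ("sales", "bob"), ("sales", "amy"), ("hr", "cy")]

def collect_list(rows):
    """Group values by key, keeping duplicates (like Hive's collect_list())."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)

def collect_set(rows):
    """Group values by key, dropping duplicates (like Hive's collect_set())."""
    groups = defaultdict(list)
    for key, value in rows:
        if value not in groups[key]:
            groups[key].append(value)
    return dict(groups)

print(collect_list(rows))  # {'sales': ['amy', 'bob', 'amy'], 'hr': ['cy']}
print(collect_set(rows))   # {'sales': ['amy', 'bob'], 'hr': ['cy']}
```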
Hive : Creating and Utilizing 64-bit Hash Values in Apache Hive
Apache Hive provides several built-in functions for processing data. One of these is the hash() function, which calculates a…
Hive : How can we return the average of non-NULL records in Hive?
The function you need in Apache Hive is avg(). It is an aggregate function that returns…
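Like other SQL aggregates, avg() skips NULLs rather than treating them as zero, so the divisor is the count of non-NULL values. The plain-Python sketch below illustrates that convention with a hypothetical helper, avg_non_null(), using None to stand in for NULL. It returns None when every input is missing, matching the usual SQL behaviour of returning NULL for an all-NULL group.

```python
from typing import Optional, Sequence

def avg_non_null(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average only the non-None values; return None if there are none.

    Mirrors the SQL convention that avg() ignores NULLs.
    """
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)

scores = [10, None, 20, None, 30]
print(avg_non_null(scores))  # 20.0, whereas counting NULLs as 0 would give 12.0
```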
Hive : How to Delete Old Apache Hive Logs to Free Up Space and Boost Cluster Performance
Apache Hive logs are a critical component for debugging and performance optimization. However, over time, these logs can occupy significant…
Hive : How to Kill a Running Query in Apache Hive
There may be times when a running query needs to be terminated due to excessive resource usage, incorrect syntax, or…
Hive : Seeing Long Running Queries in Apache Hive
Apache Hive is a data warehouse software project built on top of Apache Hadoop that provides data query and analysis…
PySpark : from_utc_timestamp Function: A Detailed Guide
The from_utc_timestamp function in PySpark is a highly useful function that allows users to convert UTC time to a specified…
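In PySpark the call is `pyspark.sql.functions.from_utc_timestamp(timestamp, tz)`: it interprets the input timestamp as UTC and returns the corresponding wall-clock time in the target zone. The sketch below reproduces that idea with Python's standard datetime module rather than Spark. It uses a hypothetical helper, from_utc(), and assumes a fixed UTC+05:30 offset (India Standard Time, which has no daylight saving) so the example does not depend on a time-zone database being installed.

```python
from datetime import datetime, timedelta, timezone

def from_utc(naive_utc: datetime, tz: timezone) -> datetime:
    """Treat a naive timestamp as UTC and return the wall-clock time in tz.

    The result is returned naive, as a plain local wall-clock value.
    """
    aware = naive_utc.replace(tzinfo=timezone.utc)
    return aware.astimezone(tz).replace(tzinfo=None)

IST = timezone(timedelta(hours=5, minutes=30))  # fixed offset, assumed for the example
print(from_utc(datetime(2023, 5, 1, 12, 0, 0), IST))  # 2023-05-01 17:30:00
```

With a real zone name such as "Asia/Kolkata", Spark resolves the offset (including any DST rules) for you.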
PySpark : Fixing ‘TypeError: an integer is required (got type bytes)’ Error in PySpark with Spark 2.4.4
Apache Spark is an open-source distributed general-purpose cluster-computing framework. PySpark is the Python library for Spark, and it provides an…
PySpark : Converting Decimal to Integer in PySpark: A Detailed Guide
One of PySpark’s capabilities is the conversion of decimal values to integers. This conversion is beneficial when you need to…
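In PySpark this conversion is typically written as a cast, for example `F.col("amount").cast("int")`. The main decision is whether you want truncation or rounding. The sketch below uses Python's decimal module, not Spark, to contrast the two. It assumes the cast truncates toward zero, which is the usual non-ANSI Spark behaviour, so it is worth confirming for your Spark version and ANSI settings. The sample values are made up.

```python
from decimal import Decimal, ROUND_HALF_UP

values = [Decimal("3.70"), Decimal("-3.70"), Decimal("2.50")]

# Truncation toward zero: simply drops the fractional part
truncated = [int(v) for v in values]
print(truncated)  # [3, -3, 2]

# Round half away from zero first, then convert
rounded = [int(v.quantize(Decimal("1"), rounding=ROUND_HALF_UP)) for v in values]
print(rounded)  # [4, -4, 3]
```

If rounding is what you need in PySpark, applying `F.round()` before the cast is the usual pattern.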
PySpark : A Comprehensive Guide to Converting Expressions to Fixed-Point Numbers in PySpark
Among PySpark’s numerous features, one that stands out is its ability to convert input expressions into fixed-point numbers. This feature…