Tag: Hive
Hive : How to Delete Old Apache Hive Logs, Free Up Space, and Boost Cluster Performance
Apache Hive logs are a critical component for debugging and performance optimization. However, over time, these logs can occupy significant…
Hive : How to Kill a Running Query in Apache Hive
There may be times when a running query needs to be terminated due to excessive resource usage, incorrect syntax, or…
Hive : Seeing Long-Running Queries in Apache Hive
Apache Hive is a data warehouse software project built on top of Apache Hadoop that provides data query and analysis…
Hive : How to Drop Duplicate Rows from a Hive Table
This is a workaround that shows how to drop duplicate rows from a Hive table. Here is how to…
Hive : How to Preserve Hive Metadata [Preserve the Last DDL Time for the Table]
HOLD_DDLTIME: The “last DDL time” refers to the timestamp of the most recent DDL (Data Definition Language) operation that was…
Hive : Different Types of File Formats Supported by Hive
Apache Hive supports a variety of file formats to store and process data. These file formats can be categorized into…
Hive : Exploring Different Types of User-Defined Functions (UDFs) in Hive
In addition to its built-in functions, Hive also supports User-Defined Functions (UDFs), which enable users to extend Hive’s functionality by…
Hive : Understanding the MAPJOIN Operator in Hive with an Example
When dealing with large datasets, optimizing join operations is crucial to improving query performance. One of the techniques to achieve…
Hive : Understanding the DISTRIBUTE BY Operator in Hive with an Example
One of the key features of Hive is its ability to optimize queries for improved performance. The DISTRIBUTE BY operator…
Sort Merge Bucket Join in Hive: A Comprehensive Guide
Sort Merge Bucket (SMB) join is an optimization technique in Apache Hive that helps improve the performance of join operations…