Tag: big_data_interview
Hive : How to update the access time of a file or directory in the Hive data warehouse [Touch]
Among the many functions Hive provides, one essential operation is “TOUCH.” In this article, we will explore the purpose of…
PySpark : Identifying Data Skewness and Partition Row Counts in PySpark
Data skewness is a common issue in large-scale data processing. It happens when data is not evenly distributed across…
Hive : Understanding Array Aggregation in Apache Hive
Apache Hive offers many inbuilt functions to process data, among which collect_list() and collect_set() are commonly used to perform array aggregation….
Hive : Creating and Utilizing 64-bit Hash Values in Apache Hive
Apache Hive provides several inbuilt functions to process the data. One of these is the hash() function, which calculates a…
Hive : How can we return the average of non-NULL records in Hive ?
The function you need in Apache Hive is the avg() function. It is an aggregate function that returns…
Hive : How to Delete Old Apache Hive Logs, Free Up Space, and Boost Cluster Performance
Apache Hive logs are a critical component for debugging and performance optimization. However, over time, these logs can occupy significant…
Hive : How to Kill a Running Query in Apache Hive
There may be times when a running query needs to be terminated due to excessive resource usage, incorrect syntax, or…
Hive : Seeing Long Running Queries in Apache Hive
Apache Hive is a data warehouse software project built on top of Apache Hadoop that provides data query and analysis….
PySpark : from_utc_timestamp Function: A Detailed Guide
The from_utc_timestamp function in PySpark is a highly useful function that allows users to convert UTC time to a specified…
PySpark : Fixing ‘TypeError: an integer is required (got type bytes)’ Error in PySpark with Spark 2.4.4
Apache Spark is an open-source distributed general-purpose cluster-computing framework. PySpark is the Python library for Spark, and it provides an…