Hive : Difference between the MapReduce execution engine and the Tez execution engine in Hive

user March 2, 2023

MapReduce and Tez are two popular execution engines that Apache Hive can use to process large-scale datasets. Both run the same HiveQL queries and transformations on Hive tables, and the engine is chosen with the `hive.execution.engine` property (for example, `mr` or `tez`). They differ considerably in how they execute a query, though. In this article, we will explore the differences between the MapReduce and Tez execution engines in Hive.

MapReduce Execution Engine

MapReduce is a batch processing framework that processes large-scale datasets in a distributed manner. With the MapReduce execution engine, Hive translates each query into one or more MapReduce jobs, which are then executed on a cluster of commodity hardware. Within each job, data is processed in two stages, Map and Reduce, connected by a shuffle-and-sort step.

The Map stage processes data in parallel by dividing the input into smaller chunks called input splits. Each input split is processed independently by a map task, which applies a map function to every record in the split. The output of the Map stage is a set of intermediate key-value pairs, which are written to the local disk of the node that ran the task.

Between the two stages, the framework shuffles and sorts the intermediate pairs so that all pairs with the same key arrive at the same reduce task. The Reduce stage then applies a reduce function to each group of values that share a key. The output of the Reduce stage is again a set of key-value pairs, which is written to HDFS.
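To make the two stages concrete, here is a minimal, single-process Python sketch of how a query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` could be expressed as map and reduce functions. The table, its rows, and the split size are hypothetical, and a real job runs each step as distributed tasks; the sketch only shows the data flow: a map function applied per split, grouping by key between the stages (the shuffle), and a reduce function applied per group.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical rows of an "employees" table: (name, dept)
ROWS = [
    ("alice", "eng"), ("bob", "sales"), ("carol", "eng"),
    ("dave", "hr"), ("erin", "sales"), ("frank", "eng"),
]

def input_splits(rows: list, split_size: int) -> list:
    """Divide the input into fixed-size chunks, standing in for input splits."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]

def map_fn(row: tuple) -> Iterator[tuple]:
    """Map: emit (dept, 1) for every record, like the map side of GROUP BY dept."""
    _name, dept = row
    yield dept, 1

def shuffle(pairs: Iterable[tuple]) -> dict:
    """Shuffle/sort: group map outputs by key so each key reaches one reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_fn(key: str, values: list) -> tuple:
    """Reduce: aggregate each group, like COUNT(*) per dept."""
    return key, sum(values)

def run_job(rows: list, split_size: int = 2) -> list:
    # On a cluster, each split would be handled by an independent map task.
    map_outputs = [pair for split in input_splits(rows, split_size)
                   for row in split for pair in map_fn(row)]
    grouped = shuffle(map_outputs)
    return [reduce_fn(key, values) for key, values in grouped.items()]

print(run_job(ROWS))  # [('eng', 3), ('hr', 1), ('sales', 2)]
```

The result does not depend on how the input is split, which is what lets map tasks run independently.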

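The MapReduce execution engine in Hive is suitable for batch processing of large datasets, but it has several limitations. The main one is heavy disk I/O: map output is spilled to local disk, and when a query needs more than one job (for example, an aggregation followed by a sort), each job writes its full output to HDFS before the next job can read it. In addition, every job has a high startup cost, because YARN must launch a new application master and new containers for its tasks, which makes the engine a poor fit for interactive queries.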
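The cost of chaining jobs can be illustrated with a small Python sketch. It assumes a query that aggregates and then sorts is compiled into two MapReduce jobs (the actual number of jobs Hive generates depends on the query and optimizer settings), and it uses local files in a temporary directory as a stand-in for HDFS. The point is that job 2 can only start after job 1's complete output has been materialized, and it must read that output back from storage.

```python
import json
import os
import tempfile
from collections import Counter

# Hypothetical input: the department of each employee record
DEPTS = ["eng", "sales", "eng", "hr", "sales", "eng"]

def write_stage(path: str, records: list) -> None:
    """Materialize a job's full output, standing in for a write to HDFS."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

def read_stage(path: str) -> list:
    """Read a previous job's materialized output back from storage."""
    with open(path) as f:
        return [json.loads(line) for line in f]

def run_chained_jobs(depts: list, workdir: str) -> tuple:
    writes = 0
    # Job 1: GROUP BY dept, COUNT(*); its output is written before job 2 starts.
    counts = [[dept, n] for dept, n in Counter(depts).items()]
    job1_out = os.path.join(workdir, "job1.jsonl")
    write_stage(job1_out, counts)
    writes += 1
    # Job 2: ORDER BY count DESC; it must re-read job 1's output from storage.
    ordered = sorted(read_stage(job1_out), key=lambda r: (-r[1], r[0]))
    job2_out = os.path.join(workdir, "job2.jsonl")
    write_stage(job2_out, ordered)
    writes += 1
    return read_stage(job2_out), writes

with tempfile.TemporaryDirectory() as tmp:
    result, writes = run_chained_jobs(DEPTS, tmp)
    print(result, writes)  # [['eng', 3], ['sales', 2], ['hr', 1]] 2
```

Every additional job in the chain adds another full write and read of intermediate data.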

Tez Execution Engine

Tez is a data processing framework built on top of YARN, the resource manager in Hadoop. It executes work as a DAG (Directed Acyclic Graph) whose vertices are processing steps and whose edges describe how data moves between them. With the Tez execution engine, Hive compiles a query into a single DAG instead of a chain of separate jobs; the DAG is then optimized and executed on the cluster.
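A query plan as a DAG can be sketched with Python's standard `graphlib` module (Python 3.9+). The plan below is hypothetical: a query that joins two tables, aggregates, and sorts. The vertex names loosely imitate the style of Hive's EXPLAIN output for Tez plans but are illustrative only. Grouping vertices into "waves" shows which steps have no dependency on each other; the real Tez scheduler is more dynamic than this.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each vertex maps to the set of upstream vertices it reads from.
plan = {
    "Map 1": set(),                    # scan and filter the first table
    "Map 2": set(),                    # scan the second table
    "Reducer 3": {"Map 1", "Map 2"},   # join the two inputs
    "Reducer 4": {"Reducer 3"},        # GROUP BY aggregation
    "Reducer 5": {"Reducer 4"},        # ORDER BY
}

def execution_waves(dag: dict) -> list:
    """Group vertices into waves; vertices in the same wave do not depend on
    each other, so they could be scheduled at the same time."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

print(execution_waves(plan))
# [['Map 1', 'Map 2'], ['Reducer 3'], ['Reducer 4'], ['Reducer 5']]
```

Under MapReduce, the same plan would have to be cut into several jobs, with HDFS writes at each cut; here it is one graph.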

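The Tez execution engine in Hive has several advantages over the MapReduce execution engine. First, its startup overhead is lower: within a Hive session the Tez application master can stay running across queries, and containers can be reused for later tasks, which makes Tez better suited to interactive queries. Second, its data processing model performs less disk I/O, because intermediate results inside a DAG do not have to be written to HDFS between stages, which improves processing speed. Third, because a whole query is one DAG, Tez can handle complex queries with many stages, such as multi-way joins, without splitting them into separate jobs.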
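Why reuse lowers startup time can be shown with a deliberately simplified cost model in Python. Everything here is an assumption for illustration: the `StartupCosts` values are made-up numbers, not measurements, and the Tez model assumes every launched container can be reused for any later task. It only contrasts "new application master and containers per job" with "one session application master and a reused pool of containers".

```python
from dataclasses import dataclass

@dataclass
class StartupCosts:
    """Hypothetical, illustrative launch costs in seconds (not measurements)."""
    app_master_s: float  # launching a YARN application master
    container_s: float   # launching one container (JVM) for a task

def mapreduce_startup(tasks_per_job: list, costs: StartupCosts) -> float:
    """Simplified model: each MapReduce job starts its own application master,
    and each task in the job runs in a freshly launched container."""
    return sum(costs.app_master_s + n * costs.container_s for n in tasks_per_job)

def tez_startup(tasks_per_vertex: list, max_containers: int, costs: StartupCosts,
                session_am_running: bool = True) -> float:
    """Simplified model: one DAG runs under a single application master, which may
    already be running in the session, and at most max_containers containers are
    launched and then reused for later tasks (assumes all containers are reusable)."""
    am = 0.0 if session_am_running else costs.app_master_s
    launched = min(max_containers, sum(tasks_per_vertex))
    return am + launched * costs.container_s

costs = StartupCosts(app_master_s=5.0, container_s=1.0)  # made-up numbers
# Hypothetical query: two MapReduce jobs (4 map + 2 reduce tasks, then 3 tasks)
# vs. one Tez DAG with vertices of 4, 2 and 3 tasks and room for 4 containers.
print(mapreduce_startup([6, 3], costs))                            # 19.0
print(tez_startup([4, 2, 3], 4, costs))                            # 4.0
print(tez_startup([4, 2, 3], 4, costs, session_am_running=False))  # 9.0
```

Real overheads depend on cluster configuration and workload; the model only shows why fewer launches mean less fixed cost per query.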

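Tez also has a more flexible, dynamic data flow model than MapReduce. Because a query is a single DAG, processing steps can be connected directly instead of forcing every stage into a map-then-reduce shape; for example, one reduce step can feed another without an extra map phase in between. Where the edge type allows it, tasks can be pipelined and records streamed from one task to the next rather than written to HDFS first. This is the main reason multi-stage queries usually run faster on Tez than on MapReduce.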
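Python generators give a rough, single-process analogy for pipelining. In Tez, data moves between tasks on different machines, and shuffle edges still sort and buffer data, so this is not how Tez is implemented; the sketch only shows records flowing through a chain of stages one at a time, with no stage materializing its full output before the next stage begins. The stage names and the `orders` data are hypothetical.

```python
from typing import Iterable, Iterator

events: list = []  # trace of which stage touched which record, in order

def scan(rows: Iterable[dict]) -> Iterator[dict]:
    """Source stage: emit records one at a time."""
    for row in rows:
        events.append(f"scan {row['id']}")
        yield row

def filter_amount(rows: Iterable[dict], min_amount: int) -> Iterator[dict]:
    """Consumes each record as soon as the upstream stage yields it."""
    for row in rows:
        events.append(f"filter {row['id']}")
        if row["amount"] >= min_amount:
            yield row

def project_ids(rows: Iterable[dict]) -> Iterator[int]:
    """Final stage: keep only the id column."""
    for row in rows:
        events.append(f"project {row['id']}")
        yield row["id"]

orders = [{"id": 1, "amount": 50}, {"id": 2, "amount": 5}, {"id": 3, "amount": 20}]
result = list(project_ids(filter_amount(scan(orders), min_amount=10)))
print(result)  # [1, 3]
print(events)
# ['scan 1', 'filter 1', 'project 1', 'scan 2', 'filter 2',
#  'scan 3', 'filter 3', 'project 3']
```

The trace shows record 1 reaching the final stage before record 2 is even scanned, which is the essence of streaming between stages.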

Comparison between MapReduce and Tez

Aspect	MapReduce	Tez
Typical use	Batch processing of large datasets	Interactive queries and complex, multi-stage queries
Startup time	High (new application master and containers per job)	Low (session and container reuse)
Disk I/O	High (intermediate results written to HDFS between jobs)	Lower (no HDFS writes between stages of a DAG)
Processing speed	Usually slower than Tez	Usually faster than MapReduce
Data processing model	Rigid: each job is a Map stage, optionally followed by a Reduce stage	Flexible: an arbitrary DAG of processing steps
Multi-stage queries	Executed as a chain of separate jobs	Executed as a single DAG

Both the MapReduce and Tez execution engines have their strengths and weaknesses. MapReduce is suitable for batch processing of large datasets, while Tez is better suited to interactive queries and complex DAGs, and its more efficient data processing model usually makes it faster than MapReduce. Note also that Hive-on-MapReduce is deprecated starting with the Hive 2 line, so Tez is generally the better choice for current Hive deployments.

Author: user
