Hive : Analyzing Data with Hive CUBE: A Comprehensive Guide

Hive @ Freshers.in

In this article, we will focus on creating a table and utilize the CUBE operator in Hive. This is an excellent guide for freshers who are looking to understand and explore the capabilities of Hive.

1. DDL (Data Definition Language): Creating a Table

The first step is to create a table where we can insert data. Here, we are going to create a table named freshers_in_sales to store information about sales transactions.

CREATE TABLE freshers_in_sales (
    product_id INT,
    product_name STRING,
    category STRING,
    sales_amount DOUBLE,
    transaction_date DATE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

2. Inserting Data

Once the table is created, you can insert data into it. Here’s an example of how you can insert data into the freshers_in_sales table.

INSERT INTO freshers_in_sales (product_id, product_name, category, sales_amount, transaction_date) VALUES
    (101, 'Smartphone', 'Electronics', 299.99, '2023-08-12'),
    (102, 'Laptop', 'Electronics', 799.99, '2023-08-10'),
    (103, 'Coffee Maker', 'Home Appliances', 49.99, '2023-08-11');

3. Using the CUBE Operator

The CUBE operator is used to perform a grouping on multiple columns that allows us to create subtotals for a variety of combinations. Here’s an example query using the CUBE operator with the freshers_in_sales table to analyze sales data.

SELECT category, product_name, SUM(sales_amount)
FROM freshers_in_sales
GROUP BY category, product_name WITH CUBE;

This query will generate subtotals for every possible combination of category and product_name, giving you a powerful insight into your sales data. The CUBE operator in Hive offers a highly efficient way to analyze multidimensional data.

+----------------+----------------+-------------------+
|   category     |  product_name  | sum(sales_amount) |
+----------------+----------------+-------------------+
| Electronics    | Smartphone     |            299.99 |
| Electronics    | Laptop         |            799.99 |
| Home Appliances| Coffee Maker   |             49.99 |
| Electronics    | NULL           |           1099.98 |
| Home Appliances| NULL           |             49.99 |
| NULL           | Smartphone     |            299.99 |
| NULL           | Laptop         |            799.99 |
| NULL           | Coffee Maker   |             49.99 |
| NULL           | NULL           |           1149.97 |
+----------------+----------------+-------------------+

This result includes subtotals for every combination of category and product_name. In the context of CUBE, the NULL values represent the subtotal for the respective column, offering different levels of aggregation.

Hive important pages to refer

  1. Hive
  2. Hive Interview Questions
  3. Hive Official Page
  4. Spark Examples
  5. PySpark Blogs
  6. Bigdata Blogs
  7. Spark Interview Questions
  8. Spark Official Page
Author: user

Leave a Reply