Hive : How can we return the average of non-NULL records in Hive ?

Hive @ Freshers.in

The function you’re need to refer in Apache Hive is the avg() function. It is an aggregate function that returns the average of the values in a group. If all records inside a group are NULL, the function indeed returns NULL.

DDL for Table Creation:

The first step is to create a table where we will insert our data. In this example, let’s create a simple table named sales which will have two columns, salesperson_id and sales_amount.

Here is the Data Definition Language (DDL) statement for that:

CREATE TABLE freshers_in_sales (
  salesperson_id INT, 
  sales_amount INT
);
  1. Inserting Data:

After creating the table, we can insert some data into it.

INSERT INTO TABLE freshers_in_sales (salesperson_id, sales_amount) VALUES (1, 100);
INSERT INTO TABLE freshers_in_sales (salesperson_id, sales_amount) VALUES (1, 200);
INSERT INTO TABLE freshers_in_sales (salesperson_id, sales_amount) VALUES (2, 150);
INSERT INTO TABLE freshers_in_sales (salesperson_id, sales_amount) VALUES (2, NULL);
INSERT INTO TABLE freshers_in_sales (salesperson_id, sales_amount) VALUES (3, NULL);
  1. Querying Data:

Now that we have our data, let’s run an AVG() query to get the average sales_amount for each salesperson.

SELECT salesperson_id, avg(sales_amount)
FROM freshers_in_sales
GROUP BY salesperson_id;

This will return the average of sales_amount for each salesperson_id. For salesperson_id = 2, despite one of the records being NULL, the avg() function will calculate the average based only on non-NULL records. So, the average will be 150.

For salesperson_id = 3, all records are NULL, so the function returns NULL.

Please note that all above queries should be executed in Hive Shell or any other interface that you use to execute Hive queries. 

Hive important pages to refer

  1. Hive
  2. Hive Interview Questions
  3. Hive Official Page
  4. Spark Examples
  5. PySpark Blogs
  6. Bigdata Blogs
  7. Spark Interview Questions
  8. Spark Official Page
Author: user

Leave a Reply