Handling null values in Google Bigquery: Best practices for data cleaning

Google Big Query @ Freshers.in

1. Overview of NULL in Databases

In the realm of databases, the term “NULL” represents the absence of any specific value. Unlike zeros or blanks, which are definitive values, NULL indicates that the data is unknown or missing. This distinction is vital because the absence of data can influence the results of calculations, filtering, and analysis.

For instance, if you’re calculating the average page traffic based on a table, the rows with NULL values can distort the result, leading to incorrect insights. Hence, understanding and appropriately managing NULL values is essential for data integrity and accurate reporting.

2. Functions to Identify NULL values: IS NULL and IS NOT NULL

Google BigQuery, like other SQL-based systems, provides functions to identify these elusive NULL values.

CREATE TABLE freshers_in_page_traffic (
    id INT64,
    page_name STRING,
    visit_count INT64,
    user_count INT64
);
INSERT INTO freshers_in_page_traffic (id, page_name, visit_count, user_count)
VALUES
(1, 'Home', 150, 100),
(2, 'About', NULL, 50),
(3, 'Contact', 80, NULL),
(4, 'Services', NULL, NULL);
Identifying NULL Values:

To find out which rows have NULL values for visit_count:

SELECT * FROM freshers_in_page_traffic
WHERE visit_count IS NULL;

For rows where user_count is NOT NULL:

SELECT * FROM freshers_in_page_traffic
WHERE user_count IS NOT NULL;

3. Techniques to Replace or Manage NULL Values Using COALESCE and IFNULL

At times, instead of discarding or ignoring NULL values, it might be beneficial to replace them with a default or calculated value.

Using COALESCE:

The COALESCE function returns the first non-NULL value in a list.

To replace NULL visit_count with 0:

SELECT id, page_name, COALESCE(visit_count, 0) as visit_count, user_count
FROM freshers_in_page_traffic;
Using IFNULL:

This function checks if a value is NULL and if so, replaces it with a given value.

To replace NULL user_count with 0:

SELECT id, page_name, visit_count, IFNULL(user_count, 0) as user_count
FROM freshers_in_page_traffic;

BigQuery import urls to refer

Author: user

Leave a Reply