In the data world, it’s not uncommon to encounter scenarios where certain records might have missing date values. In BigQuery, there’s a way to handle this elegantly using window functions. Let’s explore how.
Setup
Let’s assume we’re working with a table named freshers_in_page_views that tracks daily page view counts.
1. DDL: Create the Table
CREATE TABLE freshers_in_page_views (
view_date DATE,
view_count INT64
);
2. Insert Sample Data
Here’s some sample data with missing view_date values:
INSERT INTO freshers_in_page_views (view_date, view_count) VALUES
('2023-01-01', 100),
(NULL, 120),
('2023-01-03', 140),
(NULL, 110),
('2023-01-05', 90);
In the above data, the view_date for some rows is missing (NULL).
3. Select Statement to Fill Missing Dates
To fill the missing view_date with the date from the previous row, we’ll use the LAG() window function:
WITH FilledDates AS (
SELECT
view_date,
view_count,
LAG(view_date) OVER (ORDER BY IFNULL(view_date, '9999-12-31')) AS previous_date
FROM freshers_in_page_views
)
SELECT
CASE
WHEN view_date IS NULL THEN previous_date
ELSE view_date
END AS adjusted_view_date,
view_count
FROM FilledDates
ORDER BY adjusted_view_date;
We’re using a common table expression (WITH clause) named FilledDates to determine the previous date using LAG().
We’re then using a CASE statement to replace NULL values with the date from the previous row.
The ORDER BY IFNULL(view_date, ‘9999-12-31’) ensures that the ordering is based on the existing dates while placing the NULL values at the end, so the LAG function doesn’t consider them.