In Google BigQuery, an array is a homogenous collection of elements, allowing for more complex data storage and operations. An array can contain elements of integer, string, or even other arrays and structures. Handling arrays properly can result in more efficient queries and enable you to work with multi-valued attributes seamlessly. This article dives into storing, querying, and unnesting arrays in Google BigQuery, focusing on tracking the most visited pages on a website. Arrays in Google BigQuery offer a powerful way to handle multi-valued attributes and complex data types. The platform provides various ways to query and manipulate these arrays, including the essential UNNEST function, making your data operations more flexible and efficient.
How to Store Data in Arrays
Storing data in arrays can make your database schema cleaner and can help group related values together.
CREATE TABLE freshers_in_most_visited_pages (
id INT64,
page_name STRING,
visitor_ids ARRAY<INT64>
);
INSERT INTO freshers_in_most_visited_pages (id, page_name, visitor_ids)
VALUES
(1, 'Home', [1, 2, 3]),
(2, 'About', [2, 4, 5]),
(3, 'Contact', [1, 3, 6]),
(4, 'Services', [4, 5, 6]);
Querying Data Within Arrays
You can run queries to search for specific elements within an array or to perform operations on array data.
Example 1: Finding a Specific Element
To find pages visited by a specific user with visitor_id 1:
SELECT page_name
FROM freshers_in_most_visited_pages,
UNNEST(visitor_ids) AS individual_visitor_id
WHERE individual_visitor_id = 1;
Counting Array Elements
To count the number of visitors for each page:
SELECT page_name, ARRAY_LENGTH(visitor_ids) AS num_of_visitors
FROM freshers_in_most_visited_pages;
SELECT page_name, ARRAY_LENGTH(visitor_ids) AS num_of_visitors
FROM freshers_in_most_visited_pages;
The UNNEST Function to Work with Array Data
The UNNEST function takes an array and returns a table, with each element of the array turned into a row.
Example 1: Basic UNNEST
SELECT page_name, individual_visitor_id
FROM freshers_in_most_visited_pages,
UNNEST(visitor_ids) AS individual_visitor_id;
UNNEST with Array Index
To get the visitor IDs along with their position in the array:
SELECT page_name, individual_visitor_id, array_index
FROM freshers_in_most_visited_pages,
UNNEST(visitor_ids) AS individual_visitor_id WITH OFFSET AS array_index;
Output
+-----------+--------------------+-------------+
| page_name | individual_visitor_id | array_index |
+-----------+--------------------+-------------+
| Home | 1 | 0 |
| Home | 2 | 1 |
| Home | 3 | 2 |
| About | 2 | 0 |
| About | 4 | 1 |
| About | 5 | 2 |
| ... | ... | ... |
+-----------+--------------------+-------------+
BigQuery import urls to refer