Often in data analytics, there’s a need to extract specific parts of a date or timestamp for more granular analysis. One such common operation is extracting minutes from a timestamp. This tutorial focuses on achieving this in Google BigQuery and processing the results with PySpark.
Step 1: Setting up the Table in BigQuery
We’ll first create a table named freshers_in_page_views in BigQuery to hold our sample page-view data.
CREATE TABLE freshers_in_page_views (
view_id INT64,
view_timestamp TIMESTAMP,
view_count INT64
);
Step 2: Inserting Sample Data
Let’s insert some rows of sample data for demonstration.
Insert Statement:
INSERT INTO freshers_in_page_views (view_id, view_timestamp, view_count)
VALUES
(1, TIMESTAMP "2023-09-14 15:43:22 UTC", 120),
(2, TIMESTAMP "2023-09-14 12:05:45 UTC", 89),
(3, TIMESTAMP "2023-09-14 09:15:00 UTC", 155);
Step 3: Extracting Minutes from Timestamp in BigQuery
Using the EXTRACT function in BigQuery, we can retrieve the minutes of our timestamps.
Select Statement:
SELECT
view_id,
EXTRACT(MINUTE FROM view_timestamp) AS minute_of_hour,
view_count
FROM freshers_in_page_views;
Output
view_id | minute_of_hour | view_count
------- | -------------- | ----------
1 | 43 | 120
2 | 5 | 89
3 | 15 | 155
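As a quick sanity check, Python's datetime behaves the same way: the minute attribute of a timestamp corresponds to BigQuery's EXTRACT(MINUTE FROM ...). A minimal sketch using the same three timestamps inserted above:

```python
from datetime import datetime, timezone

# The three timestamps inserted into freshers_in_page_views above
timestamps = [
    datetime(2023, 9, 14, 15, 43, 22, tzinfo=timezone.utc),
    datetime(2023, 9, 14, 12, 5, 45, tzinfo=timezone.utc),
    datetime(2023, 9, 14, 9, 15, 0, tzinfo=timezone.utc),
]

# .minute mirrors EXTRACT(MINUTE FROM view_timestamp) in BigQuery
minutes = [ts.minute for ts in timestamps]
print(minutes)  # [43, 5, 15]
```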
Step 4: Processing the Result in PySpark
Once you’ve exported the results from BigQuery (for example, as a CSV file), you can use PySpark for further processing.
from pyspark.sql import SparkSession
# Initialize Spark session
spark = SparkSession.builder.appName("bigquery-pyspark @ Learning at Freshers.in").getOrCreate()
# Load the data (assuming a CSV export from BigQuery)
df = spark.read.csv("path_to_your_exported_file.csv", header=True, inferSchema=True)
# Displaying the dataframe with minutes extracted
df.show()