Amazon Redshift, a powerful data warehousing solution, offers a variety of analytical functions to help you gain insights into your data. One such function is PERCENT_RANK(). In this article, we will delve into the details of the PERCENT_RANK() function with practical examples. We will also explore where this function can be applied for valuable insights.
DDL Example: Creating a Sample Table
Let’s start by creating a sample table called freshers_in_webstats to work with.
CREATE TABLE freshers_in_webstats (
date DATE,
pageviews INT,
visitors INT
);
In this table, we have three columns: date, pageviews, and visitors, representing the date of the statistics, the number of pageviews, and the number of visitors, respectively.
DML Example: Populating Data
Next, let’s insert some sample data into our table to demonstrate the PERCENT_RANK() function effectively:
INSERT INTO freshers_in_webstats (date, pageviews, visitors)
VALUES
('2023-09-01', 1000, 500),
('2023-09-02', 1200, 600),
('2023-09-03', 1500, 750),
('2023-09-04', 800, 400),
('2023-09-05', 1300, 650);
Now, we have a dataset with five days of web statistics.
Understanding PERCENT_RANK()
The PERCENT_RANK() function is used to calculate the relative rank of a row within a result set as a percentage. It can provide insights into how a particular row’s value compares to others in the dataset.
The formula for PERCENT_RANK() is as follows:
PERCENT_RANK() OVER (ORDER BY column_name)
Practical Example: Using PERCENT_RANK()
Suppose we want to calculate the PERCENT_RANK() for the number of visitors each day in our freshers_in_webstats table. We can use the following SQL query:
SELECT
date,
visitors,
PERCENT_RANK() OVER (ORDER BY visitors) AS visitors_percent_rank
FROM
freshers_in_webstats;
In this query:
We select the date and visitors columns.
We apply the PERCENT_RANK() function over the visitors column, ordering the rows in ascending order based on the number of visitors.
We alias the result of PERCENT_RANK() as visitors_percent_rank.
Result Set:
date | visitors | visitors_percent_rank
--------------+----------+-----------------------
2023-09-04 | 400 | 0.00
2023-09-01 | 500 | 0.25
2023-09-02 | 600 | 0.5
2023-09-05 | 650 | 0.75
2023-09-03 | 750 | 1.00
Read more on Redshift
Read more on Hive
Read more on Snowflake