Snowflake: Analyze data in a hierarchical manner (CONNECT BY)

Snowflake’s CONNECT BY clause supports hierarchical queries: it joins a table to itself repeatedly so that parent-child relationships can be traversed level by level. In this article, we explain how to use CONNECT BY in Snowflake, using the “freshers_in” table as an example.

The “freshers_in” table contains information about a company’s new hires, including their names, departments, managers, and salaries. The table has the following columns:

  • “employee_id”: The unique identifier for each employee
  • “employee_name”: The name of each employee
  • “department”: The department that each employee belongs to
  • “manager_id”: The unique identifier of each employee’s manager
  • “salary”: The salary of each employee
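
To follow along, a table with this shape can be created as sketched below. The column types and the numeric employee_id values are assumptions made for illustration, since the article does not specify them. The names and reporting lines match the result set shown later in this article, and department and salary are left NULL because the hierarchy query does not use them.

```sql
-- Hypothetical setup: column types and employee_id values are assumptions.
CREATE OR REPLACE TABLE freshers_in (
    employee_id   INTEGER,
    employee_name VARCHAR,
    department    VARCHAR,
    manager_id    INTEGER,        -- NULL for top-level managers
    salary        NUMBER(10, 2)
);

-- department and salary are omitted here and therefore default to NULL.
INSERT INTO freshers_in (employee_id, employee_name, manager_id) VALUES
    (1, 'Alice',   NULL),
    (2, 'Bob',     NULL),
    (3, 'Charlie', 1),     -- reports to Alice
    (4, 'David',   1),     -- reports to Alice
    (5, 'Eva',     2),     -- reports to Bob
    (6, 'Frank',   3),     -- reports to Charlie
    (7, 'Gina',    3),     -- reports to Charlie
    (8, 'Helen',   4);     -- reports to David
```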

Let’s say that we want to analyze the hierarchy of managers in the “freshers_in” table. We can use the following SQL query to achieve this:

SELECT employee_name, manager_id, LEVEL
FROM freshers_in
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id;

Let’s break down this query to understand how it works:

  • The “START WITH” clause specifies the root node or nodes of the hierarchy. In this case, we want to start with the employees who do not have a manager (i.e., the top-level managers). Therefore, we specify “manager_id IS NULL” as the starting point.
  • The “CONNECT BY” clause specifies the relationship between parent and child nodes. The “PRIOR” keyword marks the column that is taken from the parent row, so “PRIOR employee_id = manager_id” selects every row whose manager_id equals the employee_id of a row already in the hierarchy. In other words, it connects each manager to their direct reports.
  • The “LEVEL” keyword is used to generate a level number for each row in the hierarchy. The root node is assigned a level of 1, and each child node is assigned a level number that is one greater than its parent’s level.
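
As a variation on the query above, the following sketch adds the path from the root to each employee using SYS_CONNECT_BY_PATH and sorts the output, since CONNECT BY on its own does not guarantee any row order. The alias names hierarchy_level and reporting_path are hypothetical and chosen only for this example.

```sql
-- Sketch: same traversal as the article's query, plus a readable path and a stable order.
SELECT
    employee_name,
    manager_id,
    LEVEL AS hierarchy_level,                                    -- 1 for root rows
    SYS_CONNECT_BY_PATH(employee_name, ' -> ') AS reporting_path -- names from the root down to this row
FROM freshers_in
START WITH manager_id IS NULL                -- roots: employees without a manager
CONNECT BY PRIOR employee_id = manager_id    -- child.manager_id = parent.employee_id
ORDER BY hierarchy_level, employee_name;
```

Sorting by the level alias and then by name keeps each level grouped together, which makes the output easier to compare across runs.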

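When we execute this query, we get results like the following. Note that the manager_id column is shown here with each manager’s name in place of the identifier for readability, and that the row order is not guaranteed unless an ORDER BY clause is added: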

employee_name   manager_id  LEVEL
Alice           NULL        1
Bob             NULL        1
Charlie         Alice       2
David           Alice       2
Eva             Bob         2
Frank           Charlie     3
Gina            Charlie     3
Helen           David       3

In the above result set, we can see that Alice and Bob are the top-level managers, with a LEVEL of 1. Charlie and David report directly to Alice, and Eva reports directly to Bob, so all three have a LEVEL of 2. Frank and Gina report to Charlie, and Helen reports to David, which gives them a LEVEL of 3.
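
The same pattern can be restricted to a single branch by changing the START WITH condition. As a sketch, the query below starts from Alice instead of from all top-level managers. Given the reporting lines in the result set above, it should return Alice at level 1 followed by the employees beneath her. The alias hierarchy_level is hypothetical.

```sql
-- Sketch: walk only the part of the hierarchy that sits under one employee.
SELECT
    employee_name,
    manager_id,
    LEVEL AS hierarchy_level   -- counted from the START WITH row, so Alice is level 1
FROM freshers_in
START WITH employee_name = 'Alice'
CONNECT BY PRIOR employee_id = manager_id
ORDER BY hierarchy_level, employee_name;
```

Because LEVEL is counted from whichever rows START WITH selects, the same employee can appear at a different level depending on where the traversal begins.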

By using CONNECT BY in Snowflake, we can easily analyze hierarchical data and gain insights into the relationships between different entities in our data. This feature is particularly useful for analyzing organizational structures, family trees, and other types of hierarchical data.

In this example, applying CONNECT BY to the “freshers_in” table let us trace the chain of managers from the top level down and see the organizational structure of the company in a single query.

Author: user