Category: article
HiveServer1 vs. HiveServer2: A Comprehensive Comparison
Before diving into the comparison, let’s briefly understand what Hive servers are and their role in the Hive ecosystem. HiveServer1…
Hive Script vs. Hive Query: Unraveling the Differences
This article aims to shed light on this topic, offering clarity and real-world examples to illustrate the contrasts. Understanding Hive…
Create an SSH key and use it to push data to an EC2 instance from your GitLab CI/CD pipeline
In GitLab CI/CD, you don’t create SSH keys directly on the server; rather, you generate them on your local machine…
Returning the last value in a group during aggregation in PySpark
pyspark.sql.functions.last: PySpark’s last() function is part of the PySpark SQL module, and it’s used to return the last value in…
PySpark: Converting the first letter of each word in a string to uppercase and the rest to lowercase
PySpark’s initcap() function is used to convert the first letter of each word in a string to uppercase and the…
PySpark to count the number of elements in RDDs, DataFrames and DataSets
count() is a PySpark method applied to RDDs (Resilient Distributed Datasets), DataFrames, and Datasets to count the number…
Design a database schema for an online merch store
Designing a database schema for an online merchandise store involves several key tables to handle products, customers, orders, and potentially…
How to retrieve folder sizes using Windows PowerShell
As system administrators or power users, we often need to keep an eye on the sizes of directories within our…
Version Control and Change Management in Your Data Warehouse
In the dynamic realm of data warehouses, where information evolves continually, version control and change management emerge as pivotal players…
Best Practices for Building a Scalable and Flexible Data Warehouse
Building a data warehouse that stands the test of time requires a strategic blend of scalability and flexibility. This article…