Category: article

Hive @ Freshers.in

HiveServer1 vs. HiveServer2: A Comprehensive Comparison

Before diving into the comparison, let’s briefly understand what Hive servers are and their role in the Hive ecosystem. HiveServer1…

Hive @ Freshers.in

Hive Script vs. Hive Query: Unraveling the Differences

This article aims to shed light on this topic, offering clarity and real-world examples to illustrate the contrasts. Understanding Hive…


Create an SSH key and use it to push data to an EC2 instance from your GitLab CI/CD pipeline

In GitLab CI/CD, you don’t create SSH keys directly on the server; rather, you generate them on your local machine…

PySpark @ Freshers.in

Returning the last value in a group during aggregation in PySpark

pyspark.sql.functions.last PySpark’s last() function is part of the PySpark SQL module, and it’s used to return the last value in…

PySpark @ Freshers.in

PySpark to count the number of elements in RDDs, DataFrames and DataSets

PySpark count() is a method applied to RDDs (Resilient Distributed Datasets), DataFrames, and DataSets in PySpark to count the number…


Design a database schema for an online merch store

Designing a database schema for an online merchandise store involves several key tables to handle products, customers, orders, and potentially…

good to read @Freshers.in

How to retrieve folder sizes using Windows PowerShell

As system administrators or power users, we often need to keep an eye on the sizes of directories within our…

Data Warehouse @ Freshers.in

Version Control and Change Management in Your Data Warehouse

In the dynamic realm of data warehouses, where information evolves continually, version control and change management emerge as pivotal players…

Data Warehouse @ Freshers.in

Best Practices for Building a Scalable and Flexible Data Warehouse

Building a data warehouse that stands the test of time requires a strategic blend of scalability and flexibility. This article…
