Bigdata – Knowledge Base

SQL – Window Functions

SQL Window Functions: A Comprehensive Guide with Detailed Queries #

Window functions in SQL are powerful tools that allow you to perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions that return a single value for a group of rows, window functions return a value for each row in the result set. They are essential for performing complex analytics and are widely used in reporting, financial calculations, and data analysis.

This document provides a detailed overview of SQL window functions, along with examples to demonstrate their usage.

1. Understanding SQL Window Functions #

A window function performs a calculation across a set of rows that are somehow related to the current row. The set of rows is referred to as the window frame. Window functions are executed after the FROM, WHERE, GROUP BY, and HAVING clauses, but before the ORDER BY clause.

Basic Syntax: #

  • window_function(): The window function to be applied (e.g., ROW_NUMBER(), RANK(), SUM()).
  • PARTITION BY: Divides the result set into partitions to which the window function is applied.
  • ORDER BY: Defines the order of rows within each partition.
  • ROWS or RANGE: Specifies the window frame, which is a subset of rows within the partition.

2. Common SQL Window Functions #

Here are some commonly used window functions:

  • ROW_NUMBER(): Assigns a unique sequential integer to rows within a partition.
  • RANK(): Assigns a rank to each row within a partition, with gaps in rank for ties.
  • DENSE_RANK(): Similar to RANK(), but without gaps in rank for ties.
  • NTILE(n): Divides the rows in an ordered partition into n buckets.
  • LEAD() and LAG(): Accesses data from the subsequent or preceding row.
  • FIRST_VALUE() and LAST_VALUE(): Returns the first or last value in the window frame.
  • SUM(), AVG(), MIN(), MAX(): Aggregate functions that can be used as window functions.

3. Detailed Queries Using Window Functions #

3.1. ROW_NUMBER() Example #

The ROW_NUMBER() function assigns a unique sequential integer to each row within a partition.

Query:

Explanation:

  • This query assigns a unique number to each employee within their department, ordered by salary in descending order.

Output:

3.2. RANK() Example #

The RANK() function assigns a rank to each row within a partition, with gaps for ties.

Query:

Explanation:

  • This query ranks employees within each department based on their salary, with identical salaries receiving the same rank and gaps in the ranking sequence.

Output:

3.3. DENSE_RANK() Example #

The DENSE_RANK() function is similar to RANK(), but it does not leave gaps in the ranking sequence for ties.

Query:

Explanation:

  • This query ranks employees within each department based on their salary, with identical salaries receiving the same rank without gaps.

Output:

3.4. NTILE() Example #

The NTILE() function distributes rows in an ordered partition into a specified number of groups.

Query:

Explanation:

  • This query divides the employees in each department into two salary groups based on their salary.

Output:

3.5. LEAD() and LAG() Example #

The LEAD() function provides access to a subsequent row’s data, while LAG() accesses data from a preceding row.

Query:

Explanation:

  • This query retrieves the next and previous salary for each employee within their department, based on salary.

Output:

3.6. FIRST_VALUE() and LAST_VALUE() Example #

The FIRST_VALUE() and LAST_VALUE() functions return the first and last value in the window frame, respectively.

Query:

Explanation:

  • This query retrieves the highest and lowest salary for each employee within their department. The FIRST_VALUE() function selects the highest salary based on descending order, while the LAST_VALUE() function selects the lowest salary based on ascending order.

Output:


4. Window Frame Specification: ROWS vs RANGE #

Window functions can operate over a subset of rows within the window, called the window frame. You can specify this window frame using either ROWS or RANGE.

4.1. ROWS Frame Example #

The ROWS specification limits the frame to a specific number of rows relative to the current row.

Query:

Explanation:

  • This query calculates the sum of the salaries for the current row, the preceding row, and the following row within each department.

Output:

4.2. RANGE Frame Example #

The RANGE specification limits the frame based on the value of the current row, allowing you to define a range of values.

Query:

Explanation:

  • This query calculates the sum of the salaries within a specified range (10,000 units) of the current row’s salary.

Output:

5. Advanced Window Functions #

5.1. Cumulative Sum with SUM() #

The cumulative sum is a common calculation that can be efficiently done using window functions.

Query:

Explanation:

  • This query calculates the cumulative sum of salaries within each department, ordered by salary.

Output:

5.2. Moving Average with AVG() #

Moving averages are often used to smooth out short-term fluctuations and highlight longer-term trends.

Query:

Explanation:

  • This query calculates the moving average of salaries, considering the current row and the two preceding rows within each department.

Output:

6. Conclusion #

SQL window functions are essential for advanced data analysis, allowing you to perform complex calculations across sets of rows. By mastering these functions, you can write more efficient and expressive SQL queries, making it easier to perform tasks such as ranking, cumulative sums, moving averages, and more.

What are your feelings
Updated on September 4, 2024