SQL

SQL Window Functions: A Comprehensive Guide with Detailed Queries #

Window functions in SQL are powerful tools that allow you to perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions that return a single value for a group of rows, window functions return a value for each row in the result set. They are essential for performing complex analytics and are widely used in reporting, financial calculations, and data analysis.

This document provides a detailed overview of SQL window functions, along with examples to demonstrate their usage.

1. Understanding SQL Window Functions #

A window function performs a calculation across a set of rows that are somehow related to the current row. The set of rows is referred to as the window frame. Window functions are executed after the FROM, WHERE, GROUP BY, and HAVING clauses, but before the ORDER BY clause.

Basic Syntax: #

window_function() OVER (
    [PARTITION BY partition_expression]
    [ORDER BY sort_expression]
    [ROWS or RANGE frame_clause]
)

window_function(): The window function to be applied (e.g., ROW_NUMBER(), RANK(), SUM()).
PARTITION BY: Divides the result set into partitions to which the window function is applied.
ORDER BY: Defines the order of rows within each partition.
ROWS or RANGE: Specifies the window frame, which is a subset of rows within the partition.

2. Common SQL Window Functions #

Here are some commonly used window functions:

ROW_NUMBER(): Assigns a unique sequential integer to rows within a partition.
RANK(): Assigns a rank to each row within a partition, with gaps in rank for ties.
DENSE_RANK(): Similar to RANK(), but without gaps in rank for ties.
NTILE(n): Divides the rows in an ordered partition into n buckets.
LEAD() and LAG(): Accesses data from the subsequent or preceding row.
FIRST_VALUE() and LAST_VALUE(): Returns the first or last value in the window frame.
SUM(), AVG(), MIN(), MAX(): Aggregate functions that can be used as window functions.

3. Detailed Queries Using Window Functions #

3.1. ROW_NUMBER() Example #

The ROW_NUMBER() function assigns a unique sequential integer to each row within a partition.

Query:

SELECT 
    employee_id,
    department_id,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num
FROM 
    employees;

Explanation:

This query assigns a unique number to each employee within their department, ordered by salary in descending order.

Output:

| employee_id | department_id | salary | row_num |
|-------------|---------------|--------|---------|
|          101|             10 | 100000 |       1 |
|          102|             10 |  95000 |       2 |
|          103|             10 |  90000 |       3 |
|          201|             20 |  85000 |       1 |
|          202|             20 |  80000 |       2 |

3.2. RANK() Example #

The RANK() function assigns a rank to each row within a partition, with gaps for ties.

Query:

SELECT 
    employee_id,
    department_id,
    salary,
    RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM 
    employees;

Explanation:

This query ranks employees within each department based on their salary, with identical salaries receiving the same rank and gaps in the ranking sequence.

Output:

| employee_id | department_id | salary | rank |
|-------------|---------------|--------|------|
|          101|             10 | 100000 |    1 |
|          102|             10 |  95000 |    2 |
|          103|             10 |  95000 |    2 |
|          104|             10 |  90000 |    4 |
|          201|             20 |  85000 |    1 |
|          202|             20 |  80000 |    2 |

3.3. DENSE_RANK() Example #

The DENSE_RANK() function is similar to RANK(), but it does not leave gaps in the ranking sequence for ties.

Query:

SELECT 
    employee_id,
    department_id,
    salary,
    DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rank
FROM 
    employees;

Explanation:

This query ranks employees within each department based on their salary, with identical salaries receiving the same rank without gaps.

Output:

| employee_id | department_id | salary | dense_rank |
|-------------|---------------|--------|------------|
|          101|             10 | 100000 |          1 |
|          102|             10 |  95000 |          2 |
|          103|             10 |  95000 |          2 |
|          104|             10 |  90000 |          3 |
|          201|             20 |  85000 |          1 |
|          202|             20 |  80000 |          2 |

3.4. NTILE() Example #

The NTILE() function distributes rows in an ordered partition into a specified number of groups.

Query:

SELECT 
    employee_id,
    department_id,
    salary,
    NTILE(2) OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_group
FROM 
    employees;

Explanation:

This query divides the employees in each department into two salary groups based on their salary.

Output:

| employee_id | department_id | salary | salary_group |
|-------------|---------------|--------|--------------|
|          101|             10 | 100000 |            1 |
|          102|             10 |  95000 |            1 |
|          103|             10 |  90000 |            2 |
|          104|             10 |  85000 |            2 |
|          201|             20 |  80000 |            1 |
|          202|             20 |  75000 |            2 |

3.5. LEAD() and LAG() Example #

The LEAD() function provides access to a subsequent row’s data, while LAG() accesses data from a preceding row.

Query:

SELECT 
    employee_id,
    department_id,
    salary,
    LEAD(salary, 1) OVER (PARTITION BY department_id ORDER BY salary DESC) AS next_salary,
    LAG(salary, 1) OVER (PARTITION BY department_id ORDER BY salary DESC) AS prev_salary
FROM 
    employees;

Explanation:

This query retrieves the next and previous salary for each employee within their department, based on salary.

Output:

| employee_id | department_id | salary | next_salary | prev_salary |
|-------------|---------------|--------|-------------|-------------|
|          101|             10 | 100000 |       95000 |        NULL |
|          102|             10 |  95000 |       90000 |      100000 |
|          103|             10 |  90000 |       85000 |       95000 |
|          104|             10 |  85000 |        NULL |       90000 |
|          201|             20 |  85000 |       80000 |        NULL |
|          202|             20 |  80000 |       75000 |       85000 |

3.6. FIRST_VALUE() and LAST_VALUE() Example #

The FIRST_VALUE() and LAST_VALUE() functions return the first and last value in the window frame, respectively.

Query:

SELECT 
    employee_id,
    department_id,
    salary,
    FIRST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS highest_salary,
    LAST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary ASC) AS lowest_salary
FROM 
    employees;

Explanation:

This query retrieves the highest and lowest salary for each employee within their department. The FIRST_VALUE() function selects the highest salary based on descending order, while the LAST_VALUE() function selects the lowest salary based on ascending order.

Output:

| employee_id | department_id | salary | highest_salary | lowest_salary |
|-------------|---------------|--------|----------------|---------------|
|          101|             10 | 100000 |         100000 |         85000 |
|          102|             10 |  95000 |         100000 |         85000 |
|          103|             10 |  90000 |         100000 |         85000 |
|          104|             10 |  85000 |         100000 |         85000 |
|          201|             20 |  85000 |          85000 |         80000 |
|          202|             20 |  80000 |          85000 |         80000 |

4. Window Frame Specification: ROWS vs RANGE #

Window functions can operate over a subset of rows within the window, called the window frame. You can specify this window frame using either ROWS or RANGE.

4.1. ROWS Frame Example #

The ROWS specification limits the frame to a specific number of rows relative to the current row.

Query:

SELECT 
    employee_id,
    department_id,
    salary,
    SUM(salary) OVER (PARTITION BY department_id ORDER BY salary DESC ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_salary_window
FROM 
    employees;

Explanation:

This query calculates the sum of the salaries for the current row, the preceding row, and the following row within each department.

Output:

| employee_id | department_id | salary | sum_salary_window |
|-------------|---------------|--------|-------------------|
|          101|             10 | 100000 |            195000 |
|          102|             10 |  95000 |            285000 |
|          103|             10 |  90000 |            275000 |
|          104|             10 |  85000 |            175000 |
|          201|             20 |  85000 |            165000 |
|          202|             20 |  80000 |            165000 |

4.2. RANGE Frame Example #

The RANGE specification limits the frame based on the value of the current row, allowing you to define a range of values.

Query:

SELECT 
    employee_id,
    department_id,
    salary,
    SUM(salary) OVER (PARTITION BY department_id ORDER BY salary DESC RANGE BETWEEN 10000 PRECEDING AND 10000 FOLLOWING) AS sum_salary_range
FROM 
    employees;

Explanation:

This query calculates the sum of the salaries within a specified range (10,000 units) of the current row’s salary.

Output:

| employee_id | department_id | salary | sum_salary_range |
|-------------|---------------|--------|------------------|
|          101|             10 | 100000 |           195000 |
|          102|             10 |  95000 |           285000 |
|          103|             10 |  90000 |           175000 |
|          104|             10 |  85000 |           175000 |
|          201|             20 |  85000 |           165000 |
|          202|             20 |  80000 |           165000 |

5. Advanced Window Functions #

5.1. Cumulative Sum with SUM() #

The cumulative sum is a common calculation that can be efficiently done using window functions.

Query:

SELECT 
    employee_id,
    department_id,
    salary,
    SUM(salary) OVER (PARTITION BY department_id ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_salary
FROM 
    employees;

Explanation:

This query calculates the cumulative sum of salaries within each department, ordered by salary.

Output:

| employee_id | department_id | salary | cumulative_salary |
|-------------|---------------|--------|-------------------|
|          101|             10 | 100000 |            100000 |
|          102|             10 |  95000 |            195000 |
|          103|             10 |  90000 |            285000 |
|          104|             10 |  85000 |            370000 |
|          201|             20 |  85000 |             85000 |
|          202|             20 |  80000 |            165000 |

5.2. Moving Average with AVG() #

Moving averages are often used to smooth out short-term fluctuations and highlight longer-term trends.

Query:

SELECT 
    employee_id,
    department_id,
    salary,
    AVG(salary) OVER (PARTITION BY department_id ORDER BY salary DESC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_salary
FROM 
    employees;

Explanation:

This query calculates the moving average of salaries, considering the current row and the two preceding rows within each department.

Output:

| employee_id | department_id | salary | moving_avg_salary |
|-------------|---------------|--------|-------------------|
|          101|             10 | 100000 |           100000.0 |
|          102|             10 |  95000 |            97500.0 |
|          103|             10 |  90000 |            95000.0 |
|          104|             10 |  85000 |            90000.0 |
|          201|             20 |  85000 |            85000.0 |
|          202|             20 |  80000 |            82500.0 |

6. Conclusion #

SQL window functions are essential for advanced data analysis, allowing you to perform complex calculations across sets of rows. By mastering these functions, you can write more efficient and expressive SQL queries, making it easier to perform tasks such as ranking, cumulative sums, moving averages, and more.

codeIn [Spark]

Bigdata – Knowledge Base

Pyspark

Spark Optimization

Python

Git

Hive

Unix Commands

AWS – Cloud

SQL – Window Functions

SQL Window Functions: A Comprehensive Guide with Detailed Queries #

1. Understanding SQL Window Functions #

Basic Syntax: #

2. Common SQL Window Functions #

3. Detailed Queries Using Window Functions #

3.1. ROW_NUMBER() Example #

3.2. RANK() Example #

3.3. DENSE_RANK() Example #

3.4. NTILE() Example #

3.5. LEAD() and LAG() Example #

3.6. FIRST_VALUE() and LAST_VALUE() Example #

4. Window Frame Specification: ROWS vs RANGE #

4.1. ROWS Frame Example #

4.2. RANGE Frame Example #

5. Advanced Window Functions #

5.1. Cumulative Sum with SUM() #

5.2. Moving Average with AVG() #

6. Conclusion #

What are your feelings

codeIn [Spark]

Pyspark

HIVE

SQL

Git

Bigdata – Knowledge Base

SQL – Window Functions

SQL Window Functions: A Comprehensive Guide with Detailed Queries #

1. Understanding SQL Window Functions #

Basic Syntax: #

2. Common SQL Window Functions #

3. Detailed Queries Using Window Functions #

3.1. ROW_NUMBER() Example #

3.2. RANK() Example #

3.3. DENSE_RANK() Example #

3.4. NTILE() Example #

3.5. LEAD() and LAG() Example #

3.6. FIRST_VALUE() and LAST_VALUE() Example #

4. Window Frame Specification: ROWS vs RANGE #

4.1. ROWS Frame Example #

4.2. RANGE Frame Example #

5. Advanced Window Functions #

5.1. Cumulative Sum with SUM() #

5.2. Moving Average with AVG() #

6. Conclusion #

What are your feelings

Share This Article: