SQL Window Functions: A Comprehensive Guide with Detailed Queries #
Window functions in SQL are powerful tools that allow you to perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions that return a single value for a group of rows, window functions return a value for each row in the result set. They are essential for performing complex analytics and are widely used in reporting, financial calculations, and data analysis.
This document provides a detailed overview of SQL window functions, along with examples to demonstrate their usage.
1. Understanding SQL Window Functions #
A window function performs a calculation across a set of rows that are somehow related to the current row. The set of rows is referred to as the window frame. Window functions are executed after the FROM
, WHERE
, GROUP BY
, and HAVING
clauses, but before the ORDER BY
clause.
Basic Syntax: #
window_function() OVER (
[PARTITION BY partition_expression]
[ORDER BY sort_expression]
[ROWS or RANGE frame_clause]
)
- window_function(): The window function to be applied (e.g.,
ROW_NUMBER()
,RANK()
,SUM()
). - PARTITION BY: Divides the result set into partitions to which the window function is applied.
- ORDER BY: Defines the order of rows within each partition.
- ROWS or RANGE: Specifies the window frame, which is a subset of rows within the partition.
2. Common SQL Window Functions #
Here are some commonly used window functions:
- ROW_NUMBER(): Assigns a unique sequential integer to rows within a partition.
- RANK(): Assigns a rank to each row within a partition, with gaps in rank for ties.
- DENSE_RANK(): Similar to
RANK()
, but without gaps in rank for ties. - NTILE(n): Divides the rows in an ordered partition into
n
buckets. - LEAD() and LAG(): Accesses data from the subsequent or preceding row.
- FIRST_VALUE() and LAST_VALUE(): Returns the first or last value in the window frame.
- SUM(), AVG(), MIN(), MAX(): Aggregate functions that can be used as window functions.
3. Detailed Queries Using Window Functions #
3.1. ROW_NUMBER() Example #
The ROW_NUMBER()
function assigns a unique sequential integer to each row within a partition.
Query:
SELECT
employee_id,
department_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num
FROM
employees;
Explanation:
- This query assigns a unique number to each employee within their department, ordered by salary in descending order.
Output:
| employee_id | department_id | salary | row_num |
|-------------|---------------|--------|---------|
| 101| 10 | 100000 | 1 |
| 102| 10 | 95000 | 2 |
| 103| 10 | 90000 | 3 |
| 201| 20 | 85000 | 1 |
| 202| 20 | 80000 | 2 |
3.2. RANK() Example #
The RANK()
function assigns a rank to each row within a partition, with gaps for ties.
Query:
SELECT
employee_id,
department_id,
salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM
employees;
Explanation:
- This query ranks employees within each department based on their salary, with identical salaries receiving the same rank and gaps in the ranking sequence.
Output:
| employee_id | department_id | salary | rank |
|-------------|---------------|--------|------|
| 101| 10 | 100000 | 1 |
| 102| 10 | 95000 | 2 |
| 103| 10 | 95000 | 2 |
| 104| 10 | 90000 | 4 |
| 201| 20 | 85000 | 1 |
| 202| 20 | 80000 | 2 |
3.3. DENSE_RANK() Example #
The DENSE_RANK()
function is similar to RANK()
, but it does not leave gaps in the ranking sequence for ties.
Query:
SELECT
employee_id,
department_id,
salary,
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rank
FROM
employees;
Explanation:
- This query ranks employees within each department based on their salary, with identical salaries receiving the same rank without gaps.
Output:
| employee_id | department_id | salary | dense_rank |
|-------------|---------------|--------|------------|
| 101| 10 | 100000 | 1 |
| 102| 10 | 95000 | 2 |
| 103| 10 | 95000 | 2 |
| 104| 10 | 90000 | 3 |
| 201| 20 | 85000 | 1 |
| 202| 20 | 80000 | 2 |
3.4. NTILE() Example #
The NTILE()
function distributes rows in an ordered partition into a specified number of groups.
Query:
SELECT
employee_id,
department_id,
salary,
NTILE(2) OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_group
FROM
employees;
Explanation:
- This query divides the employees in each department into two salary groups based on their salary.
Output:
| employee_id | department_id | salary | salary_group |
|-------------|---------------|--------|--------------|
| 101| 10 | 100000 | 1 |
| 102| 10 | 95000 | 1 |
| 103| 10 | 90000 | 2 |
| 104| 10 | 85000 | 2 |
| 201| 20 | 80000 | 1 |
| 202| 20 | 75000 | 2 |
3.5. LEAD() and LAG() Example #
The LEAD()
function provides access to a subsequent row’s data, while LAG()
accesses data from a preceding row.
Query:
SELECT
employee_id,
department_id,
salary,
LEAD(salary, 1) OVER (PARTITION BY department_id ORDER BY salary DESC) AS next_salary,
LAG(salary, 1) OVER (PARTITION BY department_id ORDER BY salary DESC) AS prev_salary
FROM
employees;
Explanation:
- This query retrieves the next and previous salary for each employee within their department, based on salary.
Output:
| employee_id | department_id | salary | next_salary | prev_salary |
|-------------|---------------|--------|-------------|-------------|
| 101| 10 | 100000 | 95000 | NULL |
| 102| 10 | 95000 | 90000 | 100000 |
| 103| 10 | 90000 | 85000 | 95000 |
| 104| 10 | 85000 | NULL | 90000 |
| 201| 20 | 85000 | 80000 | NULL |
| 202| 20 | 80000 | 75000 | 85000 |
3.6. FIRST_VALUE() and LAST_VALUE() Example #
The FIRST_VALUE()
and LAST_VALUE()
functions return the first and last value in the window frame, respectively.
Query:
SELECT
employee_id,
department_id,
salary,
FIRST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS highest_salary,
LAST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary ASC) AS lowest_salary
FROM
employees;
Explanation:
- This query retrieves the highest and lowest salary for each employee within their department. The
FIRST_VALUE()
function selects the highest salary based on descending order, while theLAST_VALUE()
function selects the lowest salary based on ascending order.
Output:
| employee_id | department_id | salary | highest_salary | lowest_salary |
|-------------|---------------|--------|----------------|---------------|
| 101| 10 | 100000 | 100000 | 85000 |
| 102| 10 | 95000 | 100000 | 85000 |
| 103| 10 | 90000 | 100000 | 85000 |
| 104| 10 | 85000 | 100000 | 85000 |
| 201| 20 | 85000 | 85000 | 80000 |
| 202| 20 | 80000 | 85000 | 80000 |
4. Window Frame Specification: ROWS vs RANGE #
Window functions can operate over a subset of rows within the window, called the window frame. You can specify this window frame using either ROWS
or RANGE
.
4.1. ROWS Frame Example #
The ROWS
specification limits the frame to a specific number of rows relative to the current row.
Query:
SELECT
employee_id,
department_id,
salary,
SUM(salary) OVER (PARTITION BY department_id ORDER BY salary DESC ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_salary_window
FROM
employees;
Explanation:
- This query calculates the sum of the salaries for the current row, the preceding row, and the following row within each department.
Output:
| employee_id | department_id | salary | sum_salary_window |
|-------------|---------------|--------|-------------------|
| 101| 10 | 100000 | 195000 |
| 102| 10 | 95000 | 285000 |
| 103| 10 | 90000 | 275000 |
| 104| 10 | 85000 | 175000 |
| 201| 20 | 85000 | 165000 |
| 202| 20 | 80000 | 165000 |
4.2. RANGE Frame Example #
The RANGE
specification limits the frame based on the value of the current row, allowing you to define a range of values.
Query:
SELECT
employee_id,
department_id,
salary,
SUM(salary) OVER (PARTITION BY department_id ORDER BY salary DESC RANGE BETWEEN 10000 PRECEDING AND 10000 FOLLOWING) AS sum_salary_range
FROM
employees;
Explanation:
- This query calculates the sum of the salaries within a specified range (10,000 units) of the current row’s salary.
Output:
| employee_id | department_id | salary | sum_salary_range |
|-------------|---------------|--------|------------------|
| 101| 10 | 100000 | 195000 |
| 102| 10 | 95000 | 285000 |
| 103| 10 | 90000 | 175000 |
| 104| 10 | 85000 | 175000 |
| 201| 20 | 85000 | 165000 |
| 202| 20 | 80000 | 165000 |
5. Advanced Window Functions #
5.1. Cumulative Sum with SUM() #
The cumulative sum is a common calculation that can be efficiently done using window functions.
Query:
SELECT
employee_id,
department_id,
salary,
SUM(salary) OVER (PARTITION BY department_id ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_salary
FROM
employees;
Explanation:
- This query calculates the cumulative sum of salaries within each department, ordered by salary.
Output:
| employee_id | department_id | salary | cumulative_salary |
|-------------|---------------|--------|-------------------|
| 101| 10 | 100000 | 100000 |
| 102| 10 | 95000 | 195000 |
| 103| 10 | 90000 | 285000 |
| 104| 10 | 85000 | 370000 |
| 201| 20 | 85000 | 85000 |
| 202| 20 | 80000 | 165000 |
5.2. Moving Average with AVG() #
Moving averages are often used to smooth out short-term fluctuations and highlight longer-term trends.
Query:
SELECT
employee_id,
department_id,
salary,
AVG(salary) OVER (PARTITION BY department_id ORDER BY salary DESC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_salary
FROM
employees;
Explanation:
- This query calculates the moving average of salaries, considering the current row and the two preceding rows within each department.
Output:
| employee_id | department_id | salary | moving_avg_salary |
|-------------|---------------|--------|-------------------|
| 101| 10 | 100000 | 100000.0 |
| 102| 10 | 95000 | 97500.0 |
| 103| 10 | 90000 | 95000.0 |
| 104| 10 | 85000 | 90000.0 |
| 201| 20 | 85000 | 85000.0 |
| 202| 20 | 80000 | 82500.0 |
6. Conclusion #
SQL window functions are essential for advanced data analysis, allowing you to perform complex calculations across sets of rows. By mastering these functions, you can write more efficient and expressive SQL queries, making it easier to perform tasks such as ranking, cumulative sums, moving averages, and more.