How to Use Window Functions in SQL – with Example Queries
Window functions are a powerful feature in SQL that allow you to perform calculations across sets of rows that are related to the current row. They provide a way to compute moving averages, running totals, rankings, and other analytics without the need for complex subqueries or self-joins. In this guide, we‘ll dive deep into how window functions work and walk through several examples of how to use them to solve real-world problems.
Understanding Window Functions
A window function performs an operation on a "window" or set of rows that are somehow related to the current row. But unlike regular aggregate functions like SUM() and AVG() which return a single value per group, window functions return a value for each row in the result set. This allows you to calculate running totals, compare values to an average, rank results, and more.
Window functions are used with the OVER clause, which defines the window of rows to include in the calculation. You can think of the OVER clause as a window into the result set that shifts and resizes as the current row changes. The OVER clause can take three optional parameters:
- PARTITION BY – Divides the result set into partitions, each of which is operated on separately
- ORDER BY – Sorts the rows within each partition
- Frame clause – Explicitly defines which rows to include in the frame for each calculation (we‘ll cover this more later)
Types of Window Functions
There are three main types of window functions:
-
Aggregate Functions – Perform an aggregation across a set of rows and return a single value for each row in the result set. Examples include SUM(), AVG(), COUNT(), MAX(), MIN().
-
Ranking Functions – Assign an integer rank to each row within a partition based on its position. Examples include ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE().
-
Value Functions – Provide access to values from other rows in the window without the need for a self-join. Examples include LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE().
Let‘s look at some examples of each type.
Aggregate Window Function Examples
Suppose we have a sales table with columns for product, sales rep, amount, and date:
product | rep | amount | date |
---|---|---|---|
Widget | John | 100 | 1/1/23 |
Widget | John | 200 | 1/2/23 |
Gadget | Jane | 150 | 1/3/23 |
Widget | John | 300 | 1/4/23 |
Gadget | Jane | 250 | 1/5/23 |
To calculate the running total of sales for each rep, we can use the SUM() function with OVER and PARTITION BY:
SELECT
product,
rep,
amount,
date,
SUM(amount) OVER (
PARTITION BY rep
ORDER BY date
) AS running_total
FROM sales;
This will produce:
product | rep | amount | date | running_total |
---|---|---|---|---|
Widget | John | 100 | 1/1/23 | 100 |
Widget | John | 200 | 1/2/23 | 300 |
Widget | John | 300 | 1/4/23 | 600 |
Gadget | Jane | 150 | 1/3/23 | 150 |
Gadget | Jane | 250 | 1/5/23 | 400 |
The running total restarts at each new rep because we partitioned by the rep column. The totals are also calculated in date order within each partition due to the ORDER BY clause.
We could also compare each sale to the rep‘s average:
SELECT
product,
rep,
amount,
date,
amount - AVG(amount) OVER (PARTITION BY rep) AS vs_avg
FROM sales;
Result:
product | rep | amount | date | vs_avg |
---|---|---|---|---|
Widget | John | 100 | 1/1/23 | -100 |
Widget | John | 200 | 1/2/23 | 0 |
Widget | John | 300 | 1/4/23 | 100 |
Gadget | Jane | 150 | 1/3/23 | -50 |
Gadget | Jane | 250 | 1/5/23 | 50 |
Ranking Function Examples
Ranking functions assign integer values to each row based on its position within the partition. To rank sales reps by their total sales amounts:
SELECT
rep,
SUM(amount) AS total_sales,
RANK() OVER (ORDER BY SUM(amount) DESC) AS sales_rank
FROM sales
GROUP BY rep;
This produces:
rep | total_sales | sales_rank |
---|---|---|
John | 600 | 1 |
Jane | 400 | 2 |
The RANK() function handles ties by giving them the same rank and skipping the next value. So if John and Jane had the same total sales, they would both get rank 1 and the next rank would be 3.
In contrast, DENSE_RANK() does not skip values after ties. ROW_NUMBER() is similar but always assigns unique numbers, even for ties.
Value Function Examples
Value functions let you easily reference values from other rows in the window. For example, to compare each sales amount to the previous and next amounts for that rep:
SELECT
product,
rep,
amount,
date,
LAG(amount, 1) OVER (
PARTITION BY rep
ORDER BY date
) AS prev_amount,
LEAD(amount, 1) OVER (
PARTITION BY rep
ORDER BY date
) AS next_amount
FROM sales;
Result:
product | rep | amount | date | prev_amount | next_amount |
---|---|---|---|---|---|
Widget | John | 100 | 1/1/23 | NULL | 200 |
Widget | John | 200 | 1/2/23 | 100 | 300 |
Widget | John | 300 | 1/4/23 | 200 | NULL |
Gadget | Jane | 150 | 1/3/23 | NULL | 250 |
Gadget | Jane | 250 | 1/5/23 | 150 | NULL |
LAG looks back to previous rows and LEAD looks ahead to later rows. Since there is no previous value for the first row and no next value for the last row in each partition, those return NULL.
Other value functions like FIRST_VALUE() and LAST_VALUE() return the first or last value in the window frame. These are handy for things like finding each employee‘s hire date or when they received their latest raise.
Defining Window Frames
By default, the window frame includes all rows in the partition from the first to the current row. But we can explicitly control which rows are included using the frame clause.
The frame clause comes after PARTITION BY and ORDER BY and specifies a start and end boundary:
OVER (
PARTITION BY ...
ORDER BY ...
{ROWS | RANGE} BETWEEN <start> AND <end>
)
\<start> and \<end> can be one of:
- UNBOUNDED PRECEDING – From the first row of the partition
- UNBOUNDED FOLLOWING – To the last row of the partition
- CURRENT ROW – The current row
- \<integer> PRECEDING – Number of rows before the current row
- \<integer> FOLLOWING – Number of rows after the current row
For example, to calculate a 3-day moving average of sales:
SELECT
rep,
date,
amount,
AVG(amount) OVER (
PARTITION BY rep
ORDER BY date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg
FROM sales;
Result:
rep | date | amount | moving_avg |
---|---|---|---|
John | 1/1/23 | 100 | 100.0000 |
John | 1/2/23 | 200 | 150.0000 |
John | 1/4/23 | 300 | 200.0000 |
Jane | 1/3/23 | 150 | 150.0000 |
Jane | 1/5/23 | 250 | 200.0000 |
The moving average is calculated over the current row and 2 previous rows for each rep. Note that the first row for John has no previous rows, so only the current row amount is used. The second row uses the current and 1 preceding, while the third uses the full 3-row frame. This pattern then repeats for Jane in a new partition.
The RANGE keyword is similar but defines the frame in terms of a value range of the ORDER BY column instead of a fixed number of rows. This is useful for things like bucketing values into discrete numeric ranges.
Benefits of Window Functions
Window functions provide several key benefits over using standard aggregations or self joins to calculate analytics:
-
Simplified queries – Window functions allow you to calculate complex analytics in a single query without the need for joining the table to itself or nesting subqueries. This makes the code easier to write, read, and maintain.
-
Better performance – Because the window calculations are applied after the main query‘s joins, filters, and aggregations, the optimizer often processes them more efficiently than a self-join approach. The database also does not need to create large intermediate result sets.
-
Enhanced functionality – Window functions can do things that would be very difficult with standard SQL, like ranking rows, accessing values from other rows, and calculating moving averages. They provide a concise syntax for these use cases.
Conclusion
Window functions are a powerful tool for calculating analytics and generating insights from your data. By understanding how the OVER clause, PARTITION BY, ORDER BY, and frame boundaries work, you can begin leveraging them to write simpler, faster, and more capable SQL queries.
The key is to think about your data in terms of windows – which rows do you want to perform a calculation over for each row? Partitioning, ordering, and framing provide flexible ways to define those windows based on your analysis goals. I encourage you to practice using window functions in your own projects and explore the other functions I didn‘t cover here, like PERCENTILE_CONT() and CUME_DIST(). The more you use them, the more opportunities you will find to replace complex, multi-step logic with elegant window queries!