How to Use Window Functions in SQL – with Example Queries

Window functions are a powerful feature in SQL that allow you to perform calculations across sets of rows that are related to the current row. They provide a way to compute moving averages, running totals, rankings, and other analytics without the need for complex subqueries or self-joins. In this guide, we‘ll dive deep into how window functions work and walk through several examples of how to use them to solve real-world problems.

Understanding Window Functions

A window function performs an operation on a "window" or set of rows that are somehow related to the current row. But unlike regular aggregate functions like SUM() and AVG() which return a single value per group, window functions return a value for each row in the result set. This allows you to calculate running totals, compare values to an average, rank results, and more.

Window functions are used with the OVER clause, which defines the window of rows to include in the calculation. You can think of the OVER clause as a window into the result set that shifts and resizes as the current row changes. The OVER clause can take three optional parameters:

  1. PARTITION BY – Divides the result set into partitions, each of which is operated on separately
  2. ORDER BY – Sorts the rows within each partition
  3. Frame clause – Explicitly defines which rows to include in the frame for each calculation (we‘ll cover this more later)

Types of Window Functions

There are three main types of window functions:

  1. Aggregate Functions – Perform an aggregation across a set of rows and return a single value for each row in the result set. Examples include SUM(), AVG(), COUNT(), MAX(), MIN().

  2. Ranking Functions – Assign an integer rank to each row within a partition based on its position. Examples include ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE().

  3. Value Functions – Provide access to values from other rows in the window without the need for a self-join. Examples include LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE().

Let‘s look at some examples of each type.

Aggregate Window Function Examples

Suppose we have a sales table with columns for product, sales rep, amount, and date:

product rep amount date
Widget John 100 1/1/23
Widget John 200 1/2/23
Gadget Jane 150 1/3/23
Widget John 300 1/4/23
Gadget Jane 250 1/5/23

To calculate the running total of sales for each rep, we can use the SUM() function with OVER and PARTITION BY:

SELECT 
  product,
  rep, 
  amount,
  date,
  SUM(amount) OVER (
    PARTITION BY rep
    ORDER BY date
  ) AS running_total  
FROM sales;

This will produce:

product rep amount date running_total
Widget John 100 1/1/23 100
Widget John 200 1/2/23 300
Widget John 300 1/4/23 600
Gadget Jane 150 1/3/23 150
Gadget Jane 250 1/5/23 400

The running total restarts at each new rep because we partitioned by the rep column. The totals are also calculated in date order within each partition due to the ORDER BY clause.

We could also compare each sale to the rep‘s average:

SELECT
  product, 
  rep,
  amount,
  date, 
  amount - AVG(amount) OVER (PARTITION BY rep) AS vs_avg
FROM sales;  

Result:

product rep amount date vs_avg
Widget John 100 1/1/23 -100
Widget John 200 1/2/23 0
Widget John 300 1/4/23 100
Gadget Jane 150 1/3/23 -50
Gadget Jane 250 1/5/23 50

Ranking Function Examples

Ranking functions assign integer values to each row based on its position within the partition. To rank sales reps by their total sales amounts:

SELECT
  rep,
  SUM(amount) AS total_sales,
  RANK() OVER (ORDER BY SUM(amount) DESC) AS sales_rank
FROM sales
GROUP BY rep;

This produces:

rep total_sales sales_rank
John 600 1
Jane 400 2

The RANK() function handles ties by giving them the same rank and skipping the next value. So if John and Jane had the same total sales, they would both get rank 1 and the next rank would be 3.

In contrast, DENSE_RANK() does not skip values after ties. ROW_NUMBER() is similar but always assigns unique numbers, even for ties.

Value Function Examples

Value functions let you easily reference values from other rows in the window. For example, to compare each sales amount to the previous and next amounts for that rep:

SELECT
  product,
  rep, 
  amount,
  date,
  LAG(amount, 1) OVER (
    PARTITION BY rep 
    ORDER BY date
  ) AS prev_amount,
  LEAD(amount, 1) OVER (
    PARTITION BY rep
    ORDER BY date  
  ) AS next_amount
FROM sales;

Result:

product rep amount date prev_amount next_amount
Widget John 100 1/1/23 NULL 200
Widget John 200 1/2/23 100 300
Widget John 300 1/4/23 200 NULL
Gadget Jane 150 1/3/23 NULL 250
Gadget Jane 250 1/5/23 150 NULL

LAG looks back to previous rows and LEAD looks ahead to later rows. Since there is no previous value for the first row and no next value for the last row in each partition, those return NULL.

Other value functions like FIRST_VALUE() and LAST_VALUE() return the first or last value in the window frame. These are handy for things like finding each employee‘s hire date or when they received their latest raise.

Defining Window Frames

By default, the window frame includes all rows in the partition from the first to the current row. But we can explicitly control which rows are included using the frame clause.

The frame clause comes after PARTITION BY and ORDER BY and specifies a start and end boundary:

OVER (
  PARTITION BY ...
  ORDER BY ...
  {ROWS | RANGE} BETWEEN <start> AND <end>  
) 

\<start> and \<end> can be one of:

  • UNBOUNDED PRECEDING – From the first row of the partition
  • UNBOUNDED FOLLOWING – To the last row of the partition
  • CURRENT ROW – The current row
  • \<integer> PRECEDING – Number of rows before the current row
  • \<integer> FOLLOWING – Number of rows after the current row

For example, to calculate a 3-day moving average of sales:

SELECT
  rep,
  date,  
  amount,
  AVG(amount) OVER (
    PARTITION BY rep
    ORDER BY date
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW  
  ) AS moving_avg
FROM sales;

Result:

rep date amount moving_avg
John 1/1/23 100 100.0000
John 1/2/23 200 150.0000
John 1/4/23 300 200.0000
Jane 1/3/23 150 150.0000
Jane 1/5/23 250 200.0000

The moving average is calculated over the current row and 2 previous rows for each rep. Note that the first row for John has no previous rows, so only the current row amount is used. The second row uses the current and 1 preceding, while the third uses the full 3-row frame. This pattern then repeats for Jane in a new partition.

The RANGE keyword is similar but defines the frame in terms of a value range of the ORDER BY column instead of a fixed number of rows. This is useful for things like bucketing values into discrete numeric ranges.

Benefits of Window Functions

Window functions provide several key benefits over using standard aggregations or self joins to calculate analytics:

  1. Simplified queries – Window functions allow you to calculate complex analytics in a single query without the need for joining the table to itself or nesting subqueries. This makes the code easier to write, read, and maintain.

  2. Better performance – Because the window calculations are applied after the main query‘s joins, filters, and aggregations, the optimizer often processes them more efficiently than a self-join approach. The database also does not need to create large intermediate result sets.

  3. Enhanced functionality – Window functions can do things that would be very difficult with standard SQL, like ranking rows, accessing values from other rows, and calculating moving averages. They provide a concise syntax for these use cases.

Conclusion

Window functions are a powerful tool for calculating analytics and generating insights from your data. By understanding how the OVER clause, PARTITION BY, ORDER BY, and frame boundaries work, you can begin leveraging them to write simpler, faster, and more capable SQL queries.

The key is to think about your data in terms of windows – which rows do you want to perform a calculation over for each row? Partitioning, ordering, and framing provide flexible ways to define those windows based on your analysis goals. I encourage you to practice using window functions in your own projects and explore the other functions I didn‘t cover here, like PERCENTILE_CONT() and CUME_DIST(). The more you use them, the more opportunities you will find to replace complex, multi-step logic with elegant window queries!

Similar Posts