Overview of Analytic Functions¶
Let us get an overview of Analytics or Windowing Functions as part of SQL.
Aggregate Functions (
sum
,min
,max
,avg
)Window Functions (
lead
,lag
,first_value
,last_value
)Rank Functions (
rank
,dense_rank
,row_number
etc)For all the functions when used as part of Analytic or Windowing functions we use
OVER
clause.For aggregate functions we typically use
PARTITION BY
For global ranking and windowing functions we can use
ORDER BY sort_column
and for ranking and windowing with in a partition or group we can usePARTITION BY partition_column ORDER BY sort_column
.Here is how the syntax will look like.
Aggregate -
func() OVER (PARTITION BY partition_column)
Global Rank -
func() OVER (ORDER BY sort_column DESC)
Rank in a partition -
func() OVER (PARTITION BY partition_column ORDER BY sort_column DESC)
We can also get cumulative or moving metrics by adding
ROWS BETWEEN
clause. We will see details later.
Prepare Tables¶
Let us create couple of tables which will be used for the demonstrations of Windowing and Ranking functions.
We have ORDERS and ORDER_ITEMS tables in our retail database.
Let us take care of computing daily revenue as well as daily product revenue.
As we will be using same data set several times, let us create the tables to pre compute the data.
daily_revenue will have the order_date and revenue, where data is aggregated using order_date as partition key.
daily_product_revenue will have order_date, order_item_product_id and revenue. In this case data is aggregated using order_date and order_item_product_id as partition keys.
Note
Let us create table using CTAS to save daily revenue.
order_date | revenue |
---|---|
2013-07-25 00:00:00 | 31547.23 |
2013-07-26 00:00:00 | 54713.23 |
2013-07-27 00:00:00 | 48411.48 |
2013-07-28 00:00:00 | 35672.03 |
2013-07-29 00:00:00 | 54579.70 |
2013-07-30 00:00:00 | 49329.29 |
2013-07-31 00:00:00 | 59212.49 |
2013-08-01 00:00:00 | 49160.08 |
2013-08-02 00:00:00 | 50688.58 |
2013-08-03 00:00:00 | 43416.74 |
Note
Let us create table using CTAS to save daily product revenue.
%%sql
CREATE TABLE daily_product_revenue
AS
SELECT o.order_date,
oi.order_item_product_id,
round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue
FROM orders o JOIN order_items oi
ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date, oi.order_item_product_id
order_date | order_item_product_id | revenue |
---|---|---|
2013-07-25 00:00:00 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 191 | 5099.49 |
2013-07-25 00:00:00 | 957 | 4499.70 |
2013-07-25 00:00:00 | 365 | 3359.44 |
2013-07-25 00:00:00 | 1073 | 2999.85 |
2013-07-25 00:00:00 | 1014 | 2798.88 |
2013-07-25 00:00:00 | 403 | 1949.85 |
2013-07-25 00:00:00 | 502 | 1650.00 |
2013-07-25 00:00:00 | 627 | 1079.73 |
2013-07-25 00:00:00 | 226 | 599.99 |