Overview of Analytic Functions¶
Let us get an overview of Analytics or Windowing Functions as part of SQL.
Aggregate Functions (
sum
,min
,max
,avg
)Window Functions (
lead
,lag
,first_value
,last_value
)Rank Functions (
rank
,dense_rank
,row_number
etc)For all the functions when used as part of Analytic or Windowing functions we use
OVER
clause.For aggregate functions we typically use
PARTITION BY
For global ranking and windowing functions we can use
ORDER BY sort_column
and for ranking and windowing with in a partition or group we can usePARTITION BY partition_column ORDER BY sort_column
.Here is how the syntax will look like.
Aggregate -
func() OVER (PARTITION BY partition_column)
Global Rank -
func() OVER (ORDER BY sort_column DESC)
Rank in a partition -
func() OVER (PARTITION BY partition_column ORDER BY sort_column DESC)
We can also get cumulative or moving metrics by adding
ROWS BETWEEN
clause. We will see details later.
Prepare Tables¶
Let us create couple of tables which will be used for the demonstrations of Windowing and Ranking functions.
We have ORDERS and ORDER_ITEMS tables in our retail database.
Let us take care of computing daily revenue as well as daily product revenue.
As we will be using same data set several times, let us create the tables to pre compute the data.
daily_revenue will have the order_date and revenue, where data is aggregated using order_date as partition key.
daily_product_revenue will have order_date, order_item_product_id and revenue. In this case data is aggregated using order_date and order_item_product_id as partition keys.
Note
Let us create table using CTAS to save daily revenue.
%load_ext sql
The sql extension is already loaded. To reload it, use:
%reload_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
%%sql
DROP TABLE IF EXISTS daily_revenue
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.
[]
%%sql
CREATE TABLE daily_revenue
AS
SELECT o.order_date,
round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue
FROM orders o JOIN order_items oi
ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
364 rows affected.
[]
%%sql
SELECT * FROM daily_revenue
ORDER BY order_date
LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
order_date | revenue |
---|---|
2013-07-25 00:00:00 | 31547.23 |
2013-07-26 00:00:00 | 54713.23 |
2013-07-27 00:00:00 | 48411.48 |
2013-07-28 00:00:00 | 35672.03 |
2013-07-29 00:00:00 | 54579.70 |
2013-07-30 00:00:00 | 49329.29 |
2013-07-31 00:00:00 | 59212.49 |
2013-08-01 00:00:00 | 49160.08 |
2013-08-02 00:00:00 | 50688.58 |
2013-08-03 00:00:00 | 43416.74 |
Note
Let us create table using CTAS to save daily product revenue.
%%sql
DROP TABLE IF EXISTS daily_product_revenue
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.
[]
%%sql
CREATE TABLE daily_product_revenue
AS
SELECT o.order_date,
oi.order_item_product_id,
round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue
FROM orders o JOIN order_items oi
ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date, oi.order_item_product_id
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
9120 rows affected.
[]
%%sql
SELECT * FROM daily_product_revenue
ORDER BY order_date, revenue DESC
LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
order_date | order_item_product_id | revenue |
---|---|---|
2013-07-25 00:00:00 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 191 | 5099.49 |
2013-07-25 00:00:00 | 957 | 4499.70 |
2013-07-25 00:00:00 | 365 | 3359.44 |
2013-07-25 00:00:00 | 1073 | 2999.85 |
2013-07-25 00:00:00 | 1014 | 2798.88 |
2013-07-25 00:00:00 | 403 | 1949.85 |
2013-07-25 00:00:00 | 502 | 1650.00 |
2013-07-25 00:00:00 | 627 | 1079.73 |
2013-07-25 00:00:00 | 226 | 599.99 |