Analytic Functions – Windowing¶
Let us go through the list of Windowing functions supported by Postgres.
lead
andlag
first_value
andlast_value
We can either use
ORDER BY sort_column
orPARTITION BY partition_column ORDER BY sort_column
while using Windowing Functions.
%load_ext sql
The sql extension is already loaded. To reload it, use:
%reload_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
Getting LEAD and LAG values¶
Let us understand LEAD and LAG functions to get column values from following or prior records.
Note
Here is the example to get values from either immediate prior or following record along with values from curent record. We will get values from prior or following record based on ORDER BY
within OVER
Clause.
%%sql
SELECT t.*,
lead(order_date) OVER (ORDER BY order_date DESC) AS prior_date,
lead(revenue) OVER (ORDER BY order_date DESC) AS prior_revenue,
lag(order_date) OVER (ORDER BY order_date) AS lag_prior_date,
lag(revenue) OVER (ORDER BY order_date) AS lag_prior_revenue
FROM daily_revenue AS t
ORDER BY order_date DESC
LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
order_date | revenue | prior_date | prior_revenue | lag_prior_date | lag_prior_revenue |
---|---|---|---|---|---|
2014-07-24 00:00:00 | 50885.19 | 2014-07-23 00:00:00 | 38795.23 | 2014-07-23 00:00:00 | 38795.23 |
2014-07-23 00:00:00 | 38795.23 | 2014-07-22 00:00:00 | 36717.24 | 2014-07-22 00:00:00 | 36717.24 |
2014-07-22 00:00:00 | 36717.24 | 2014-07-21 00:00:00 | 51427.70 | 2014-07-21 00:00:00 | 51427.70 |
2014-07-21 00:00:00 | 51427.70 | 2014-07-20 00:00:00 | 60047.45 | 2014-07-20 00:00:00 | 60047.45 |
2014-07-20 00:00:00 | 60047.45 | 2014-07-19 00:00:00 | 38420.99 | 2014-07-19 00:00:00 | 38420.99 |
2014-07-19 00:00:00 | 38420.99 | 2014-07-18 00:00:00 | 43856.60 | 2014-07-18 00:00:00 | 43856.60 |
2014-07-18 00:00:00 | 43856.60 | 2014-07-17 00:00:00 | 36384.77 | 2014-07-17 00:00:00 | 36384.77 |
2014-07-17 00:00:00 | 36384.77 | 2014-07-16 00:00:00 | 43011.92 | 2014-07-16 00:00:00 | 43011.92 |
2014-07-16 00:00:00 | 43011.92 | 2014-07-15 00:00:00 | 53480.23 | 2014-07-15 00:00:00 | 53480.23 |
2014-07-15 00:00:00 | 53480.23 | 2014-07-14 00:00:00 | 29937.52 | 2014-07-14 00:00:00 | 29937.52 |
Note
Here is the example to get values from either prior or following 7th record along with values from current record.
%%sql
SELECT t.*,
lead(order_date, 7) OVER (ORDER BY order_date DESC) AS prior_date,
lead(revenue, 7) OVER (ORDER BY order_date DESC) AS prior_revenue
FROM daily_revenue t
ORDER BY order_date DESC
LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
order_date | revenue | prior_date | prior_revenue |
---|---|---|---|
2014-07-24 00:00:00 | 50885.19 | 2014-07-17 00:00:00 | 36384.77 |
2014-07-23 00:00:00 | 38795.23 | 2014-07-16 00:00:00 | 43011.92 |
2014-07-22 00:00:00 | 36717.24 | 2014-07-15 00:00:00 | 53480.23 |
2014-07-21 00:00:00 | 51427.70 | 2014-07-14 00:00:00 | 29937.52 |
2014-07-20 00:00:00 | 60047.45 | 2014-07-13 00:00:00 | 40410.99 |
2014-07-19 00:00:00 | 38420.99 | 2014-07-12 00:00:00 | 38449.77 |
2014-07-18 00:00:00 | 43856.60 | 2014-07-11 00:00:00 | 29596.32 |
2014-07-17 00:00:00 | 36384.77 | 2014-07-10 00:00:00 | 47826.02 |
2014-07-16 00:00:00 | 43011.92 | 2014-07-09 00:00:00 | 36929.91 |
2014-07-15 00:00:00 | 53480.23 | 2014-07-08 00:00:00 | 50434.81 |
Note
For values related to non existing prior or following record, we will get nulls.
%%sql
SELECT t.*,
lead(order_date, 7) OVER (ORDER BY order_date DESC) AS prior_date,
lead(revenue, 7) OVER (ORDER BY order_date DESC) AS prior_revenue
FROM daily_revenue t
ORDER BY order_date
LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
order_date | revenue | prior_date | prior_revenue |
---|---|---|---|
2013-07-25 00:00:00 | 31547.23 | None | None |
2013-07-26 00:00:00 | 54713.23 | None | None |
2013-07-27 00:00:00 | 48411.48 | None | None |
2013-07-28 00:00:00 | 35672.03 | None | None |
2013-07-29 00:00:00 | 54579.70 | None | None |
2013-07-30 00:00:00 | 49329.29 | None | None |
2013-07-31 00:00:00 | 59212.49 | None | None |
2013-08-01 00:00:00 | 49160.08 | 2013-07-25 00:00:00 | 31547.23 |
2013-08-02 00:00:00 | 50688.58 | 2013-07-26 00:00:00 | 54713.23 |
2013-08-03 00:00:00 | 43416.74 | 2013-07-27 00:00:00 | 48411.48 |
Note
We can replace nulls by passing relevant values as 3rd argument. However, the data type of the values should be compatible with the columns on which lead
or lag
is applied.
%%sql
SELECT t.*,
lead(order_date, 7) OVER (ORDER BY order_date DESC) AS prior_date,
lead(revenue, 7, 0.0) OVER (ORDER BY order_date DESC) AS prior_revenue
FROM daily_revenue t
ORDER BY order_date
LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
order_date | revenue | prior_date | prior_revenue |
---|---|---|---|
2013-07-25 00:00:00 | 31547.23 | None | 0.0 |
2013-07-26 00:00:00 | 54713.23 | None | 0.0 |
2013-07-27 00:00:00 | 48411.48 | None | 0.0 |
2013-07-28 00:00:00 | 35672.03 | None | 0.0 |
2013-07-29 00:00:00 | 54579.70 | None | 0.0 |
2013-07-30 00:00:00 | 49329.29 | None | 0.0 |
2013-07-31 00:00:00 | 59212.49 | None | 0.0 |
2013-08-01 00:00:00 | 49160.08 | 2013-07-25 00:00:00 | 31547.23 |
2013-08-02 00:00:00 | 50688.58 | 2013-07-26 00:00:00 | 54713.23 |
2013-08-03 00:00:00 | 43416.74 | 2013-07-27 00:00:00 | 48411.48 |
%%sql
SELECT * FROM daily_product_revenue
ORDER BY order_date, revenue DESC
LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
order_date | order_item_product_id | revenue |
---|---|---|
2013-07-25 00:00:00 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 191 | 5099.49 |
2013-07-25 00:00:00 | 957 | 4499.70 |
2013-07-25 00:00:00 | 365 | 3359.44 |
2013-07-25 00:00:00 | 1073 | 2999.85 |
2013-07-25 00:00:00 | 1014 | 2798.88 |
2013-07-25 00:00:00 | 403 | 1949.85 |
2013-07-25 00:00:00 | 502 | 1650.00 |
2013-07-25 00:00:00 | 627 | 1079.73 |
2013-07-25 00:00:00 | 226 | 599.99 |
%%sql
SELECT t.*,
LEAD(order_item_product_id) OVER (
PARTITION BY order_date
ORDER BY revenue DESC
) next_product_id,
LEAD(revenue) OVER (
PARTITION BY order_date
ORDER BY revenue DESC
) next_revenue
FROM daily_product_revenue t
ORDER BY order_date, revenue DESC
LIMIT 30
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
30 rows affected.
order_date | order_item_product_id | revenue | next_product_id | next_revenue |
---|---|---|---|---|
2013-07-25 00:00:00 | 1004 | 5599.72 | 191 | 5099.49 |
2013-07-25 00:00:00 | 191 | 5099.49 | 957 | 4499.70 |
2013-07-25 00:00:00 | 957 | 4499.70 | 365 | 3359.44 |
2013-07-25 00:00:00 | 365 | 3359.44 | 1073 | 2999.85 |
2013-07-25 00:00:00 | 1073 | 2999.85 | 1014 | 2798.88 |
2013-07-25 00:00:00 | 1014 | 2798.88 | 403 | 1949.85 |
2013-07-25 00:00:00 | 403 | 1949.85 | 502 | 1650.00 |
2013-07-25 00:00:00 | 502 | 1650.00 | 627 | 1079.73 |
2013-07-25 00:00:00 | 627 | 1079.73 | 226 | 599.99 |
2013-07-25 00:00:00 | 226 | 599.99 | 24 | 319.96 |
2013-07-25 00:00:00 | 24 | 319.96 | 821 | 207.96 |
2013-07-25 00:00:00 | 821 | 207.96 | 625 | 199.99 |
2013-07-25 00:00:00 | 625 | 199.99 | 705 | 119.99 |
2013-07-25 00:00:00 | 705 | 119.99 | 572 | 119.97 |
2013-07-25 00:00:00 | 572 | 119.97 | 666 | 109.99 |
2013-07-25 00:00:00 | 666 | 109.99 | 725 | 108.00 |
2013-07-25 00:00:00 | 725 | 108.00 | 134 | 100.00 |
2013-07-25 00:00:00 | 134 | 100.00 | 906 | 99.96 |
2013-07-25 00:00:00 | 906 | 99.96 | 828 | 95.97 |
2013-07-25 00:00:00 | 828 | 95.97 | 810 | 79.96 |
2013-07-25 00:00:00 | 810 | 79.96 | 924 | 79.95 |
2013-07-25 00:00:00 | 924 | 79.95 | 926 | 79.95 |
2013-07-25 00:00:00 | 926 | 79.95 | 93 | 74.97 |
2013-07-25 00:00:00 | 93 | 74.97 | 835 | 63.98 |
2013-07-25 00:00:00 | 835 | 63.98 | 897 | 49.98 |
2013-07-25 00:00:00 | 897 | 49.98 | None | None |
2013-07-26 00:00:00 | 1004 | 10799.46 | 365 | 7978.67 |
2013-07-26 00:00:00 | 365 | 7978.67 | 957 | 6899.54 |
2013-07-26 00:00:00 | 957 | 6899.54 | 191 | 6799.32 |
2013-07-26 00:00:00 | 191 | 6799.32 | 1014 | 4798.08 |
Getting first and last values¶
Let us see how we can get first and last value based on the criteria. min
or max
can be used to get only the min or max of the metric we are interested in, however we cannot get other attributes of those records.
Here is the example of using first_value.
%%sql
SELECT t.*,
first_value(order_item_product_id) OVER (
PARTITION BY order_date ORDER BY revenue DESC
) first_product_id,
first_value(revenue) OVER (
PARTITION BY order_date ORDER BY revenue DESC
) first_revenue,
max(revenue) OVER (
PARTITION BY order_date
) max_revenue
FROM daily_product_revenue t
ORDER BY order_date, revenue DESC
LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
order_date | order_item_product_id | revenue | first_product_id | first_revenue | max_revenue |
---|---|---|---|---|---|
2013-07-25 00:00:00 | 1004 | 5599.72 | 1004 | 5599.72 | 5599.72 |
2013-07-25 00:00:00 | 191 | 5099.49 | 1004 | 5599.72 | 5599.72 |
2013-07-25 00:00:00 | 957 | 4499.70 | 1004 | 5599.72 | 5599.72 |
2013-07-25 00:00:00 | 365 | 3359.44 | 1004 | 5599.72 | 5599.72 |
2013-07-25 00:00:00 | 1073 | 2999.85 | 1004 | 5599.72 | 5599.72 |
2013-07-25 00:00:00 | 1014 | 2798.88 | 1004 | 5599.72 | 5599.72 |
2013-07-25 00:00:00 | 403 | 1949.85 | 1004 | 5599.72 | 5599.72 |
2013-07-25 00:00:00 | 502 | 1650.00 | 1004 | 5599.72 | 5599.72 |
2013-07-25 00:00:00 | 627 | 1079.73 | 1004 | 5599.72 | 5599.72 |
2013-07-25 00:00:00 | 226 | 599.99 | 1004 | 5599.72 | 5599.72 |
Let us see an example with last_value. While using last_value we need to specify ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING.
By default it uses
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
.The last value with in
UNBOUNDED PRECEDING AND CURRENT ROW
will be current record.To get the right value, we have to change the windowing clause to
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
.
%%sql
SELECT t.*,
last_value(order_item_product_id) OVER (
PARTITION BY order_date ORDER BY revenue
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) last_product_id,
max(revenue) OVER (
PARTITION BY order_date
) last_revenue
FROM daily_product_revenue AS t
ORDER BY order_date, revenue DESC
LIMIT 30
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
30 rows affected.
order_date | order_item_product_id | revenue | last_product_id | last_revenue |
---|---|---|---|---|
2013-07-25 00:00:00 | 1004 | 5599.72 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 191 | 5099.49 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 957 | 4499.70 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 365 | 3359.44 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 1073 | 2999.85 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 1014 | 2798.88 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 403 | 1949.85 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 502 | 1650.00 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 627 | 1079.73 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 226 | 599.99 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 24 | 319.96 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 821 | 207.96 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 625 | 199.99 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 705 | 119.99 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 572 | 119.97 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 666 | 109.99 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 725 | 108.00 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 134 | 100.00 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 906 | 99.96 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 828 | 95.97 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 810 | 79.96 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 924 | 79.95 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 926 | 79.95 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 93 | 74.97 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 835 | 63.98 | 1004 | 5599.72 |
2013-07-25 00:00:00 | 897 | 49.98 | 1004 | 5599.72 |
2013-07-26 00:00:00 | 1004 | 10799.46 | 1004 | 10799.46 |
2013-07-26 00:00:00 | 365 | 7978.67 | 1004 | 10799.46 |
2013-07-26 00:00:00 | 957 | 6899.54 | 1004 | 10799.46 |
2013-07-26 00:00:00 | 191 | 6799.32 | 1004 | 10799.46 |