Cumulative or Moving Aggregations¶
Let us understand how we can take care of cumulative or moving aggregations using Analytic Functions.
When it comes to Windowing or Analytic Functions we can also specify window spec using
ROWS BETWEEN
clause.Even when we do not specify window spec, the default window spec is used. For most of the functions the default window spec is
UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
. You also have special clauses such asCURRENT ROW
.Here are some of the examples with respect to
ROWS BETWEEN
.ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
- moving aggregations using current record and previous 3 records.ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING
- moving aggregations using current record and following 3 records.ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
- moving aggregations based up on 7 records (current record, 3 previous records and 3 following records)
We can leverage
ROWS BETWEEN
for cumulative aggregations or moving aggregations.Here is an example of cumulative sum.
Warning
If you are using Jupyter based environment make sure to restart the kernel, as the session might have been already connected with retail database.
%load_ext sql
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
Note
Even though it is not mandatory to specify ORDER BY
as per syntax for cumulative aggregations, it is a must to specify. If not, you will end up getting incorrect results.
%%sql
SELECT e.employee_id, e.department_id, e.salary,
sum(e.salary) OVER (
PARTITION BY e.department_id
ORDER BY e.salary
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS sum_sal_expense
FROM employees e
ORDER BY e.department_id, e.salary DESC
LIMIT 10
10 rows affected.
employee_id | department_id | salary | sum_sal_expense |
---|---|---|---|
200 | 10 | 4400.00 | 4400.00 |
201 | 20 | 13000.00 | 19000.00 |
202 | 20 | 6000.00 | 6000.00 |
114 | 30 | 11000.00 | 24900.00 |
115 | 30 | 3100.00 | 13900.00 |
116 | 30 | 2900.00 | 10800.00 |
117 | 30 | 2800.00 | 7900.00 |
118 | 30 | 2600.00 | 5100.00 |
119 | 30 | 2500.00 | 2500.00 |
203 | 40 | 6500.00 | 6500.00 |
Warning
If you are using Jupyter based environment make sure to restart the kernel, as the session might have been already connected with hr database.
%load_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
Note
Here is the example for cumulative sum for every month using daily_product_revenue in retail database.
%%sql
SELECT t.*,
round(sum(t.revenue) OVER (
PARTITION BY to_char(order_date, 'yyyy-MM')
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
), 2) AS cumulative_daily_revenue
FROM daily_revenue t
ORDER BY to_char(order_date, 'yyyy-MM'),
order_date
LIMIT 10
10 rows affected.
order_date | revenue | cumulative_daily_revenue |
---|---|---|
2013-07-25 00:00:00 | 31547.23 | 31547.23 |
2013-07-26 00:00:00 | 54713.23 | 86260.46 |
2013-07-27 00:00:00 | 48411.48 | 134671.94 |
2013-07-28 00:00:00 | 35672.03 | 170343.97 |
2013-07-29 00:00:00 | 54579.70 | 224923.67 |
2013-07-30 00:00:00 | 49329.29 | 274252.96 |
2013-07-31 00:00:00 | 59212.49 | 333465.45 |
2013-08-01 00:00:00 | 49160.08 | 49160.08 |
2013-08-02 00:00:00 | 50688.58 | 99848.66 |
2013-08-03 00:00:00 | 43416.74 | 143265.40 |
Note
Here are examples for 3 day moving sum as well as average using daily_revenue in retail database.
%%sql
SELECT t.*,
round(sum(t.revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 2) AS moving_3day_revenue
FROM daily_revenue t
ORDER BY order_date
LIMIT 20
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
20 rows affected.
order_date | revenue | moving_3day_revenue |
---|---|---|
2013-07-25 00:00:00 | 31547.23 | 31547.23 |
2013-07-26 00:00:00 | 54713.23 | 86260.46 |
2013-07-27 00:00:00 | 48411.48 | 134671.94 |
2013-07-28 00:00:00 | 35672.03 | 138796.74 |
2013-07-29 00:00:00 | 54579.70 | 138663.21 |
2013-07-30 00:00:00 | 49329.29 | 139581.02 |
2013-07-31 00:00:00 | 59212.49 | 163121.48 |
2013-08-01 00:00:00 | 49160.08 | 157701.86 |
2013-08-02 00:00:00 | 50688.58 | 159061.15 |
2013-08-03 00:00:00 | 43416.74 | 143265.40 |
2013-08-04 00:00:00 | 35093.01 | 129198.33 |
2013-08-05 00:00:00 | 34025.27 | 112535.02 |
2013-08-06 00:00:00 | 57843.89 | 126962.17 |
2013-08-07 00:00:00 | 45525.59 | 137394.75 |
2013-08-08 00:00:00 | 33549.47 | 136918.95 |
2013-08-09 00:00:00 | 29225.16 | 108300.22 |
2013-08-10 00:00:00 | 46435.04 | 109209.67 |
2013-08-11 00:00:00 | 31155.50 | 106815.70 |
2013-08-12 00:00:00 | 59014.74 | 136605.28 |
2013-08-13 00:00:00 | 17956.88 | 108127.12 |
%%sql
SELECT t.*,
round(sum(t.revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
), 2) AS moving_3day_revenue
FROM daily_revenue t
ORDER BY order_date
LIMIT 20
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
20 rows affected.
order_date | revenue | moving_3day_revenue |
---|---|---|
2013-07-25 00:00:00 | 31547.23 | 134671.94 |
2013-07-26 00:00:00 | 54713.23 | 170343.97 |
2013-07-27 00:00:00 | 48411.48 | 224923.67 |
2013-07-28 00:00:00 | 35672.03 | 242705.73 |
2013-07-29 00:00:00 | 54579.70 | 247204.99 |
2013-07-30 00:00:00 | 49329.29 | 247953.59 |
2013-07-31 00:00:00 | 59212.49 | 262970.14 |
2013-08-01 00:00:00 | 49160.08 | 251807.18 |
2013-08-02 00:00:00 | 50688.58 | 237570.90 |
2013-08-03 00:00:00 | 43416.74 | 212383.68 |
2013-08-04 00:00:00 | 35093.01 | 221067.49 |
2013-08-05 00:00:00 | 34025.27 | 215904.50 |
2013-08-06 00:00:00 | 57843.89 | 206037.23 |
2013-08-07 00:00:00 | 45525.59 | 200169.38 |
2013-08-08 00:00:00 | 33549.47 | 212579.15 |
2013-08-09 00:00:00 | 29225.16 | 185890.76 |
2013-08-10 00:00:00 | 46435.04 | 199379.91 |
2013-08-11 00:00:00 | 31155.50 | 183787.32 |
2013-08-12 00:00:00 | 59014.74 | 196605.61 |
2013-08-13 00:00:00 | 17956.88 | 199737.25 |
%%sql
SELECT t.*,
round(avg(t.revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 2) AS moving_3day_revenue
FROM daily_revenue t
ORDER BY order_date
LIMIT 20
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
20 rows affected.
order_date | revenue | moving_3day_revenue |
---|---|---|
2013-07-25 00:00:00 | 31547.23 | 31547.23 |
2013-07-26 00:00:00 | 54713.23 | 43130.23 |
2013-07-27 00:00:00 | 48411.48 | 44890.65 |
2013-07-28 00:00:00 | 35672.03 | 46265.58 |
2013-07-29 00:00:00 | 54579.70 | 46221.07 |
2013-07-30 00:00:00 | 49329.29 | 46527.01 |
2013-07-31 00:00:00 | 59212.49 | 54373.83 |
2013-08-01 00:00:00 | 49160.08 | 52567.29 |
2013-08-02 00:00:00 | 50688.58 | 53020.38 |
2013-08-03 00:00:00 | 43416.74 | 47755.13 |
2013-08-04 00:00:00 | 35093.01 | 43066.11 |
2013-08-05 00:00:00 | 34025.27 | 37511.67 |
2013-08-06 00:00:00 | 57843.89 | 42320.72 |
2013-08-07 00:00:00 | 45525.59 | 45798.25 |
2013-08-08 00:00:00 | 33549.47 | 45639.65 |
2013-08-09 00:00:00 | 29225.16 | 36100.07 |
2013-08-10 00:00:00 | 46435.04 | 36403.22 |
2013-08-11 00:00:00 | 31155.50 | 35605.23 |
2013-08-12 00:00:00 | 59014.74 | 45535.09 |
2013-08-13 00:00:00 | 17956.88 | 36042.37 |