Analytic Functions – Ranking¶
Let us see how we can assign ranks using different rank functions.
If we have to assign ranks globally, we just need to specify ORDER BY
If we have to assign ranks with in a key then we need to specify PARTITION BY and then ORDER BY.
By default ORDER BY will sort the data in ascending order. We can change the order by passing DESC after order by.
We have 3 main functions to assign ranks -
rank
,dense_rank
androw_number
. We will see the differences between the 3 in a moment.
%load_ext sql
The sql extension is already loaded. To reload it, use:
%reload_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
Note
Here is an example to assign sparse ranks using daily_product_revenue with in each day based on revenue.
%%sql
SELECT t.*,
rank() OVER (
PARTITION BY order_date
ORDER BY revenue DESC
) AS rnk
FROM daily_product_revenue t
ORDER BY order_date, revenue DESC
LIMIT 30
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
30 rows affected.
order_date | order_item_product_id | revenue | rnk |
---|---|---|---|
2013-07-25 00:00:00 | 1004 | 5599.72 | 1 |
2013-07-25 00:00:00 | 191 | 5099.49 | 2 |
2013-07-25 00:00:00 | 957 | 4499.70 | 3 |
2013-07-25 00:00:00 | 365 | 3359.44 | 4 |
2013-07-25 00:00:00 | 1073 | 2999.85 | 5 |
2013-07-25 00:00:00 | 1014 | 2798.88 | 6 |
2013-07-25 00:00:00 | 403 | 1949.85 | 7 |
2013-07-25 00:00:00 | 502 | 1650.00 | 8 |
2013-07-25 00:00:00 | 627 | 1079.73 | 9 |
2013-07-25 00:00:00 | 226 | 599.99 | 10 |
2013-07-25 00:00:00 | 24 | 319.96 | 11 |
2013-07-25 00:00:00 | 821 | 207.96 | 12 |
2013-07-25 00:00:00 | 625 | 199.99 | 13 |
2013-07-25 00:00:00 | 705 | 119.99 | 14 |
2013-07-25 00:00:00 | 572 | 119.97 | 15 |
2013-07-25 00:00:00 | 666 | 109.99 | 16 |
2013-07-25 00:00:00 | 725 | 108.00 | 17 |
2013-07-25 00:00:00 | 134 | 100.00 | 18 |
2013-07-25 00:00:00 | 906 | 99.96 | 19 |
2013-07-25 00:00:00 | 828 | 95.97 | 20 |
2013-07-25 00:00:00 | 810 | 79.96 | 21 |
2013-07-25 00:00:00 | 924 | 79.95 | 22 |
2013-07-25 00:00:00 | 926 | 79.95 | 22 |
2013-07-25 00:00:00 | 93 | 74.97 | 24 |
2013-07-25 00:00:00 | 835 | 63.98 | 25 |
2013-07-25 00:00:00 | 897 | 49.98 | 26 |
2013-07-26 00:00:00 | 1004 | 10799.46 | 1 |
2013-07-26 00:00:00 | 365 | 7978.67 | 2 |
2013-07-26 00:00:00 | 957 | 6899.54 | 3 |
2013-07-26 00:00:00 | 191 | 6799.32 | 4 |
Note
Here is another example to assign sparse ranks using employees data set with in each department. Make sure to restart kernel as you might have connected to retail database.
%load_ext sql
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
%%sql
SELECT employee_id, department_id, salary FROM employees
ORDER BY department_id,
salary DESC
LIMIT 10
10 rows affected.
employee_id | department_id | salary |
---|---|---|
200 | 10 | 4400.00 |
201 | 20 | 13000.00 |
202 | 20 | 6000.00 |
114 | 30 | 11000.00 |
115 | 30 | 3100.00 |
116 | 30 | 2900.00 |
117 | 30 | 2800.00 |
118 | 30 | 2600.00 |
119 | 30 | 2500.00 |
203 | 40 | 6500.00 |
%%sql
SELECT employee_id, department_id, salary,
rank() OVER (
PARTITION BY department_id
ORDER BY salary DESC
) AS rnk
FROM employees
LIMIT 20
* postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
20 rows affected.
employee_id | department_id | salary | rnk |
---|---|---|---|
200 | 10 | 4400.00 | 1 |
201 | 20 | 13000.00 | 1 |
202 | 20 | 6000.00 | 2 |
114 | 30 | 11000.00 | 1 |
115 | 30 | 3100.00 | 2 |
116 | 30 | 2900.00 | 3 |
117 | 30 | 2800.00 | 4 |
118 | 30 | 2600.00 | 5 |
119 | 30 | 2500.00 | 6 |
203 | 40 | 6500.00 | 1 |
121 | 50 | 8200.00 | 1 |
120 | 50 | 8000.00 | 2 |
122 | 50 | 7900.00 | 3 |
123 | 50 | 6500.00 | 4 |
124 | 50 | 5800.00 | 5 |
184 | 50 | 4200.00 | 6 |
185 | 50 | 4100.00 | 7 |
192 | 50 | 4000.00 | 8 |
193 | 50 | 3900.00 | 9 |
188 | 50 | 3800.00 | 10 |
Note
Here is an example to assign dense ranks using employees data set with in each department.
%%sql
SELECT employee_id, department_id, salary,
dense_rank() OVER (
PARTITION BY department_id
ORDER BY salary DESC
) AS drnk
FROM employees
LIMIT 20
* postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
20 rows affected.
employee_id | department_id | salary | drnk |
---|---|---|---|
200 | 10 | 4400.00 | 1 |
201 | 20 | 13000.00 | 1 |
202 | 20 | 6000.00 | 2 |
114 | 30 | 11000.00 | 1 |
115 | 30 | 3100.00 | 2 |
116 | 30 | 2900.00 | 3 |
117 | 30 | 2800.00 | 4 |
118 | 30 | 2600.00 | 5 |
119 | 30 | 2500.00 | 6 |
203 | 40 | 6500.00 | 1 |
121 | 50 | 8200.00 | 1 |
120 | 50 | 8000.00 | 2 |
122 | 50 | 7900.00 | 3 |
123 | 50 | 6500.00 | 4 |
124 | 50 | 5800.00 | 5 |
184 | 50 | 4200.00 | 6 |
185 | 50 | 4100.00 | 7 |
192 | 50 | 4000.00 | 8 |
193 | 50 | 3900.00 | 9 |
188 | 50 | 3800.00 | 10 |
Note
Here is an example for global rank based on salary. If all the salaries are unique, we can use LIMIT
but when they are not unique, we have to go with analytic functions.
%%sql
SELECT employee_id, department_id, salary,
rank() OVER (
ORDER BY salary DESC
) AS rnk,
dense_rank() OVER (
ORDER BY salary DESC
) AS drnk
FROM employees
LIMIT 20
* postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
20 rows affected.
employee_id | department_id | salary | rnk | drnk |
---|---|---|---|---|
100 | 90 | 24000.00 | 1 | 1 |
101 | 90 | 17000.00 | 2 | 2 |
102 | 90 | 17000.00 | 2 | 2 |
145 | 80 | 14000.00 | 4 | 3 |
146 | 80 | 13500.00 | 5 | 4 |
201 | 20 | 13000.00 | 6 | 5 |
205 | 110 | 12000.00 | 7 | 6 |
147 | 80 | 12000.00 | 7 | 6 |
108 | 100 | 12000.00 | 7 | 6 |
168 | 80 | 11500.00 | 10 | 7 |
148 | 80 | 11000.00 | 11 | 8 |
174 | 80 | 11000.00 | 11 | 8 |
114 | 30 | 11000.00 | 11 | 8 |
149 | 80 | 10500.00 | 14 | 9 |
162 | 80 | 10500.00 | 14 | 9 |
169 | 80 | 10000.00 | 16 | 10 |
204 | 70 | 10000.00 | 16 | 10 |
150 | 80 | 10000.00 | 16 | 10 |
156 | 80 | 10000.00 | 16 | 10 |
170 | 80 | 9600.00 | 20 | 11 |
Let us understand the difference between rank, dense_rank and row_number.
We can use either of the functions to generate ranks when the rank field does not have duplicates.
When rank field have duplicates then row_number should not be used as it generate unique number for each record with in the partition.
rank will skip the ranks in between if multiple people get the same rank while dense_rank continue with the next number.
%load_ext sql
The sql extension is already loaded. To reload it, use:
%reload_ext sql
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
%%sql
SELECT
employee_id,
department_id,
salary,
rank() OVER (
PARTITION BY department_id
ORDER BY salary DESC
) rnk,
dense_rank() OVER (
PARTITION BY department_id
ORDER BY salary DESC
) drnk,
row_number() OVER (
PARTITION BY department_id
ORDER BY salary DESC, employee_id
) rn
FROM employees
ORDER BY department_id, salary DESC
LIMIT 50
* postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
50 rows affected.
employee_id | department_id | salary | rnk | drnk | rn |
---|---|---|---|---|---|
200 | 10 | 4400.00 | 1 | 1 | 1 |
201 | 20 | 13000.00 | 1 | 1 | 1 |
202 | 20 | 6000.00 | 2 | 2 | 2 |
114 | 30 | 11000.00 | 1 | 1 | 1 |
115 | 30 | 3100.00 | 2 | 2 | 2 |
116 | 30 | 2900.00 | 3 | 3 | 3 |
117 | 30 | 2800.00 | 4 | 4 | 4 |
118 | 30 | 2600.00 | 5 | 5 | 5 |
119 | 30 | 2500.00 | 6 | 6 | 6 |
203 | 40 | 6500.00 | 1 | 1 | 1 |
121 | 50 | 8200.00 | 1 | 1 | 1 |
120 | 50 | 8000.00 | 2 | 2 | 2 |
122 | 50 | 7900.00 | 3 | 3 | 3 |
123 | 50 | 6500.00 | 4 | 4 | 4 |
124 | 50 | 5800.00 | 5 | 5 | 5 |
184 | 50 | 4200.00 | 6 | 6 | 6 |
185 | 50 | 4100.00 | 7 | 7 | 7 |
192 | 50 | 4000.00 | 8 | 8 | 8 |
193 | 50 | 3900.00 | 9 | 9 | 9 |
188 | 50 | 3800.00 | 10 | 10 | 10 |
137 | 50 | 3600.00 | 11 | 11 | 11 |
189 | 50 | 3600.00 | 11 | 11 | 12 |
141 | 50 | 3500.00 | 13 | 12 | 13 |
186 | 50 | 3400.00 | 14 | 13 | 14 |
129 | 50 | 3300.00 | 15 | 14 | 15 |
133 | 50 | 3300.00 | 15 | 14 | 16 |
125 | 50 | 3200.00 | 17 | 15 | 17 |
138 | 50 | 3200.00 | 17 | 15 | 18 |
180 | 50 | 3200.00 | 17 | 15 | 19 |
194 | 50 | 3200.00 | 17 | 15 | 20 |
142 | 50 | 3100.00 | 21 | 16 | 21 |
181 | 50 | 3100.00 | 21 | 16 | 22 |
196 | 50 | 3100.00 | 21 | 16 | 23 |
187 | 50 | 3000.00 | 24 | 17 | 24 |
197 | 50 | 3000.00 | 24 | 17 | 25 |
134 | 50 | 2900.00 | 26 | 18 | 26 |
190 | 50 | 2900.00 | 26 | 18 | 27 |
130 | 50 | 2800.00 | 28 | 19 | 28 |
183 | 50 | 2800.00 | 28 | 19 | 29 |
195 | 50 | 2800.00 | 28 | 19 | 30 |
126 | 50 | 2700.00 | 31 | 20 | 31 |
139 | 50 | 2700.00 | 31 | 20 | 32 |
143 | 50 | 2600.00 | 33 | 21 | 33 |
198 | 50 | 2600.00 | 33 | 21 | 34 |
199 | 50 | 2600.00 | 33 | 21 | 35 |
131 | 50 | 2500.00 | 36 | 22 | 36 |
140 | 50 | 2500.00 | 36 | 22 | 37 |
144 | 50 | 2500.00 | 36 | 22 | 38 |
182 | 50 | 2500.00 | 36 | 22 | 39 |
191 | 50 | 2500.00 | 36 | 22 | 40 |
Note
Here is another example to with respect to all 3 functions. Make sure to restart kernel as you might have connected to HR database.
%load_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
%%sql
SELECT
t.*,
rank() OVER (
PARTITION BY order_date
ORDER BY revenue DESC
) rnk,
dense_rank() OVER (
PARTITION BY order_date
ORDER BY revenue DESC
) drnk,
row_number() OVER (
PARTITION BY order_date
ORDER BY revenue DESC
) rn
FROM daily_product_revenue AS t
ORDER BY order_date, revenue DESC
LIMIT 30
30 rows affected.
order_date | order_item_product_id | revenue | rnk | drnk | rn |
---|---|---|---|---|---|
2013-07-25 00:00:00 | 1004 | 5599.72 | 1 | 1 | 1 |
2013-07-25 00:00:00 | 191 | 5099.49 | 2 | 2 | 2 |
2013-07-25 00:00:00 | 957 | 4499.70 | 3 | 3 | 3 |
2013-07-25 00:00:00 | 365 | 3359.44 | 4 | 4 | 4 |
2013-07-25 00:00:00 | 1073 | 2999.85 | 5 | 5 | 5 |
2013-07-25 00:00:00 | 1014 | 2798.88 | 6 | 6 | 6 |
2013-07-25 00:00:00 | 403 | 1949.85 | 7 | 7 | 7 |
2013-07-25 00:00:00 | 502 | 1650.00 | 8 | 8 | 8 |
2013-07-25 00:00:00 | 627 | 1079.73 | 9 | 9 | 9 |
2013-07-25 00:00:00 | 226 | 599.99 | 10 | 10 | 10 |
2013-07-25 00:00:00 | 24 | 319.96 | 11 | 11 | 11 |
2013-07-25 00:00:00 | 821 | 207.96 | 12 | 12 | 12 |
2013-07-25 00:00:00 | 625 | 199.99 | 13 | 13 | 13 |
2013-07-25 00:00:00 | 705 | 119.99 | 14 | 14 | 14 |
2013-07-25 00:00:00 | 572 | 119.97 | 15 | 15 | 15 |
2013-07-25 00:00:00 | 666 | 109.99 | 16 | 16 | 16 |
2013-07-25 00:00:00 | 725 | 108.00 | 17 | 17 | 17 |
2013-07-25 00:00:00 | 134 | 100.00 | 18 | 18 | 18 |
2013-07-25 00:00:00 | 906 | 99.96 | 19 | 19 | 19 |
2013-07-25 00:00:00 | 828 | 95.97 | 20 | 20 | 20 |
2013-07-25 00:00:00 | 810 | 79.96 | 21 | 21 | 21 |
2013-07-25 00:00:00 | 924 | 79.95 | 22 | 22 | 22 |
2013-07-25 00:00:00 | 926 | 79.95 | 22 | 22 | 23 |
2013-07-25 00:00:00 | 93 | 74.97 | 24 | 23 | 24 |
2013-07-25 00:00:00 | 835 | 63.98 | 25 | 24 | 25 |
2013-07-25 00:00:00 | 897 | 49.98 | 26 | 25 | 26 |
2013-07-26 00:00:00 | 1004 | 10799.46 | 1 | 1 | 1 |
2013-07-26 00:00:00 | 365 | 7978.67 | 2 | 2 | 2 |
2013-07-26 00:00:00 | 957 | 6899.54 | 3 | 3 | 3 |
2013-07-26 00:00:00 | 191 | 6799.32 | 4 | 4 | 4 |