Analytic Functions – Ranking¶

Let us see how we can assign ranks using different rank functions.

If we have to assign ranks globally, we just need to specify ORDER BY
If we have to assign ranks with in a key then we need to specify PARTITION BY and then ORDER BY.
By default ORDER BY will sort the data in ascending order. We can change the order by passing DESC after order by.
We have 3 main functions to assign ranks - rank, dense_rank and row_number. We will see the differences between the 3 in a moment.

%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql

%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Note

Here is an example to assign sparse ranks using daily_product_revenue with in each day based on revenue.

%%sql

SELECT t.*,
    rank() OVER (
        PARTITION BY order_date
        ORDER BY revenue DESC
    ) AS rnk
FROM daily_product_revenue t
ORDER BY order_date, revenue DESC
LIMIT 30

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
30 rows affected.

order_date	order_item_product_id	revenue	rnk
2013-07-25 00:00:00	1004	5599.72	1
2013-07-25 00:00:00	191	5099.49	2
2013-07-25 00:00:00	957	4499.70	3
2013-07-25 00:00:00	365	3359.44	4
2013-07-25 00:00:00	1073	2999.85	5
2013-07-25 00:00:00	1014	2798.88	6
2013-07-25 00:00:00	403	1949.85	7
2013-07-25 00:00:00	502	1650.00	8
2013-07-25 00:00:00	627	1079.73	9
2013-07-25 00:00:00	226	599.99	10
2013-07-25 00:00:00	24	319.96	11
2013-07-25 00:00:00	821	207.96	12
2013-07-25 00:00:00	625	199.99	13
2013-07-25 00:00:00	705	119.99	14
2013-07-25 00:00:00	572	119.97	15
2013-07-25 00:00:00	666	109.99	16
2013-07-25 00:00:00	725	108.00	17
2013-07-25 00:00:00	134	100.00	18
2013-07-25 00:00:00	906	99.96	19
2013-07-25 00:00:00	828	95.97	20
2013-07-25 00:00:00	810	79.96	21
2013-07-25 00:00:00	924	79.95	22
2013-07-25 00:00:00	926	79.95	22
2013-07-25 00:00:00	93	74.97	24
2013-07-25 00:00:00	835	63.98	25
2013-07-25 00:00:00	897	49.98	26
2013-07-26 00:00:00	1004	10799.46	1
2013-07-26 00:00:00	365	7978.67	2
2013-07-26 00:00:00	957	6899.54	3
2013-07-26 00:00:00	191	6799.32	4

Note

Here is another example to assign sparse ranks using employees data set with in each department. Make sure to restart kernel as you might have connected to retail database.

%load_ext sql

%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db

env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db

%%sql

SELECT employee_id, department_id, salary FROM employees 
ORDER BY department_id,
    salary DESC
LIMIT 10

10 rows affected.

employee_id	department_id	salary
200	10	4400.00
201	20	13000.00
202	20	6000.00
114	30	11000.00
115	30	3100.00
116	30	2900.00
117	30	2800.00
118	30	2600.00
119	30	2500.00
203	40	6500.00

%%sql

SELECT employee_id, department_id, salary,
    rank() OVER (
        PARTITION BY department_id 
        ORDER BY salary DESC
    ) AS rnk
FROM employees
LIMIT 20

 * postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
20 rows affected.

employee_id	department_id	salary	rnk
200	10	4400.00	1
201	20	13000.00	1
202	20	6000.00	2
114	30	11000.00	1
115	30	3100.00	2
116	30	2900.00	3
117	30	2800.00	4
118	30	2600.00	5
119	30	2500.00	6
203	40	6500.00	1
121	50	8200.00	1
120	50	8000.00	2
122	50	7900.00	3
123	50	6500.00	4
124	50	5800.00	5
184	50	4200.00	6
185	50	4100.00	7
192	50	4000.00	8
193	50	3900.00	9
188	50	3800.00	10

Note

Here is an example to assign dense ranks using employees data set with in each department.

%%sql

SELECT employee_id, department_id, salary,
    dense_rank() OVER (
        PARTITION BY department_id 
        ORDER BY salary DESC
    ) AS drnk
FROM employees
LIMIT 20

 * postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
20 rows affected.

employee_id	department_id	salary	drnk
200	10	4400.00	1
201	20	13000.00	1
202	20	6000.00	2
114	30	11000.00	1
115	30	3100.00	2
116	30	2900.00	3
117	30	2800.00	4
118	30	2600.00	5
119	30	2500.00	6
203	40	6500.00	1
121	50	8200.00	1
120	50	8000.00	2
122	50	7900.00	3
123	50	6500.00	4
124	50	5800.00	5
184	50	4200.00	6
185	50	4100.00	7
192	50	4000.00	8
193	50	3900.00	9
188	50	3800.00	10

Note

Here is an example for global rank based on salary. If all the salaries are unique, we can use LIMIT but when they are not unique, we have to go with analytic functions.

%%sql

SELECT employee_id, department_id, salary,
    rank() OVER (
        ORDER BY salary DESC
    ) AS rnk,
    dense_rank() OVER (
        ORDER BY salary DESC
    ) AS drnk
FROM employees
LIMIT 20

 * postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
20 rows affected.

employee_id	department_id	salary	rnk	drnk
100	90	24000.00	1	1
101	90	17000.00	2	2
102	90	17000.00	2	2
145	80	14000.00	4	3
146	80	13500.00	5	4
201	20	13000.00	6	5
205	110	12000.00	7	6
147	80	12000.00	7	6
108	100	12000.00	7	6
168	80	11500.00	10	7
148	80	11000.00	11	8
174	80	11000.00	11	8
114	30	11000.00	11	8
149	80	10500.00	14	9
162	80	10500.00	14	9
169	80	10000.00	16	10
204	70	10000.00	16	10
150	80	10000.00	16	10
156	80	10000.00	16	10
170	80	9600.00	20	11

Let us understand the difference between rank, dense_rank and row_number.

We can use either of the functions to generate ranks when the rank field does not have duplicates.
When rank field have duplicates then row_number should not be used as it generate unique number for each record with in the partition.
rank will skip the ranks in between if multiple people get the same rank while dense_rank continue with the next number.

%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql

%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db

env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db

%%sql

SELECT
    employee_id,
    department_id,
    salary,
    rank() OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
      ) rnk,
    dense_rank() OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
      ) drnk,
    row_number() OVER (
        PARTITION BY department_id
        ORDER BY salary DESC, employee_id
      ) rn
FROM employees
ORDER BY department_id, salary DESC
LIMIT 50

 * postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
50 rows affected.

employee_id	department_id	salary	rnk	drnk	rn
200	10	4400.00	1	1	1
201	20	13000.00	1	1	1
202	20	6000.00	2	2	2
114	30	11000.00	1	1	1
115	30	3100.00	2	2	2
116	30	2900.00	3	3	3
117	30	2800.00	4	4	4
118	30	2600.00	5	5	5
119	30	2500.00	6	6	6
203	40	6500.00	1	1	1
121	50	8200.00	1	1	1
120	50	8000.00	2	2	2
122	50	7900.00	3	3	3
123	50	6500.00	4	4	4
124	50	5800.00	5	5	5
184	50	4200.00	6	6	6
185	50	4100.00	7	7	7
192	50	4000.00	8	8	8
193	50	3900.00	9	9	9
188	50	3800.00	10	10	10
137	50	3600.00	11	11	11
189	50	3600.00	11	11	12
141	50	3500.00	13	12	13
186	50	3400.00	14	13	14
129	50	3300.00	15	14	15
133	50	3300.00	15	14	16
125	50	3200.00	17	15	17
138	50	3200.00	17	15	18
180	50	3200.00	17	15	19
194	50	3200.00	17	15	20
142	50	3100.00	21	16	21
181	50	3100.00	21	16	22
196	50	3100.00	21	16	23
187	50	3000.00	24	17	24
197	50	3000.00	24	17	25
134	50	2900.00	26	18	26
190	50	2900.00	26	18	27
130	50	2800.00	28	19	28
183	50	2800.00	28	19	29
195	50	2800.00	28	19	30
126	50	2700.00	31	20	31
139	50	2700.00	31	20	32
143	50	2600.00	33	21	33
198	50	2600.00	33	21	34
199	50	2600.00	33	21	35
131	50	2500.00	36	22	36
140	50	2500.00	36	22	37
144	50	2500.00	36	22	38
182	50	2500.00	36	22	39
191	50	2500.00	36	22	40

Note

Here is another example to with respect to all 3 functions. Make sure to restart kernel as you might have connected to HR database.

%load_ext sql

%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

%%sql

SELECT
    t.*,
    rank() OVER (
        PARTITION BY order_date
        ORDER BY revenue DESC
    ) rnk,
    dense_rank() OVER (
        PARTITION BY order_date
        ORDER BY revenue DESC
    ) drnk,
    row_number() OVER (
        PARTITION BY order_date
        ORDER BY revenue DESC
    ) rn
FROM daily_product_revenue AS t
ORDER BY order_date, revenue DESC
LIMIT 30

30 rows affected.

order_date	order_item_product_id	revenue	rnk	drnk	rn
2013-07-25 00:00:00	1004	5599.72	1	1	1
2013-07-25 00:00:00	191	5099.49	2	2	2
2013-07-25 00:00:00	957	4499.70	3	3	3
2013-07-25 00:00:00	365	3359.44	4	4	4
2013-07-25 00:00:00	1073	2999.85	5	5	5
2013-07-25 00:00:00	1014	2798.88	6	6	6
2013-07-25 00:00:00	403	1949.85	7	7	7
2013-07-25 00:00:00	502	1650.00	8	8	8
2013-07-25 00:00:00	627	1079.73	9	9	9
2013-07-25 00:00:00	226	599.99	10	10	10
2013-07-25 00:00:00	24	319.96	11	11	11
2013-07-25 00:00:00	821	207.96	12	12	12
2013-07-25 00:00:00	625	199.99	13	13	13
2013-07-25 00:00:00	705	119.99	14	14	14
2013-07-25 00:00:00	572	119.97	15	15	15
2013-07-25 00:00:00	666	109.99	16	16	16
2013-07-25 00:00:00	725	108.00	17	17	17
2013-07-25 00:00:00	134	100.00	18	18	18
2013-07-25 00:00:00	906	99.96	19	19	19
2013-07-25 00:00:00	828	95.97	20	20	20
2013-07-25 00:00:00	810	79.96	21	21	21
2013-07-25 00:00:00	924	79.95	22	22	22
2013-07-25 00:00:00	926	79.95	22	22	23
2013-07-25 00:00:00	93	74.97	24	23	24
2013-07-25 00:00:00	835	63.98	25	24	25
2013-07-25 00:00:00	897	49.98	26	25	26
2013-07-26 00:00:00	1004	10799.46	1	1	1
2013-07-26 00:00:00	365	7978.67	2	2	2
2013-07-26 00:00:00	957	6899.54	3	3	3
2013-07-26 00:00:00	191	6799.32	4	4	4

Mastering SQL using Postgresql

Analytic Functions – Ranking¶