Exercises - Analytics Functions¶
Let us take care of the exercises related to analytics functions. We will be using HR database for the same.
Get all the employees who is making more than average salary with in each department.
Get cumulative salary for one of the department along with department name.
Get top 3 paid employees with in each department by salary (use dense_rank)
Get top 3 products sold in the month of 2014 January by revenue.
Get top 3 products in each category sold in the month of 2014 January by revenue.
Prepare HR Database¶
Here are the steps to prepare HR database.
Connect to HR DB using
psql
or SQL Workbench. Here is the samplepsql
command.
psql -h localhost \
-p 5432 \
-d itversity_hr_db \
-U itversity_hr_user \
-W
Run scripts to create tables and load the data. You can also drop the tables if they already exists.
\i /data/hr_db/drop_tables_pg.sql
\i /data/hr_db/create_tables_pg.sql
\i /data/hr_db/load_data_pg.sql
Validate to ensure that data is available in the tables by running these queries.
%load_ext sql
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
%sql SELECT * FROM employees LIMIT 10
10 rows affected.
employee_id | first_name | last_name | phone_number | hire_date | job_id | salary | commission_pct | manager_id | department_id | |
---|---|---|---|---|---|---|---|---|---|---|
100 | Steven | King | SKING | 515.123.4567 | 1987-06-17 | AD_PRES | 24000.00 | None | None | 90 |
101 | Neena | Kochhar | NKOCHHAR | 515.123.4568 | 1989-09-21 | AD_VP | 17000.00 | None | 100 | 90 |
102 | Lex | De Haan | LDEHAAN | 515.123.4569 | 1993-01-13 | AD_VP | 17000.00 | None | 100 | 90 |
103 | Alexander | Hunold | AHUNOLD | 590.423.4567 | 1990-01-03 | IT_PROG | 9000.00 | None | 102 | 60 |
104 | Bruce | Ernst | BERNST | 590.423.4568 | 1991-05-21 | IT_PROG | 6000.00 | None | 103 | 60 |
105 | David | Austin | DAUSTIN | 590.423.4569 | 1997-06-25 | IT_PROG | 4800.00 | None | 103 | 60 |
106 | Valli | Pataballa | VPATABAL | 590.423.4560 | 1998-02-05 | IT_PROG | 4800.00 | None | 103 | 60 |
107 | Diana | Lorentz | DLORENTZ | 590.423.5567 | 1999-02-07 | IT_PROG | 4200.00 | None | 103 | 60 |
108 | Nancy | Greenberg | NGREENBE | 515.124.4569 | 1994-08-17 | FI_MGR | 12000.00 | None | 101 | 100 |
109 | Daniel | Faviet | DFAVIET | 515.124.4169 | 1994-08-16 | FI_ACCOUNT | 9000.00 | None | 108 | 100 |
%%sql
SELECT * FROM departments
ORDER BY manager_id NULLS LAST
LIMIT 10
* postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
(psycopg2.errors.UndefinedTable) relation "departments" does not exist
LINE 1: SELECT * FROM departments
^
[SQL: SELECT * FROM departments
ORDER BY manager_id NULLS LAST
LIMIT 10]
(Background on this error at: http://sqlalche.me/e/13/f405)
Prepare Retail Database¶
Make sure to drop and recreate the tables before taking care of the exercises related to retail database.
Ensure that we have required database and user for retail data. We might provide the database as part of our labs. Here are the instructions to use
psql
for setting up the required database (if required) and tables.
psql -U postgres -h localhost -p 5432 -W
CREATE DATABASE itversity_retail_db;
CREATE USER itversity_retail_user WITH ENCRYPTED PASSWORD 'retail_password';
GRANT ALL ON DATABASE itversity_retail_db TO itversity_retail_user;
Create Tables using the script provided. You can either use
psql
or SQL Workbench.
psql -U itversity_retail_user \
-h localhost \
-p 5432 \
-d itversity_retail_db \
-W
You can drop the existing tables.
DROP TABLE IF EXISTS order_items CASCADE;
DROP TABLE IF EXISTS orders CASCADE;
DROP TABLE IF EXISTS customers CASCADE;
DROP TABLE IF EXISTS products CASCADE;
DROP TABLE IF EXISTS categories CASCADE;
DROP TABLE IF EXISTS departments CASCADE;
Once the tables are dropped you can run below script to create the tables for the purpose of exercises.
\i /data/retail_db/create_db_tables_pg.sql
Data shall be loaded using the script provided.
\i /data/retail_db/load_db_tables_pg.sql
Run queries to validate we have data in all the 3 tables.
%load_ext sql
The sql extension is already loaded. To reload it, use:
%reload_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
%sql SELECT * FROM orders LIMIT 10
10 rows affected.
order_id | order_date | order_customer_id | order_status |
---|---|---|---|
1 | 2013-07-25 00:00:00 | 11599 | CLOSED |
2 | 2013-07-25 00:00:00 | 256 | PENDING_PAYMENT |
3 | 2013-07-25 00:00:00 | 12111 | COMPLETE |
4 | 2013-07-25 00:00:00 | 8827 | CLOSED |
5 | 2013-07-25 00:00:00 | 11318 | COMPLETE |
6 | 2013-07-25 00:00:00 | 7130 | COMPLETE |
7 | 2013-07-25 00:00:00 | 4530 | COMPLETE |
8 | 2013-07-25 00:00:00 | 2911 | PROCESSING |
9 | 2013-07-25 00:00:00 | 5657 | PENDING_PAYMENT |
10 | 2013-07-25 00:00:00 | 5648 | PENDING_PAYMENT |
%sql SELECT count(1) FROM orders
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count |
---|
68883 |
%sql SELECT * FROM order_items LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
order_item_id | order_item_order_id | order_item_product_id | order_item_quantity | order_item_subtotal | order_item_product_price |
---|---|---|---|---|---|
1 | 1 | 957 | 1 | 299.98 | 299.98 |
2 | 2 | 1073 | 1 | 199.99 | 199.99 |
3 | 2 | 502 | 5 | 250.0 | 50.0 |
4 | 2 | 403 | 1 | 129.99 | 129.99 |
5 | 4 | 897 | 2 | 49.98 | 24.99 |
6 | 4 | 365 | 5 | 299.95 | 59.99 |
7 | 4 | 502 | 3 | 150.0 | 50.0 |
8 | 4 | 1014 | 4 | 199.92 | 49.98 |
9 | 5 | 957 | 1 | 299.98 | 299.98 |
10 | 5 | 365 | 5 | 299.95 | 59.99 |
%sql SELECT count(1) FROM order_items
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count |
---|
172198 |
%sql SELECT * FROM products LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
product_id | product_category_id | product_name | product_description | product_price | product_image |
---|---|---|---|---|---|
1 | 2 | Quest Q64 10 FT. x 10 FT. Slant Leg Instant U | 59.98 | http://images.acmesports.sports/Quest+Q64+10+FT.+x+10+FT.+Slant+Leg+Instant+Up+Canopy | |
2 | 2 | Under Armour Men's Highlight MC Football Clea | 129.99 | http://images.acmesports.sports/Under+Armour+Men%27s+Highlight+MC+Football+Cleat | |
3 | 2 | Under Armour Men's Renegade D Mid Football Cl | 89.99 | http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat | |
4 | 2 | Under Armour Men's Renegade D Mid Football Cl | 89.99 | http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat | |
5 | 2 | Riddell Youth Revolution Speed Custom Footbal | 199.99 | http://images.acmesports.sports/Riddell+Youth+Revolution+Speed+Custom+Football+Helmet | |
6 | 2 | Jordan Men's VI Retro TD Football Cleat | 134.99 | http://images.acmesports.sports/Jordan+Men%27s+VI+Retro+TD+Football+Cleat | |
7 | 2 | Schutt Youth Recruit Hybrid Custom Football H | 99.99 | http://images.acmesports.sports/Schutt+Youth+Recruit+Hybrid+Custom+Football+Helmet+2014 | |
8 | 2 | Nike Men's Vapor Carbon Elite TD Football Cle | 129.99 | http://images.acmesports.sports/Nike+Men%27s+Vapor+Carbon+Elite+TD+Football+Cleat | |
9 | 2 | Nike Adult Vapor Jet 3.0 Receiver Gloves | 50.0 | http://images.acmesports.sports/Nike+Adult+Vapor+Jet+3.0+Receiver+Gloves | |
10 | 2 | Under Armour Men's Highlight MC Football Clea | 129.99 | http://images.acmesports.sports/Under+Armour+Men%27s+Highlight+MC+Football+Cleat |
%sql SELECT count(1) FROM products
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count |
---|
1345 |
Exercise 1¶
Get all the employees who is making more than average salary with in each department.
Use HR database employees and department tables for this problem.
Compute average salary expense for each department and get those employee details who are making more salary than average salary.
Make sure average salary expense per department is rounded off to 2 decimals.
Output should contain employee_id, department_name, salary and avg_salary_expense (derived field).
Data should be sorted in ascending order by department_id and descending order by salary.
employee_id |
department_name |
salary |
avg_salary_expense |
---|---|---|---|
201 |
Marketing |
13000.00 |
9500.00 |
114 |
Purchasing |
11000.00 |
4150.00 |
121 |
Shipping |
8200.00 |
3475.56 |
120 |
Shipping |
8000.00 |
3475.56 |
122 |
Shipping |
7900.00 |
3475.56 |
123 |
Shipping |
6500.00 |
3475.56 |
124 |
Shipping |
5800.00 |
3475.56 |
184 |
Shipping |
4200.00 |
3475.56 |
185 |
Shipping |
4100.00 |
3475.56 |
192 |
Shipping |
4000.00 |
3475.56 |
193 |
Shipping |
3900.00 |
3475.56 |
188 |
Shipping |
3800.00 |
3475.56 |
137 |
Shipping |
3600.00 |
3475.56 |
189 |
Shipping |
3600.00 |
3475.56 |
141 |
Shipping |
3500.00 |
3475.56 |
103 |
IT |
9000.00 |
5760.00 |
104 |
IT |
6000.00 |
5760.00 |
145 |
Sales |
14000.00 |
8955.88 |
146 |
Sales |
13500.00 |
8955.88 |
147 |
Sales |
12000.00 |
8955.88 |
168 |
Sales |
11500.00 |
8955.88 |
148 |
Sales |
11000.00 |
8955.88 |
174 |
Sales |
11000.00 |
8955.88 |
149 |
Sales |
10500.00 |
8955.88 |
162 |
Sales |
10500.00 |
8955.88 |
156 |
Sales |
10000.00 |
8955.88 |
150 |
Sales |
10000.00 |
8955.88 |
169 |
Sales |
10000.00 |
8955.88 |
170 |
Sales |
9600.00 |
8955.88 |
163 |
Sales |
9500.00 |
8955.88 |
151 |
Sales |
9500.00 |
8955.88 |
157 |
Sales |
9500.00 |
8955.88 |
158 |
Sales |
9000.00 |
8955.88 |
152 |
Sales |
9000.00 |
8955.88 |
100 |
Executive |
24000.00 |
19333.33 |
108 |
Finance |
12000.00 |
8600.00 |
109 |
Finance |
9000.00 |
8600.00 |
205 |
Accounting |
12000.00 |
10150.00 |
%load_ext sql
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
Exercise 2¶
Get cumulative salary with in each department for Finance and IT department along with department name.
Use HR database employees and department tables for this problem.
Compute cumulative salary expense for Finance as well as IT departments with in respective departments.
Make sure cumulative salary expense per department is rounded off to 2 decimals.
Output should contain employee_id, department_name, salary and cum_salary_expense (derived field).
Data should be sorted in ascending order by department_name and then salary.
employee_id |
department_name |
salary |
cum_salary_expense |
---|---|---|---|
113 |
Finance |
6900.00 |
6900.00 |
111 |
Finance |
7700.00 |
14600.00 |
112 |
Finance |
7800.00 |
22400.00 |
110 |
Finance |
8200.00 |
30600.00 |
109 |
Finance |
9000.00 |
39600.00 |
108 |
Finance |
12000.00 |
51600.00 |
107 |
IT |
4200.00 |
4200.00 |
106 |
IT |
4800.00 |
9000.00 |
105 |
IT |
4800.00 |
13800.00 |
104 |
IT |
6000.00 |
19800.00 |
103 |
IT |
9000.00 |
28800.00 |
Exercise 3¶
Get top 3 paid employees with in each department by salary (use dense_rank)
Use HR database employees and department tables for this problem.
Highest paid employee should be ranked first.
Output should contain employee_id, department_id, department_name, salary and employee_rank (derived field).
Data should be sorted in ascending order by department_id in ascending order and then salary in descending order.
employee_id |
department_id |
department_name |
salary |
employee_rank |
---|---|---|---|---|
200 |
10 |
Administration |
4400.00 |
1 |
201 |
20 |
Marketing |
13000.00 |
1 |
202 |
20 |
Marketing |
6000.00 |
2 |
114 |
30 |
Purchasing |
11000.00 |
1 |
115 |
30 |
Purchasing |
3100.00 |
2 |
116 |
30 |
Purchasing |
2900.00 |
3 |
203 |
40 |
Human Resources |
6500.00 |
1 |
121 |
50 |
Shipping |
8200.00 |
1 |
120 |
50 |
Shipping |
8000.00 |
2 |
122 |
50 |
Shipping |
7900.00 |
3 |
103 |
60 |
IT |
9000.00 |
1 |
104 |
60 |
IT |
6000.00 |
2 |
105 |
60 |
IT |
4800.00 |
3 |
106 |
60 |
IT |
4800.00 |
3 |
204 |
70 |
Public Relations |
10000.00 |
1 |
145 |
80 |
Sales |
14000.00 |
1 |
146 |
80 |
Sales |
13500.00 |
2 |
147 |
80 |
Sales |
12000.00 |
3 |
100 |
90 |
Executive |
24000.00 |
1 |
101 |
90 |
Executive |
17000.00 |
2 |
102 |
90 |
Executive |
17000.00 |
2 |
108 |
100 |
Finance |
12000.00 |
1 |
109 |
100 |
Finance |
9000.00 |
2 |
110 |
100 |
Finance |
8200.00 |
3 |
205 |
110 |
Accounting |
12000.00 |
1 |
206 |
110 |
Accounting |
8300.00 |
2 |
Exercise 4¶
Get top 3 products sold in the month of 2014 January by revenue.
Use retail database tables such as orders, order_items and products.
Consider only those orders which are either in COMPLETE or CLOSED status.
Highest revenue generating product should come at top.
Output should contain product_id, product_name, revenue, product_rank. revenue and product_rank are derived fields.
Data should be sorted in descending order by revenue.
product_id |
product_name |
revenue |
product_rank |
---|---|---|---|
1004 |
Field & Stream Sportsman 16 Gun Fire Safe |
250787.46 |
1 |
365 |
Perfect Fitness Perfect Rip Deck |
151474.75 |
2 |
957 |
Diamondback Women’s Serene Classic Comfort Bi |
148190.12 |
3 |
Exercise 5¶
Get top 3 products sold in the month of 2014 January under selected categories by revenue. The categories are Cardio Equipment and Strength Training.
Use retail database tables such as orders, order_items, products as well as categories.
Consider only those orders which are either in COMPLETE or CLOSED status.
Highest revenue generating product should come at top.
Output should contain category_id, category_name, product_id, product_name, revenue, product_rank. revenue and product_rank are derived fields.
Data should be sorted in ascending order by category_id and descending order by revenue.
category_id |
category_name |
product_id |
product_name |
revenue |
product_rank |
---|---|---|---|---|---|
9 |
Cardio Equipment |
191 |
Nike Men’s Free 5.0+ Running Shoe |
132286.77 |
1 |
9 |
Cardio Equipment |
172 |
Nike Women’s Tempo Shorts |
870.00 |
2 |
10 |
Strength Training |
208 |
SOLE E35 Elliptical |
1999.99 |
1 |
10 |
Strength Training |
203 |
GoPro HERO3+ Black Edition Camera |
1199.97 |
2 |
10 |
Strength Training |
216 |
Yakima DoubleDown Ace Hitch Mount 4-Bike Rack |
189.00 |
3 |