Exercises - Analytics Functions

Let us take care of the exercises related to analytics functions. We will be using HR database for the same.

  • Get all the employees who is making more than average salary with in each department.

  • Get cumulative salary for one of the department along with department name.

  • Get top 3 paid employees with in each department by salary (use dense_rank)

  • Get top 3 products sold in the month of 2014 January by revenue.

  • Get top 3 products in each category sold in the month of 2014 January by revenue.

Prepare HR Database

Here are the steps to prepare HR database.

  • Connect to HR DB using psql or SQL Workbench. Here is the sample psql command.

psql -h localhost \
    -p 5432 \
    -d itversity_hr_db \
    -U itversity_hr_user \
    -W
  • Run scripts to create tables and load the data. You can also drop the tables if they already exists.

\i /data/hr_db/drop_tables_pg.sql
\i /data/hr_db/create_tables_pg.sql
\i /data/hr_db/load_data_pg.sql
  • Validate to ensure that data is available in the tables by running these queries.

%load_ext sql
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
%sql SELECT * FROM employees LIMIT 10
10 rows affected.
employee_id first_name last_name email phone_number hire_date job_id salary commission_pct manager_id department_id
100 Steven King SKING 515.123.4567 1987-06-17 AD_PRES 24000.00 None None 90
101 Neena Kochhar NKOCHHAR 515.123.4568 1989-09-21 AD_VP 17000.00 None 100 90
102 Lex De Haan LDEHAAN 515.123.4569 1993-01-13 AD_VP 17000.00 None 100 90
103 Alexander Hunold AHUNOLD 590.423.4567 1990-01-03 IT_PROG 9000.00 None 102 60
104 Bruce Ernst BERNST 590.423.4568 1991-05-21 IT_PROG 6000.00 None 103 60
105 David Austin DAUSTIN 590.423.4569 1997-06-25 IT_PROG 4800.00 None 103 60
106 Valli Pataballa VPATABAL 590.423.4560 1998-02-05 IT_PROG 4800.00 None 103 60
107 Diana Lorentz DLORENTZ 590.423.5567 1999-02-07 IT_PROG 4200.00 None 103 60
108 Nancy Greenberg NGREENBE 515.124.4569 1994-08-17 FI_MGR 12000.00 None 101 100
109 Daniel Faviet DFAVIET 515.124.4169 1994-08-16 FI_ACCOUNT 9000.00 None 108 100
%%sql 

SELECT * FROM departments 
ORDER BY manager_id NULLS LAST
LIMIT 10
 * postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
(psycopg2.errors.UndefinedTable) relation "departments" does not exist
LINE 1: SELECT * FROM departments 
                      ^

[SQL: SELECT * FROM departments 
ORDER BY manager_id NULLS LAST
LIMIT 10]
(Background on this error at: http://sqlalche.me/e/13/f405)

Prepare Retail Database

Make sure to drop and recreate the tables before taking care of the exercises related to retail database.

  • Ensure that we have required database and user for retail data. We might provide the database as part of our labs. Here are the instructions to use psql for setting up the required database (if required) and tables.

psql -U postgres -h localhost -p 5432 -W
CREATE DATABASE itversity_retail_db;
CREATE USER itversity_retail_user WITH ENCRYPTED PASSWORD 'retail_password';
GRANT ALL ON DATABASE itversity_retail_db TO itversity_retail_user;
  • Create Tables using the script provided. You can either use psql or SQL Workbench.

psql -U itversity_retail_user \
  -h localhost \
  -p 5432 \
  -d itversity_retail_db \
  -W
  • You can drop the existing tables.

DROP TABLE IF EXISTS order_items CASCADE;
DROP TABLE IF EXISTS orders CASCADE;
DROP TABLE IF EXISTS customers CASCADE;
DROP TABLE IF EXISTS products CASCADE;
DROP TABLE IF EXISTS categories CASCADE;
DROP TABLE IF EXISTS departments CASCADE;
  • Once the tables are dropped you can run below script to create the tables for the purpose of exercises.

\i /data/retail_db/create_db_tables_pg.sql
  • Data shall be loaded using the script provided.

\i /data/retail_db/load_db_tables_pg.sql
  • Run queries to validate we have data in all the 3 tables.

%load_ext sql
The sql extension is already loaded. To reload it, use:
  %reload_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
%sql SELECT * FROM orders LIMIT 10
10 rows affected.
order_id order_date order_customer_id order_status
1 2013-07-25 00:00:00 11599 CLOSED
2 2013-07-25 00:00:00 256 PENDING_PAYMENT
3 2013-07-25 00:00:00 12111 COMPLETE
4 2013-07-25 00:00:00 8827 CLOSED
5 2013-07-25 00:00:00 11318 COMPLETE
6 2013-07-25 00:00:00 7130 COMPLETE
7 2013-07-25 00:00:00 4530 COMPLETE
8 2013-07-25 00:00:00 2911 PROCESSING
9 2013-07-25 00:00:00 5657 PENDING_PAYMENT
10 2013-07-25 00:00:00 5648 PENDING_PAYMENT
%sql SELECT count(1) FROM orders
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count
68883
%sql SELECT * FROM order_items LIMIT 10
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
order_item_id order_item_order_id order_item_product_id order_item_quantity order_item_subtotal order_item_product_price
1 1 957 1 299.98 299.98
2 2 1073 1 199.99 199.99
3 2 502 5 250.0 50.0
4 2 403 1 129.99 129.99
5 4 897 2 49.98 24.99
6 4 365 5 299.95 59.99
7 4 502 3 150.0 50.0
8 4 1014 4 199.92 49.98
9 5 957 1 299.98 299.98
10 5 365 5 299.95 59.99
%sql SELECT count(1) FROM order_items
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count
172198
%sql SELECT * FROM products LIMIT 10
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
product_id product_category_id product_name product_description product_price product_image
1 2 Quest Q64 10 FT. x 10 FT. Slant Leg Instant U 59.98 http://images.acmesports.sports/Quest+Q64+10+FT.+x+10+FT.+Slant+Leg+Instant+Up+Canopy
2 2 Under Armour Men's Highlight MC Football Clea 129.99 http://images.acmesports.sports/Under+Armour+Men%27s+Highlight+MC+Football+Cleat
3 2 Under Armour Men's Renegade D Mid Football Cl 89.99 http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat
4 2 Under Armour Men's Renegade D Mid Football Cl 89.99 http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat
5 2 Riddell Youth Revolution Speed Custom Footbal 199.99 http://images.acmesports.sports/Riddell+Youth+Revolution+Speed+Custom+Football+Helmet
6 2 Jordan Men's VI Retro TD Football Cleat 134.99 http://images.acmesports.sports/Jordan+Men%27s+VI+Retro+TD+Football+Cleat
7 2 Schutt Youth Recruit Hybrid Custom Football H 99.99 http://images.acmesports.sports/Schutt+Youth+Recruit+Hybrid+Custom+Football+Helmet+2014
8 2 Nike Men's Vapor Carbon Elite TD Football Cle 129.99 http://images.acmesports.sports/Nike+Men%27s+Vapor+Carbon+Elite+TD+Football+Cleat
9 2 Nike Adult Vapor Jet 3.0 Receiver Gloves 50.0 http://images.acmesports.sports/Nike+Adult+Vapor+Jet+3.0+Receiver+Gloves
10 2 Under Armour Men's Highlight MC Football Clea 129.99 http://images.acmesports.sports/Under+Armour+Men%27s+Highlight+MC+Football+Cleat
%sql SELECT count(1) FROM products
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count
1345

Exercise 1

Get all the employees who is making more than average salary with in each department.

  • Use HR database employees and department tables for this problem.

  • Compute average salary expense for each department and get those employee details who are making more salary than average salary.

  • Make sure average salary expense per department is rounded off to 2 decimals.

  • Output should contain employee_id, department_name, salary and avg_salary_expense (derived field).

  • Data should be sorted in ascending order by department_id and descending order by salary.

employee_id

department_name

salary

avg_salary_expense

201

Marketing

13000.00

9500.00

114

Purchasing

11000.00

4150.00

121

Shipping

8200.00

3475.56

120

Shipping

8000.00

3475.56

122

Shipping

7900.00

3475.56

123

Shipping

6500.00

3475.56

124

Shipping

5800.00

3475.56

184

Shipping

4200.00

3475.56

185

Shipping

4100.00

3475.56

192

Shipping

4000.00

3475.56

193

Shipping

3900.00

3475.56

188

Shipping

3800.00

3475.56

137

Shipping

3600.00

3475.56

189

Shipping

3600.00

3475.56

141

Shipping

3500.00

3475.56

103

IT

9000.00

5760.00

104

IT

6000.00

5760.00

145

Sales

14000.00

8955.88

146

Sales

13500.00

8955.88

147

Sales

12000.00

8955.88

168

Sales

11500.00

8955.88

148

Sales

11000.00

8955.88

174

Sales

11000.00

8955.88

149

Sales

10500.00

8955.88

162

Sales

10500.00

8955.88

156

Sales

10000.00

8955.88

150

Sales

10000.00

8955.88

169

Sales

10000.00

8955.88

170

Sales

9600.00

8955.88

163

Sales

9500.00

8955.88

151

Sales

9500.00

8955.88

157

Sales

9500.00

8955.88

158

Sales

9000.00

8955.88

152

Sales

9000.00

8955.88

100

Executive

24000.00

19333.33

108

Finance

12000.00

8600.00

109

Finance

9000.00

8600.00

205

Accounting

12000.00

10150.00

%load_ext sql
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db

Exercise 2

Get cumulative salary with in each department for Finance and IT department along with department name.

  • Use HR database employees and department tables for this problem.

  • Compute cumulative salary expense for Finance as well as IT departments with in respective departments.

  • Make sure cumulative salary expense per department is rounded off to 2 decimals.

  • Output should contain employee_id, department_name, salary and cum_salary_expense (derived field).

  • Data should be sorted in ascending order by department_name and then salary.

employee_id

department_name

salary

cum_salary_expense

113

Finance

6900.00

6900.00

111

Finance

7700.00

14600.00

112

Finance

7800.00

22400.00

110

Finance

8200.00

30600.00

109

Finance

9000.00

39600.00

108

Finance

12000.00

51600.00

107

IT

4200.00

4200.00

106

IT

4800.00

9000.00

105

IT

4800.00

13800.00

104

IT

6000.00

19800.00

103

IT

9000.00

28800.00

Exercise 3

Get top 3 paid employees with in each department by salary (use dense_rank)

  • Use HR database employees and department tables for this problem.

  • Highest paid employee should be ranked first.

  • Output should contain employee_id, department_id, department_name, salary and employee_rank (derived field).

  • Data should be sorted in ascending order by department_id in ascending order and then salary in descending order.

employee_id

department_id

department_name

salary

employee_rank

200

10

Administration

4400.00

1

201

20

Marketing

13000.00

1

202

20

Marketing

6000.00

2

114

30

Purchasing

11000.00

1

115

30

Purchasing

3100.00

2

116

30

Purchasing

2900.00

3

203

40

Human Resources

6500.00

1

121

50

Shipping

8200.00

1

120

50

Shipping

8000.00

2

122

50

Shipping

7900.00

3

103

60

IT

9000.00

1

104

60

IT

6000.00

2

105

60

IT

4800.00

3

106

60

IT

4800.00

3

204

70

Public Relations

10000.00

1

145

80

Sales

14000.00

1

146

80

Sales

13500.00

2

147

80

Sales

12000.00

3

100

90

Executive

24000.00

1

101

90

Executive

17000.00

2

102

90

Executive

17000.00

2

108

100

Finance

12000.00

1

109

100

Finance

9000.00

2

110

100

Finance

8200.00

3

205

110

Accounting

12000.00

1

206

110

Accounting

8300.00

2

Exercise 4

Get top 3 products sold in the month of 2014 January by revenue.

  • Use retail database tables such as orders, order_items and products.

  • Consider only those orders which are either in COMPLETE or CLOSED status.

  • Highest revenue generating product should come at top.

  • Output should contain product_id, product_name, revenue, product_rank. revenue and product_rank are derived fields.

  • Data should be sorted in descending order by revenue.

product_id

product_name

revenue

product_rank

1004

Field & Stream Sportsman 16 Gun Fire Safe

250787.46

1

365

Perfect Fitness Perfect Rip Deck

151474.75

2

957

Diamondback Women’s Serene Classic Comfort Bi

148190.12

3

Exercise 5

Get top 3 products sold in the month of 2014 January under selected categories by revenue. The categories are Cardio Equipment and Strength Training.

  • Use retail database tables such as orders, order_items, products as well as categories.

  • Consider only those orders which are either in COMPLETE or CLOSED status.

  • Highest revenue generating product should come at top.

  • Output should contain category_id, category_name, product_id, product_name, revenue, product_rank. revenue and product_rank are derived fields.

  • Data should be sorted in ascending order by category_id and descending order by revenue.

category_id

category_name

product_id

product_name

revenue

product_rank

9

Cardio Equipment

191

Nike Men’s Free 5.0+ Running Shoe

132286.77

1

9

Cardio Equipment

172

Nike Women’s Tempo Shorts

870.00

2

10

Strength Training

208

SOLE E35 Elliptical

1999.99

1

10

Strength Training

203

GoPro HERO3+ Black Edition Camera

1199.97

2

10

Strength Training

216

Yakima DoubleDown Ace Hitch Mount 4-Bike Rack

189.00

3