Exercises - Mastering PostgreSQL

Here is the consolidated list of exercises covering all the key PostgreSQL topics to help you gain mastery over SQL.

  • Loading data using native utilities.

  • Performing CRUD or DML operations.

  • Writing basic SQL queries.

  • Creating tables, indexes and constraints.

  • Creating and loading data into partitioned tables.

  • Writing advanced SQL queries, primarily using analytic functions.

Getting Started

Here is the exercise related to the Getting Started module.

Exercise - Loading Data

As part of this exercise, you need to load data using the COPY command.

  • You can connect to the database using the following details in the environment provided by us.

    • Host: localhost

    • Port: 5432

    • Database Name: YOUR_OS_USER_hr_db

    • User Name: YOUR_OS_USER_hr_user

    • Password: YOUR_OS_USER_PASSWORD (provided by us).

  • If you are using your own environment, make sure to create a database for storing the HR data.

    • Database Name: hr_db

    • User Name: hr_user

    • You can create the user with a password of your choice.

CREATE DATABASE hr_db;
CREATE USER hr_user WITH PASSWORD 'hr_password';
GRANT ALL ON DATABASE hr_db TO hr_user;
GRANT pg_read_server_files TO hr_user;
  • Create the table using this script.

CREATE TABLE employees
   ( employee_id INTEGER
   , first_name VARCHAR(20)
   , last_name VARCHAR(25)
   , email VARCHAR(25)
   , phone_number VARCHAR(20)
   , hire_date DATE
   , job_id VARCHAR(10)
   , salary NUMERIC(8,2)
   , commission_pct NUMERIC(2,2)
   , manager_id INTEGER
   , department_id INTEGER
   ) ;
CREATE UNIQUE INDEX emp_emp_id_pk
         ON employees (employee_id) ;
ALTER TABLE employees ADD
   PRIMARY KEY (employee_id);
  • Understand the data.

    • Check for delimiters (record as well as field).

    • Check whether a header exists or not.

    • Ensure the number of fields in the table and in the data being loaded are the same.

  • Load data into the table using the COPY command.

  • Validate by running these queries. You can also use SQL Workbench to run the queries to validate whether data is loaded successfully or not.

SELECT * FROM employees LIMIT 10;
SELECT count(1) FROM employees;
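
The load step above can be sketched with the COPY command. This is a minimal sketch, assuming a comma-delimited file without a header at a hypothetical path; adjust the options after you inspect the data.

```sql
-- Hypothetical path and options; verify the delimiter and header first.
COPY employees
FROM '/data/hr_db/employees.csv'
WITH (FORMAT csv, DELIMITER ',', HEADER false);
```

If the server cannot read the file directly, psql's \copy variant reads it from the client side instead.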

DML or CRUD Operations

Let’s create a table and perform database operations using direct SQL.

Exercise 1 - Create Table

Create table - courses

  • course_id - sequence generated integer and primary key

  • course_name - which holds alphanumeric or string values up to 60 characters

  • course_author - which holds the name of the author up to 40 characters

  • course_status - which holds one of these values (published, draft, inactive)

  • course_published_dt - which holds a date value

Provide the script as answer for this exercise.

Exercise 2 - Inserting Data

  • Insert data into courses using the data provided. Make sure id is system generated.

Course Name                      | Course Author | Course Status | Course Published Date
Programming using Python         | Bob Dillon    | published     | 2020-09-30
Data Engineering using Python    | Bob Dillon    | published     | 2020-07-15
Data Engineering using Scala     | Elvis Presley | draft         |
Programming using Scala          | Elvis Presley | published     | 2020-05-12
Programming using Java           | Mike Jack     | inactive      | 2020-08-10
Web Applications - Python Flask  | Bob Dillon    | inactive      | 2020-07-20
Web Applications - Java Spring   | Mike Jack     | draft         |
Pipeline Orchestration - Python  | Bob Dillon    | draft         |
Streaming Pipelines - Python     | Bob Dillon    | published     | 2020-10-05
Web Applications - Scala Play    | Elvis Presley | inactive      | 2020-09-30
Web Applications - Python Django | Bob Dillon    | published     | 2020-06-23
Server Automation - Ansible      | Uncle Sam     | published     | 2020-07-05

Provide the insert statement(s) as answer for this exercise.

Exercise 3 - Updating Data

Update the status of all the draft courses related to Python and Scala to published, and set course_published_dt to the system date.

Provide the update statement as answer for this exercise.

Exercise 4 - Deleting Data

Delete all the courses which are neither in draft mode nor published.

Provide the delete statement as answer for this exercise.

Validation - Get the count of all published courses by author and make sure the output is sorted in descending order by count.

SELECT course_author, count(1) AS course_count
FROM courses
WHERE course_status = 'published'
GROUP BY course_author
ORDER BY course_count DESC;

Course Author | Course Count
Bob Dillon    | 5
Elvis Presley | 2
Uncle Sam     | 1

Basic SQL Queries

Here are some of the exercises for which you can write SQL queries to self-evaluate.

  • Ensure that we have the required database and user for the retail data. We might provide the database as part of our labs. Here are the instructions to use psql for setting up the required database (if required) and tables.

psql -U postgres -h localhost -p 5432 -W
CREATE DATABASE itversity_retail_db;
CREATE USER itversity_retail_user WITH ENCRYPTED PASSWORD 'retail_password';
GRANT ALL ON DATABASE itversity_retail_db TO itversity_retail_user;
  • Create Tables using the script provided. You can either use psql or SQL Workbench.

psql -U itversity_retail_user \
  -h localhost \
  -p 5432 \
  -d itversity_retail_db \
  -W
  • You can drop the existing tables.

DROP TABLE IF EXISTS order_items CASCADE;
DROP TABLE IF EXISTS orders CASCADE;
DROP TABLE IF EXISTS customers CASCADE;
DROP TABLE IF EXISTS products CASCADE;
DROP TABLE IF EXISTS categories CASCADE;
DROP TABLE IF EXISTS departments CASCADE;
  • Once the tables are dropped you can run the script below to create the tables for the purpose of the exercises.

\i /data/retail_db/create_db_tables_pg.sql
  • Load the data using the script provided.

\i /data/retail_db/load_db_tables_pg.sql
  • Run queries to validate that we have data in all the tables.

Exercise 1 - Customer order count

Get order count per customer for the month of 2014 January.

  • Tables - orders and customers

  • Data should be sorted in descending order by count and ascending order by customer id.

  • Output should contain customer_id, customer_first_name, customer_last_name and customer_order_count.

Exercise 2 - Dormant Customers

Get the details of the customers who have not placed any order for the month of 2014 January.

  • Tables - orders and customers

  • Data should be sorted in ascending order by customer_id

  • Output should contain all the fields from customers

Exercise 3 - Revenue Per Customer

Get the revenue generated by each customer for the month of 2014 January

  • Tables - orders, order_items and customers

  • Data should be sorted in descending order by revenue and then ascending order by customer_id

  • Output should contain customer_id, customer_first_name, customer_last_name, customer_revenue.

  • If there are no orders placed by a customer, then the corresponding revenue for that customer should be 0.

  • Consider only COMPLETE and CLOSED orders

Exercise 4 - Revenue Per Category

Get the revenue generated for each category for the month of 2014 January

  • Tables - orders, order_items, products and categories

  • Data should be sorted in ascending order by category_id.

  • Output should contain all the fields from categories along with the revenue as category_revenue.

  • Consider only COMPLETE and CLOSED orders

Exercise 5 - Product Count Per Department

Get the product count for each department.

  • Tables - departments, categories, products

  • Data should be sorted in ascending order by department_id

  • Output should contain all the fields from departments and the product count as product_count

Managing Database Objects

This exercise is primarily meant to assess your ability to put all the important DDL concepts into practice by coming up with a solution for a typical data migration problem from one database (MySQL) to another (Postgres).

  • Here are the high level steps for database migration from one type of database to another.

    • Extract DDL Statements from source database (MySQL).

    • Extract the data in the form of delimited files and ship them to target database.

    • Refactor scripts as per target database (Postgres).

    • Create tables in the target database.

    • Execute pre-migration steps (disable constraints, drop indexes etc).

    • Load the data using native utilities.

    • Execute post-migration steps (enable constraints, create or rebuild indexes, reset sequences etc).

    • Sanity checks with basic queries.

    • Make sure all the impacted applications are validated thoroughly.

  • We have scripts and data set available in our GitHub repository. If you are using our environment the repository is already cloned under /data/retail_db.

  • It has scripts to create tables with primary keys. Those scripts were generated from MySQL tables and refactored for Postgres.

    • Script to create tables: create_db_tables_pg.sql

    • Load data into tables: load_db_tables_pg.sql

  • Here are the steps you need to perform to take care of this exercise.

    • Create tables

    • Load data

    • All the tables have surrogate primary keys. Here are the details.

      • orders.order_id

      • order_items.order_item_id

      • customers.customer_id

      • products.product_id

      • categories.category_id

      • departments.department_id

    • Get the maximum value from all surrogate primary key fields.

    • Create sequences for all surrogate primary key fields using maximum value. Make sure to use standard naming conventions for sequences.

    • Ensure sequences are mapped to the surrogate primary key fields.

    • Create foreign key constraints based on this information.

      • orders.order_customer_id to customers.customer_id

      • order_items.order_item_order_id to orders.order_id

      • order_items.order_item_product_id to products.product_id

      • products.product_category_id to categories.category_id

      • categories.category_department_id to departments.department_id

    • Insert a few records into departments to ensure that sequence generated numbers are used for department_id.

  • Here are the commands to launch psql and run scripts to create tables as well as load data into tables.

psql -U itversity_retail_user \
  -h localhost \
  -p 5432 \
  -d itversity_retail_db \
  -W

\i /data/retail_db/create_db_tables_pg.sql

\i /data/retail_db/load_db_tables_pg.sql
  • We use this approach of creating tables, loading data and then adding constraints as well as resetting sequences for large volume data migrations from one database to another.

  • Here are the commands or queries you need to come up with to solve this problem.

Exercise 1

Queries to get maximum values from surrogate primary keys.

Exercise 2

Commands to add sequences with START WITH pointing to the maximum value for the corresponding surrogate primary key fields. Make sure to use meaningful names for the sequences following TABLENAME_SURROGATEFIELD_seq (example: users_user_id_seq for users.user_id).
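
Following that convention, a sequence for the example users.user_id field would be created like this. The START WITH value of 1000 is a placeholder for the maximum value you retrieved plus one.

```sql
-- users_user_id_seq for users.user_id; 1000 is a placeholder for
-- (max user_id + 1) from the corresponding table.
CREATE SEQUENCE users_user_id_seq
    START WITH 1000;
```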

Exercise 3

Commands to alter sequences to bind them to corresponding surrogate primary key fields.
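
For the same users.user_id example, binding a sequence to its column involves two statements, sketched below.

```sql
-- Tie the sequence's lifecycle to the column and use it as the default.
ALTER SEQUENCE users_user_id_seq
    OWNED BY users.user_id;

ALTER TABLE users
    ALTER COLUMN user_id
    SET DEFAULT nextval('users_user_id_seq');
```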

Exercise 4

Add Foreign Key constraints to the tables.

  • Validate whether the tables have data violating the foreign key constraints (Hint: You can use a left outer join to find rows in the child table but not in the parent table).

  • Alter tables to add foreign keys as specified.

  • Here are the relationships for your reference.

    • orders.order_customer_id to customers.customer_id

    • order_items.order_item_order_id to orders.order_id

    • order_items.order_item_product_id to products.product_id

  • Solution should contain the following:

    • Commands to add foreign keys to the tables.
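
Using orders to customers as an illustration, the hint above can be sketched as follows; the constraint name is an assumption following the Postgres default naming convention.

```sql
-- Child rows with no matching parent would violate the constraint.
SELECT count(1)
FROM orders o
    LEFT OUTER JOIN customers c
        ON o.order_customer_id = c.customer_id
WHERE c.customer_id IS NULL;

-- Once the count above is 0, the foreign key can be added.
ALTER TABLE orders
    ADD CONSTRAINT orders_order_customer_id_fkey
        FOREIGN KEY (order_customer_id)
        REFERENCES customers (customer_id);
```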

Exercise 5

Queries to validate whether the constraints are created or not. You can come up with queries against the information_schema tables such as columns, sequences, etc.

Partitioning Tables

Here is the exercise to get comfortable with partitioning. We will be using range partitioning.

  • Use retail database. Make sure orders table already exists.

Exercise 1

Create table orders_part with the same columns as orders.

  • Partition the table by month using range partitioning on order_date.

  • Add 14 partitions - 13 based on the data and 1 default. Here is the naming convention.

    • Default - orders_part_default

    • Partition for 2014 January - orders_part_201401
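
The setup above can be sketched as follows. This is a minimal sketch: the column list and types are assumptions based on the retail orders table, and the remaining monthly partitions follow the same pattern as the January one.

```sql
-- Assumed columns mirroring orders; verify against your orders table.
CREATE TABLE orders_part (
    order_id INTEGER,
    order_date TIMESTAMP,
    order_customer_id INTEGER,
    order_status VARCHAR(45)
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_part_default
    PARTITION OF orders_part DEFAULT;

-- One monthly partition; the other 12 follow the same pattern.
CREATE TABLE orders_part_201401
    PARTITION OF orders_part
    FOR VALUES FROM ('2014-01-01') TO ('2014-02-01');
```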

Exercise 2

Let us load and validate data in the partitioned table.

  • Load the data from orders into orders_part.

  • Get the count on orders_part as well as on all the 14 partitions. You should get 0 for the default partition and all the records should be distributed across the other 13 partitions.

Pre-Defined Functions

Here are the exercises to reinforce our understanding of pre-defined functions.

  • We will use the users table as well as other tables we got as part of the retail database.

  • Information will be provided with each exercise.

%load_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
%%sql

DROP TABLE IF EXISTS users
%%sql

CREATE TABLE users(
    user_id SERIAL PRIMARY KEY,
    user_first_name VARCHAR(30),
    user_last_name VARCHAR(30),
    user_email_id VARCHAR(50),
    user_gender VARCHAR(1),
    user_unique_id VARCHAR(15),
    user_phone_no VARCHAR(20),
    user_dob DATE,
    created_ts TIMESTAMP
)
%%sql

INSERT INTO users (
    user_first_name, user_last_name, user_email_id, user_gender, 
    user_unique_id, user_phone_no, user_dob, created_ts
) VALUES
    ('Giuseppe', 'Bode', 'gbode0@imgur.com', 'M', '88833-8759', 
     '+86 (764) 443-1967', '1973-05-31', '2018-04-15 12:13:38'),
    ('Lexy', 'Gisbey', 'lgisbey1@mail.ru', 'F', '262501-029', 
     '+86 (751) 160-3742', '2003-05-31', '2020-12-29 06:44:09'),
    ('Karel', 'Claringbold', 'kclaringbold2@yale.edu', 'F', '391-33-2823', 
     '+62 (445) 471-2682', '1985-11-28', '2018-11-19 00:04:08'),
    ('Marv', 'Tanswill', 'mtanswill3@dedecms.com', 'F', '1195413-80', 
     '+62 (497) 736-6802', '1998-05-24', '2018-11-19 16:29:43'),
    ('Gertie', 'Espinoza', 'gespinoza4@nationalgeographic.com', 'M', '471-24-6869', 
     '+249 (687) 506-2960', '1997-10-30', '2020-01-25 21:31:10'),
    ('Saleem', 'Danneil', 'sdanneil5@guardian.co.uk', 'F', '192374-933', 
     '+63 (810) 321-0331', '1992-03-08', '2020-11-07 19:01:14'),
    ('Rickert', 'O''Shiels', 'roshiels6@wikispaces.com', 'M', '749-27-47-52', 
     '+86 (184) 759-3933', '1972-11-01', '2018-03-20 10:53:24'),
    ('Cybil', 'Lissimore', 'clissimore7@pinterest.com', 'M', '461-75-4198', 
     '+54 (613) 939-6976', '1978-03-03', '2019-12-09 14:08:30'),
    ('Melita', 'Rimington', 'mrimington8@mozilla.org', 'F', '892-36-676-2', 
     '+48 (322) 829-8638', '1995-12-15', '2018-04-03 04:21:33'),
    ('Benetta', 'Nana', 'bnana9@google.com', 'M', '197-54-1646', 
     '+420 (934) 611-0020', '1971-12-07', '2018-10-17 21:02:51'),
    ('Gregorius', 'Gullane', 'ggullanea@prnewswire.com', 'F', '232-55-52-58', 
     '+62 (780) 859-1578', '1973-09-18', '2020-01-14 23:38:53'),
    ('Una', 'Glayzer', 'uglayzerb@pinterest.com', 'M', '898-84-336-6', 
     '+380 (840) 437-3981', '1983-05-26', '2019-09-17 03:24:21'),
    ('Jamie', 'Vosper', 'jvosperc@umich.edu', 'M', '247-95-68-44', 
     '+81 (205) 723-1942', '1972-03-18', '2020-07-23 16:39:33'),
    ('Calley', 'Tilson', 'ctilsond@issuu.com', 'F', '415-48-894-3', 
     '+229 (698) 777-4904', '1987-06-12', '2020-06-05 12:10:50'),
    ('Peadar', 'Gregorowicz', 'pgregorowicze@omniture.com', 'M', '403-39-5-869', 
     '+7 (267) 853-3262', '1996-09-21', '2018-05-29 23:51:31'),
    ('Jeanie', 'Webling', 'jweblingf@booking.com', 'F', '399-83-05-03', 
     '+351 (684) 413-0550', '1994-12-27', '2018-02-09 01:31:11'),
    ('Yankee', 'Jelf', 'yjelfg@wufoo.com', 'F', '607-99-0411', 
     '+1 (864) 112-7432', '1988-11-13', '2019-09-16 16:09:12'),
    ('Blair', 'Aumerle', 'baumerleh@toplist.cz', 'F', '430-01-578-5', 
     '+7 (393) 232-1860', '1979-11-09', '2018-10-28 19:25:35'),
    ('Pavlov', 'Steljes', 'psteljesi@macromedia.com', 'F', '571-09-6181', 
     '+598 (877) 881-3236', '1991-06-24', '2020-09-18 05:34:31'),
    ('Darn', 'Hadeke', 'dhadekej@last.fm', 'M', '478-32-02-87', 
     '+370 (347) 110-4270', '1984-09-04', '2018-02-10 12:56:00'),
    ('Wendell', 'Spanton', 'wspantonk@de.vu', 'F', null, 
     '+84 (301) 762-1316', '1973-07-24', '2018-01-30 01:20:11'),
    ('Carlo', 'Yearby', 'cyearbyl@comcast.net', 'F', null, 
     '+55 (288) 623-4067', '1974-11-11', '2018-06-24 03:18:40'),
    ('Sheila', 'Evitts', 'sevittsm@webmd.com', null, '830-40-5287',
     null, '1977-03-01', '2020-07-20 09:59:41'),
    ('Sianna', 'Lowdham', 'slowdhamn@stanford.edu', null, '778-0845', 
     null, '1985-12-23', '2018-06-29 02:42:49'),
    ('Phylys', 'Aslie', 'paslieo@qq.com', 'M', '368-44-4478', 
     '+86 (765) 152-8654', '1984-03-22', '2019-10-01 01:34:28')

Exercise 1

Get the number of users created per year.

  • Use users table for this exercise.

  • Output should contain 4 digit year and count.

  • Use date specific functions to get the year using created_ts.

  • Make sure you define aliases to the columns as created_year and user_count respectively.

  • Data should be sorted in ascending order by created_year.

  • When you run the query in the Jupyter environment, integers might be displayed with decimals. It is fine to display the results with decimal points.

  • Here is the sample output.

created_year | user_count
2018         | 13
2019         | 4
2020         | 8

Exercise 2

Get the day name of the birth days for all the users born in the month of May.

  • Use users table for this exercise.

  • Output should contain user_id, user_dob, user_email_id and user_day_of_birth.

  • Use date specific functions to get the month using user_dob.

  • user_day_of_birth should be full day with first character in upper case such as Tuesday

  • Data should be sorted by day within the month of May.

user_id | user_dob   | user_email_id           | user_day_of_birth
4       | 1998-05-24 | mtanswill3@dedecms.com  | Sunday
12      | 1983-05-26 | uglayzerb@pinterest.com | Thursday
1       | 1973-05-31 | gbode0@imgur.com        | Thursday
2       | 2003-05-31 | lgisbey1@mail.ru        | Saturday

Exercise 3

Get the names and email ids of users added in year 2019.

  • Use users table for this exercise.

  • Output should contain user_id, user_name, user_email_id, created_ts, created_year.

  • Use date specific functions to get the year using created_ts.

  • user_name is a derived column by concatenating user_first_name and user_last_name with space in between.

  • user_name should have values in upper case.

  • Data should be sorted in ascending order by user_name

user_id | user_name       | user_email_id             | created_ts          | created_year
8       | CYBIL LISSIMORE | clissimore7@pinterest.com | 2019-12-09 14:08:30 | 2019.0
25      | PHYLYS ASLIE    | paslieo@qq.com            | 2019-10-01 01:34:28 | 2019.0
12      | UNA GLAYZER     | uglayzerb@pinterest.com   | 2019-09-17 03:24:21 | 2019.0
17      | YANKEE JELF     | yjelfg@wufoo.com          | 2019-09-16 16:09:12 | 2019.0

Exercise 4

Get the number of users by gender.

  • Use users table for this exercise.

  • Output should contain gender and user_count.

  • For males the output should display Male and for females the output should display Female.

  • If gender is not specified, then it should display Not Specified.

  • Data should be sorted in descending order by user_count.

user_gender   | user_count
Female        | 13
Male          | 10
Not Specified | 2

Exercise 5

Get last 4 digits of unique ids.

  • Use users table for this exercise.

  • Output should contain user_id, user_unique_id and user_unique_id_last4

  • Unique ids may be null.

  • Unique ids contain numbers and hyphens and are of different lengths.

  • We need to get the last 4 digits, discarding hyphens, only when the number of digits is at least 9.

  • If the unique id is null, then you should display Not Specified.

  • After discarding hyphens, if the unique id has fewer than 9 digits then you should display Invalid Unique Id.

  • Data should be sorted by user_id. You might see None or null for those user ids where there is no unique id for user_unique_id

user_id | user_unique_id | user_unique_id_last4
1       | 88833-8759     | 8759
2       | 262501-029     | 1029
3       | 391-33-2823    | 2823
4       | 1195413-80     | 1380
5       | 471-24-6869    | 6869
6       | 192374-933     | 4933
7       | 749-27-47-52   | 4752
8       | 461-75-4198    | 4198
9       | 892-36-676-2   | 6762
10      | 197-54-1646    | 1646
11      | 232-55-52-58   | 5258
12      | 898-84-336-6   | 3366
13      | 247-95-68-44   | 6844
14      | 415-48-894-3   | 8943
15      | 403-39-5-869   | 5869
16      | 399-83-05-03   | 0503
17      | 607-99-0411    | 0411
18      | 430-01-578-5   | 5785
19      | 571-09-6181    | 6181
20      | 478-32-02-87   | 0287
21      |                | Not Specified
22      |                | Not Specified
23      | 830-40-5287    | 5287
24      | 778-0845       | Invalid Unique Id
25      | 368-44-4478    | 4478

Exercise 6

Get the count of users based on country code.

  • Use users table for this exercise.

  • Output should contain country code and count.

  • There should be no + in the country code. It should only contain digits.

  • Data should be sorted as numbers by country code.

  • We should discard user_phone_no with null values.

  • Here is the desired output:

country_code | user_count
1            | 1
7            | 2
48           | 1
54           | 1
55           | 1
62           | 3
63           | 1
81           | 1
84           | 1
86           | 4
229          | 1
249          | 1
351          | 1
370          | 1
380          | 1
420          | 1
598          | 1

Exercise 7

Let us validate whether we have invalid order_item_subtotal values in the order_items table.

  • The order_items table has 6 fields.

    • order_item_id

    • order_item_order_id

    • order_item_product_id

    • order_item_quantity

    • order_item_subtotal

    • order_item_product_price

  • order_item_subtotal is nothing but the product of order_item_quantity and order_item_product_price. It is computed by multiplying order_item_quantity and order_item_product_price for each item.

  • You need to get the count of order_items where order_item_subtotal is not equal to the product of order_item_quantity and order_item_product_price.

  • There can be issues related to rounding off. Make sure they are taken care of using an appropriate function.

  • Output should be 0 as there are no such records.

count

0
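
One possible shape for this check, assuming rounding to 2 decimals absorbs the floating point differences:

```sql
-- Count items whose subtotal differs from quantity * price after rounding.
SELECT count(1) AS count
FROM order_items
WHERE order_item_subtotal != round(
    (order_item_quantity * order_item_product_price)::numeric, 2
);
```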

Exercise 8

Get number of orders placed on weekdays and weekends in the month of January 2014.

  • The orders table has 4 fields.

    • order_id

    • order_date

    • order_customer_id

    • order_status

  • Use order_date to determine the day on which orders are placed.

  • Output should contain 2 columns - day_type and order_count.

  • day_type should have 2 values Week days and Weekend days.

  • Here is the desired output.

day_type     | order_count
Weekend days | 1505
Week days    | 4403

Analytics Functions

Let us take care of the exercises related to analytics functions. We will be using the HR database for the same.

  • Get all the employees who are making more than the average salary within each department.

  • Get the cumulative salary within a department along with the department name.

  • Get the top 3 paid employees within each department by salary (use dense_rank).

  • Get top 3 products sold in the month of 2014 January by revenue.

  • Get top 3 products in each category sold in the month of 2014 January by revenue.
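
All of these exercises build on the same analytic function pattern: an aggregate computed OVER a partition without collapsing the detail rows. Here is a minimal sketch of that pattern against the HR employees table (an illustration only, not a solution to any specific exercise):

```sql
-- Average salary per department shown next to each employee's row.
SELECT employee_id,
    department_id,
    salary,
    round(avg(salary) OVER (PARTITION BY department_id), 2) AS avg_salary_expense
FROM employees
ORDER BY department_id, salary DESC;
```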

Prepare HR Database

Here are the steps to prepare HR database.

  • Connect to HR DB using psql or SQL Workbench. Here is the sample psql command.

psql -h localhost \
    -p 5432 \
    -d itversity_hr_db \
    -U itversity_hr_user \
    -W
  • Run scripts to create tables and load the data. You can also drop the tables if they already exist.

\i /data/hr_db/drop_tables_pg.sql
\i /data/hr_db/create_tables_pg.sql
\i /data/hr_db/load_data_pg.sql
  • Validate to ensure that data is available in the tables by running these queries.

%load_ext sql
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
%sql SELECT * FROM employees LIMIT 10
employee_id first_name last_name email phone_number hire_date job_id salary commission_pct manager_id department_id
100 Steven King SKING 515.123.4567 1987-06-17 AD_PRES 24000.00 None None 90
101 Neena Kochhar NKOCHHAR 515.123.4568 1989-09-21 AD_VP 17000.00 None 100 90
102 Lex De Haan LDEHAAN 515.123.4569 1993-01-13 AD_VP 17000.00 None 100 90
103 Alexander Hunold AHUNOLD 590.423.4567 1990-01-03 IT_PROG 9000.00 None 102 60
104 Bruce Ernst BERNST 590.423.4568 1991-05-21 IT_PROG 6000.00 None 103 60
105 David Austin DAUSTIN 590.423.4569 1997-06-25 IT_PROG 4800.00 None 103 60
106 Valli Pataballa VPATABAL 590.423.4560 1998-02-05 IT_PROG 4800.00 None 103 60
107 Diana Lorentz DLORENTZ 590.423.5567 1999-02-07 IT_PROG 4200.00 None 103 60
108 Nancy Greenberg NGREENBE 515.124.4569 1994-08-17 FI_MGR 12000.00 None 101 100
109 Daniel Faviet DFAVIET 515.124.4169 1994-08-16 FI_ACCOUNT 9000.00 None 108 100
%%sql 

SELECT * FROM departments 
ORDER BY manager_id NULLS LAST
LIMIT 10
department_id department_name manager_id location_id
90 Executive 100 1700
60 IT 103 1400
100 Finance 108 1700
30 Purchasing 114 1700
50 Shipping 121 1500
80 Sales 145 2500
10 Administration 200 1700
20 Marketing 201 1800
40 Human Resources 203 2400
70 Public Relations 204 2700

Prepare Retail Database

Make sure to drop and recreate the tables before taking care of the exercises related to retail database.

  • Ensure that we have the required database and user for the retail data. We might provide the database as part of our labs. Here are the instructions to use psql for setting up the required database (if required) and tables.

psql -U postgres -h localhost -p 5432 -W
CREATE DATABASE itversity_retail_db;
CREATE USER itversity_retail_user WITH ENCRYPTED PASSWORD 'retail_password';
GRANT ALL ON DATABASE itversity_retail_db TO itversity_retail_user;
  • Create Tables using the script provided. You can either use psql or SQL Workbench.

psql -U itversity_retail_user \
  -h localhost \
  -p 5432 \
  -d itversity_retail_db \
  -W
  • You can drop the existing tables.

DROP TABLE IF EXISTS order_items CASCADE;
DROP TABLE IF EXISTS orders CASCADE;
DROP TABLE IF EXISTS customers CASCADE;
DROP TABLE IF EXISTS products CASCADE;
DROP TABLE IF EXISTS categories CASCADE;
DROP TABLE IF EXISTS departments CASCADE;
  • Once the tables are dropped you can run below script to create the tables for the purpose of exercises.

\i /data/retail_db/create_db_tables_pg.sql
  • Data shall be loaded using the script provided.

\i /data/retail_db/load_db_tables_pg.sql
  • Run queries to validate that we have data in all the tables.

%load_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
%sql SELECT * FROM orders LIMIT 10
order_id order_date order_customer_id order_status
1 2013-07-25 00:00:00 11599 CLOSED
2 2013-07-25 00:00:00 256 PENDING_PAYMENT
3 2013-07-25 00:00:00 12111 COMPLETE
4 2013-07-25 00:00:00 8827 CLOSED
5 2013-07-25 00:00:00 11318 COMPLETE
6 2013-07-25 00:00:00 7130 COMPLETE
7 2013-07-25 00:00:00 4530 COMPLETE
8 2013-07-25 00:00:00 2911 PROCESSING
9 2013-07-25 00:00:00 5657 PENDING_PAYMENT
10 2013-07-25 00:00:00 5648 PENDING_PAYMENT
%sql SELECT count(1) FROM orders
count
68883
%sql SELECT * FROM order_items LIMIT 10
order_item_id order_item_order_id order_item_product_id order_item_quantity order_item_subtotal order_item_product_price
1 1 957 1 299.98 299.98
2 2 1073 1 199.99 199.99
3 2 502 5 250.0 50.0
4 2 403 1 129.99 129.99
5 4 897 2 49.98 24.99
6 4 365 5 299.95 59.99
7 4 502 3 150.0 50.0
8 4 1014 4 199.92 49.98
9 5 957 1 299.98 299.98
10 5 365 5 299.95 59.99
%sql SELECT count(1) FROM order_items
count
172198
%sql SELECT * FROM products LIMIT 10
product_id product_category_id product_name product_description product_price product_image
1 2 Quest Q64 10 FT. x 10 FT. Slant Leg Instant U 59.98 http://images.acmesports.sports/Quest+Q64+10+FT.+x+10+FT.+Slant+Leg+Instant+Up+Canopy
2 2 Under Armour Men's Highlight MC Football Clea 129.99 http://images.acmesports.sports/Under+Armour+Men%27s+Highlight+MC+Football+Cleat
3 2 Under Armour Men's Renegade D Mid Football Cl 89.99 http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat
4 2 Under Armour Men's Renegade D Mid Football Cl 89.99 http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat
5 2 Riddell Youth Revolution Speed Custom Footbal 199.99 http://images.acmesports.sports/Riddell+Youth+Revolution+Speed+Custom+Football+Helmet
6 2 Jordan Men's VI Retro TD Football Cleat 134.99 http://images.acmesports.sports/Jordan+Men%27s+VI+Retro+TD+Football+Cleat
7 2 Schutt Youth Recruit Hybrid Custom Football H 99.99 http://images.acmesports.sports/Schutt+Youth+Recruit+Hybrid+Custom+Football+Helmet+2014
8 2 Nike Men's Vapor Carbon Elite TD Football Cle 129.99 http://images.acmesports.sports/Nike+Men%27s+Vapor+Carbon+Elite+TD+Football+Cleat
9 2 Nike Adult Vapor Jet 3.0 Receiver Gloves 50.0 http://images.acmesports.sports/Nike+Adult+Vapor+Jet+3.0+Receiver+Gloves
10 2 Under Armour Men's Highlight MC Football Clea 129.99 http://images.acmesports.sports/Under+Armour+Men%27s+Highlight+MC+Football+Cleat
%sql SELECT count(1) FROM products
count
1345

Exercise 1

Get all the employees who are making more than the average salary within each department.

  • Use the HR database employees and departments tables for this problem.

  • Compute the average salary expense for each department and get the details of those employees who are making more than the average salary.

  • Make sure average salary expense per department is rounded off to 2 decimals.

  • Output should contain employee_id, department_name, salary and avg_salary_expense (derived field).

  • Data should be sorted in ascending order by department_id and descending order by salary.

employee_id | department_name | salary   | avg_salary_expense
201         | Marketing       | 13000.00 | 9500.00
114         | Purchasing      | 11000.00 | 4150.00
121         | Shipping        | 8200.00  | 3475.56
120         | Shipping        | 8000.00  | 3475.56
122         | Shipping        | 7900.00  | 3475.56
123         | Shipping        | 6500.00  | 3475.56
124         | Shipping        | 5800.00  | 3475.56
184         | Shipping        | 4200.00  | 3475.56
185         | Shipping        | 4100.00  | 3475.56
192         | Shipping        | 4000.00  | 3475.56
193         | Shipping        | 3900.00  | 3475.56
188         | Shipping        | 3800.00  | 3475.56
137         | Shipping        | 3600.00  | 3475.56
189         | Shipping        | 3600.00  | 3475.56
141         | Shipping        | 3500.00  | 3475.56
103         | IT              | 9000.00  | 5760.00
104         | IT              | 6000.00  | 5760.00
145         | Sales           | 14000.00 | 8955.88
146         | Sales           | 13500.00 | 8955.88
147         | Sales           | 12000.00 | 8955.88
168         | Sales           | 11500.00 | 8955.88
148         | Sales           | 11000.00 | 8955.88
174         | Sales           | 11000.00 | 8955.88
149         | Sales           | 10500.00 | 8955.88
162         | Sales           | 10500.00 | 8955.88
156         | Sales           | 10000.00 | 8955.88
150         | Sales           | 10000.00 | 8955.88
169         | Sales           | 10000.00 | 8955.88
170         | Sales           | 9600.00  | 8955.88
163         | Sales           | 9500.00  | 8955.88
151         | Sales           | 9500.00  | 8955.88
157         | Sales           | 9500.00  | 8955.88
158         | Sales           | 9000.00  | 8955.88
152         | Sales           | 9000.00  | 8955.88
100         | Executive       | 24000.00 | 19333.33
108         | Finance         | 12000.00 | 8600.00
109         | Finance         | 9000.00  | 8600.00
205         | Accounting      | 12000.00 | 10150.00

%load_ext sql
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db

Exercise 2

Get the cumulative salary within each department for the Finance and IT departments, along with the department name.

  • Use the HR database employees and departments tables for this problem.

  • Compute the cumulative salary expense for the Finance and IT departments, within each respective department.

  • Make sure the cumulative salary expense per department is rounded off to 2 decimals.

  • Output should contain employee_id, department_name, salary and cum_salary_expense (derived field).

  • Data should be sorted in ascending order by department_name and then by salary.
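A running total can be computed with `SUM` as a window function over each department (a sketch, assuming the same employees/departments join as above). Note the explicit `ROWS` frame: the default `RANGE` frame would give tied salaries the same cumulative value, while a row-by-row running total is what the expected output shows.

```sql
-- Sketch: cumulative salary expense within Finance and IT.
SELECT e.employee_id,
       d.department_name,
       e.salary,
       ROUND(SUM(e.salary) OVER (
           PARTITION BY d.department_name
           ORDER BY e.salary
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ), 2) AS cum_salary_expense
FROM employees e
    JOIN departments d
        ON e.department_id = d.department_id
WHERE d.department_name IN ('Finance', 'IT')
ORDER BY d.department_name, e.salary;
```

When two employees share a salary, the order of their rows (and hence their cumulative values) is not deterministic unless a tiebreaker column is added to the window's `ORDER BY`.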

| employee_id | department_name | salary | cum_salary_expense |
|---|---|---|---|
| 113 | Finance | 6900.00 | 6900.00 |
| 111 | Finance | 7700.00 | 14600.00 |
| 112 | Finance | 7800.00 | 22400.00 |
| 110 | Finance | 8200.00 | 30600.00 |
| 109 | Finance | 9000.00 | 39600.00 |
| 108 | Finance | 12000.00 | 51600.00 |
| 107 | IT | 4200.00 | 4200.00 |
| 106 | IT | 4800.00 | 9000.00 |
| 105 | IT | 4800.00 | 13800.00 |
| 104 | IT | 6000.00 | 19800.00 |
| 103 | IT | 9000.00 | 28800.00 |

Exercise 3

Get the top 3 paid employees within each department by salary (use dense_rank).

  • Use the HR database employees and departments tables for this problem.

  • The highest paid employee should be ranked first.

  • Output should contain employee_id, department_id, department_name, salary and employee_rank (derived field).

  • Data should be sorted in ascending order by department_id and then by salary in descending order.
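A sketch using `dense_rank` (again assuming the employees/departments join): rank within each department in descending salary order, then keep ranks up to 3. Because `dense_rank` gives tied salaries the same rank, a department can return more than 3 rows, as IT does in the expected output below.

```sql
-- Sketch: top 3 salaries per department using dense_rank.
SELECT employee_id, department_id, department_name, salary, employee_rank
FROM (
    SELECT e.employee_id,
           e.department_id,
           d.department_name,
           e.salary,
           dense_rank() OVER (
               PARTITION BY e.department_id
               ORDER BY e.salary DESC
           ) AS employee_rank
    FROM employees e
        JOIN departments d
            ON e.department_id = d.department_id
) q
WHERE employee_rank <= 3
ORDER BY department_id, salary DESC;
```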

| employee_id | department_id | department_name | salary | employee_rank |
|---|---|---|---|---|
| 200 | 10 | Administration | 4400.00 | 1 |
| 201 | 20 | Marketing | 13000.00 | 1 |
| 202 | 20 | Marketing | 6000.00 | 2 |
| 114 | 30 | Purchasing | 11000.00 | 1 |
| 115 | 30 | Purchasing | 3100.00 | 2 |
| 116 | 30 | Purchasing | 2900.00 | 3 |
| 203 | 40 | Human Resources | 6500.00 | 1 |
| 121 | 50 | Shipping | 8200.00 | 1 |
| 120 | 50 | Shipping | 8000.00 | 2 |
| 122 | 50 | Shipping | 7900.00 | 3 |
| 103 | 60 | IT | 9000.00 | 1 |
| 104 | 60 | IT | 6000.00 | 2 |
| 105 | 60 | IT | 4800.00 | 3 |
| 106 | 60 | IT | 4800.00 | 3 |
| 204 | 70 | Public Relations | 10000.00 | 1 |
| 145 | 80 | Sales | 14000.00 | 1 |
| 146 | 80 | Sales | 13500.00 | 2 |
| 147 | 80 | Sales | 12000.00 | 3 |
| 100 | 90 | Executive | 24000.00 | 1 |
| 101 | 90 | Executive | 17000.00 | 2 |
| 102 | 90 | Executive | 17000.00 | 2 |
| 108 | 100 | Finance | 12000.00 | 1 |
| 109 | 100 | Finance | 9000.00 | 2 |
| 110 | 100 | Finance | 8200.00 | 3 |
| 205 | 110 | Accounting | 12000.00 | 1 |
| 206 | 110 | Accounting | 8300.00 | 2 |

Exercise 4

Get the top 3 products sold in January 2014 by revenue.

  • Use the retail database tables orders, order_items and products.

  • The highest revenue generating product should come first.

  • Consider only those orders which are in either COMPLETE or CLOSED status.

  • Output should contain product_id, product_name, revenue and product_rank. revenue and product_rank are derived fields.

  • Data should be sorted in descending order by revenue.
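A sketch of one approach, assuming the typical retail_db column names (order_id, order_status, order_date in orders; order_item_order_id, order_item_product_id, order_item_subtotal in order_items): aggregate revenue per product, rank with `dense_rank`, and filter in an outer query rather than using `LIMIT`, so that ties are not silently dropped.

```sql
-- Sketch: top 3 products by revenue for January 2014.
SELECT product_id, product_name, revenue, product_rank
FROM (
    SELECT p.product_id,
           p.product_name,
           ROUND(SUM(oi.order_item_subtotal)::numeric, 2) AS revenue,
           dense_rank() OVER (
               ORDER BY SUM(oi.order_item_subtotal) DESC
           ) AS product_rank
    FROM orders o
        JOIN order_items oi
            ON o.order_id = oi.order_item_order_id
        JOIN products p
            ON p.product_id = oi.order_item_product_id
    WHERE o.order_status IN ('COMPLETE', 'CLOSED')
        AND to_char(o.order_date, 'yyyy-MM') = '2014-01'
    GROUP BY p.product_id, p.product_name
) q
WHERE product_rank <= 3
ORDER BY revenue DESC;
```

The `::numeric` cast is needed if order_item_subtotal is stored as a float, since two-argument `ROUND` is defined only for numeric in Postgres.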

| product_id | product_name | revenue | product_rank |
|---|---|---|---|
| 1004 | Field & Stream Sportsman 16 Gun Fire Safe | 250787.46 | 1 |
| 365 | Perfect Fitness Perfect Rip Deck | 151474.75 | 2 |
| 957 | Diamondback Women’s Serene Classic Comfort Bi | 148190.12 | 3 |

Exercise 5

Get the top 3 products sold in January 2014 under selected categories by revenue. The categories are Cardio Equipment and Strength Training.

  • Use the retail database tables orders, order_items, products as well as categories.

  • The highest revenue generating product should come first.

  • Consider only those orders which are in either COMPLETE or CLOSED status.

  • Output should contain category_id, category_name, product_id, product_name, revenue and product_rank. revenue and product_rank are derived fields.

  • Data should be sorted in ascending order by category_id and descending order by revenue.
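This extends the previous exercise by partitioning the ranking per category (a sketch, again assuming typical retail_db column names, including product_category_id in products and category_id, category_name in categories):

```sql
-- Sketch: top 3 products per selected category by revenue, January 2014.
SELECT category_id, category_name, product_id, product_name, revenue, product_rank
FROM (
    SELECT c.category_id,
           c.category_name,
           p.product_id,
           p.product_name,
           ROUND(SUM(oi.order_item_subtotal)::numeric, 2) AS revenue,
           dense_rank() OVER (
               PARTITION BY c.category_id
               ORDER BY SUM(oi.order_item_subtotal) DESC
           ) AS product_rank
    FROM orders o
        JOIN order_items oi
            ON o.order_id = oi.order_item_order_id
        JOIN products p
            ON p.product_id = oi.order_item_product_id
        JOIN categories c
            ON c.category_id = p.product_category_id
    WHERE o.order_status IN ('COMPLETE', 'CLOSED')
        AND to_char(o.order_date, 'yyyy-MM') = '2014-01'
        AND c.category_name IN ('Cardio Equipment', 'Strength Training')
    GROUP BY c.category_id, c.category_name, p.product_id, p.product_name
) q
WHERE product_rank <= 3
ORDER BY category_id, revenue DESC;
```

Filtering the categories in the inner `WHERE` clause keeps the ranking window restricted to the two categories of interest.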

| category_id | category_name | product_id | product_name | revenue | product_rank |
|---|---|---|---|---|---|
| 9 | Cardio Equipment | 191 | Nike Men’s Free 5.0+ Running Shoe | 132286.77 | 1 |
| 9 | Cardio Equipment | 172 | Nike Women’s Tempo Shorts | 870.00 | 2 |
| 10 | Strength Training | 208 | SOLE E35 Elliptical | 1999.99 | 1 |
| 10 | Strength Training | 203 | GoPro HERO3+ Black Edition Camera | 1199.97 | 2 |
| 10 | Strength Training | 216 | Yakima DoubleDown Ace Hitch Mount 4-Bike Rack | 189.00 | 3 |