Exercises - Mastering PostgreSQL¶
Here is the consolidated list of exercises covering all the key topics related to PostgreSQL to help you gain mastery over SQL.
Ability to use native utilities to load data.
Perform CRUD or DML Operations.
Writing basic SQL queries.
Creating Tables, Indexes and Constraints.
Create and Load data into Partitioned Tables.
Ability to write advanced SQL queries, primarily using Analytic Functions.
Getting Started¶
Here is the exercise related to the Getting Started module.
Exercise - Loading Data¶
As part of this exercise, you need to take care of loading data using the COPY command.
You can connect to the database using following details in the environment provided by us.
Host: localhost
Port: 5432
Database Name: YOUR_OS_USER_hr_db
User Name: YOUR_OS_USER_hr_user
Password: YOUR_OS_USER_PASSWORD (provided by us).
If you are using your own environment, make sure to create a database for storing HR data.
Database Name: hr_db
User Name: hr_user
You can create the user with a password of your choice.
CREATE DATABASE hr_db;
CREATE USER hr_user WITH PASSWORD 'hr_password';
GRANT ALL ON DATABASE hr_db TO hr_user;
GRANT pg_read_server_files TO hr_user;
Create table using this script.
CREATE TABLE employees
( employee_id INTEGER
, first_name VARCHAR(20)
, last_name VARCHAR(25)
, email VARCHAR(25)
, phone_number VARCHAR(20)
, hire_date DATE
, job_id VARCHAR(10)
, salary NUMERIC(8,2)
, commission_pct NUMERIC(2,2)
, manager_id INTEGER
, department_id INTEGER
) ;
CREATE UNIQUE INDEX emp_emp_id_pk
    ON employees (employee_id);

-- Reuse the unique index for the primary key instead of creating a second index
ALTER TABLE employees
    ADD CONSTRAINT emp_emp_id_pk PRIMARY KEY USING INDEX emp_emp_id_pk;
Understand the data.
Check for delimiters (record as well as field).
Check whether a header exists or not.
Ensure that the number of fields in the table and in the data being loaded match.
Load data into the table using the COPY command. Validate by running the queries below. You can also use SQL Workbench to run the queries and confirm whether the data is loaded successfully.
SELECT * FROM employees LIMIT 10;
SELECT count(1) FROM employees;
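Here is a sketch of what the load step might look like. The file path, delimiter and null marker are assumptions; inspect your data file first and adjust accordingly.

```sql
-- Sketch: file path, delimiter and null marker are assumptions
-- (verify against the actual data file before running)
COPY employees
FROM '/data/hr_db/employees.csv'
WITH (FORMAT csv, DELIMITER ',', NULL 'null', HEADER false);
```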
DML or CRUD Operations¶
Let’s create a table and perform database operations using direct SQL.
Exercise 1 - Create Table¶
Create table - courses
course_id - sequence generated integer and primary key
course_name - which holds alphanumeric or string values up to 60 characters
course_author - which holds the name of the author up to 40 characters
course_status - which holds one of these values (published, draft, inactive).
course_published_dt - which holds a date type value.
Provide the script as answer for this exercise.
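Here is one possible shape for the DDL. SERIAL takes care of the sequence-generated primary key; the CHECK constraint is one assumed way to restrict course_status to the listed values.

```sql
CREATE TABLE courses (
    course_id SERIAL PRIMARY KEY,
    course_name VARCHAR(60),
    course_author VARCHAR(40),
    course_status VARCHAR(9)
        CHECK (course_status IN ('published', 'draft', 'inactive')),
    course_published_dt DATE
);
```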
Exercise 2 - Inserting Data¶
Insert data into courses using the data provided. Make sure id is system generated.
Course Name | Course Author | Course Status | Course Published Date |
---|---|---|---|
Programming using Python | Bob Dillon | published | 2020-09-30 |
Data Engineering using Python | Bob Dillon | published | 2020-07-15 |
Data Engineering using Scala | Elvis Presley | draft | |
Programming using Scala | Elvis Presley | published | 2020-05-12 |
Programming using Java | Mike Jack | inactive | 2020-08-10 |
Web Applications - Python Flask | Bob Dillon | inactive | 2020-07-20 |
Web Applications - Java Spring | Mike Jack | draft | |
Pipeline Orchestration - Python | Bob Dillon | draft | |
Streaming Pipelines - Python | Bob Dillon | published | 2020-10-05 |
Web Applications - Scala Play | Elvis Presley | inactive | 2020-09-30 |
Web Applications - Python Django | Bob Dillon | published | 2020-06-23 |
Server Automation - Ansible | Uncle Sam | published | 2020-07-05 |
Provide the insert statement(s) as answer for this exercise.
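The statement can follow this shape; only the first two rows are shown here, and the remaining rows from the table above follow the same pattern. course_id is omitted so that it is system generated.

```sql
INSERT INTO courses
    (course_name, course_author, course_status, course_published_dt)
VALUES
    ('Programming using Python', 'Bob Dillon', 'published', '2020-09-30'),
    ('Data Engineering using Scala', 'Elvis Presley', 'draft', NULL);
```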
Exercise 3 - Updating Data¶
Update the status of all the draft courses related to Python and Scala to published, and set course_published_dt to the system date.
Provide the update statement as answer for this exercise.
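A possible shape for the statement; the pattern match on course_name is an assumption about how to identify Python and Scala courses.

```sql
UPDATE courses
SET course_status = 'published',
    course_published_dt = current_date
WHERE course_status = 'draft'
  AND (course_name LIKE '%Python%' OR course_name LIKE '%Scala%');
```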
Exercise 4 - Deleting Data¶
Delete all the courses which are neither in draft mode nor published.
Provide the delete statement as answer for this exercise.
Validation - Get count of all published courses by author and make sure output is sorted in descending order by count.
SELECT course_author, count(1) AS course_count
FROM courses
WHERE course_status = 'published'
GROUP BY course_author
ORDER BY course_count DESC;
Course Author | Course Count |
---|---|
Bob Dillon | 5 |
Elvis Presley | 2 |
Uncle Sam | 1 |
Basic SQL Queries¶
Here are some of the exercises for which you can write SQL queries to evaluate yourself.
Ensure that we have the required database and user for retail data. We might provide the database as part of our labs. Here are the instructions to use psql for setting up the required database (if required) and tables.
psql -U postgres -h localhost -p 5432 -W
CREATE DATABASE itversity_retail_db;
CREATE USER itversity_retail_user WITH ENCRYPTED PASSWORD 'retail_password';
GRANT ALL ON DATABASE itversity_retail_db TO itversity_retail_user;
Create tables using the script provided. You can either use psql or SQL Workbench.
psql -U itversity_retail_user \
-h localhost \
-p 5432 \
-d itversity_retail_db \
-W
You can drop the existing tables.
DROP TABLE IF EXISTS order_items CASCADE;
DROP TABLE IF EXISTS orders CASCADE;
DROP TABLE IF EXISTS customers CASCADE;
DROP TABLE IF EXISTS products CASCADE;
DROP TABLE IF EXISTS categories CASCADE;
DROP TABLE IF EXISTS departments CASCADE;
Once the tables are dropped, you can run the below script to create the tables for the purpose of the exercises.
\i /data/retail_db/create_db_tables_pg.sql
Load the data using the script provided.
\i /data/retail_db/load_db_tables_pg.sql
Run queries to validate that we have data in all the tables.
Exercise 1 - Customer order count¶
Get order count per customer for the month of 2014 January.
Tables - orders and customers
Data should be sorted in descending order by count and ascending order by customer id.
Output should contain customer_id, customer_first_name, customer_last_name and customer_order_count.
Exercise 2 - Dormant Customers¶
Get the customer details who have not placed any order for the month of 2014 January.
Tables - orders and customers
Data should be sorted in ascending order by customer_id
Output should contain all the fields from customers
Exercise 3 - Revenue Per Customer¶
Get the revenue generated by each customer for the month of 2014 January
Tables - orders, order_items and customers
Data should be sorted in descending order by revenue and then ascending order by customer_id
Output should contain customer_id, customer_first_name, customer_last_name, customer_revenue.
If there are no orders placed by a customer, then the corresponding revenue for that customer should be 0.
Consider only COMPLETE and CLOSED orders
Exercise 4 - Revenue Per Category¶
Get the revenue generated for each category for the month of 2014 January
Tables - orders, order_items, products and categories
Data should be sorted in ascending order by category_id.
Output should contain all the fields from categories along with the revenue as category_revenue.
Consider only COMPLETE and CLOSED orders
Exercise 5 - Product Count Per Department¶
Get the number of products for each department.
Tables - departments, categories, products
Data should be sorted in ascending order by department_id
Output should contain all the fields from departments and the product count as product_count.
Managing Database Objects¶
This exercise is primarily meant to assess your ability to put all the important DDL concepts into practice by coming up with a solution for a typical data migration problem from one database (MySQL) to another (Postgres).
Here are the high level steps for database migration from one type of database to another type of database.
Extract DDL Statements from source database (MySQL).
Extract the data in the form of delimited files and ship them to target database.
Refactor scripts as per target database (Postgres).
Create tables in the target database.
Execute pre-migration steps (disable constraints, drop indexes etc).
Load the data using native utilities.
Execute post-migration steps (enable constraints, create or rebuild indexes, reset sequences etc).
Sanity checks with basic queries.
Make sure all the impacted applications are validated thoroughly.
We have the scripts and data set available in our GitHub repository. If you are using our environment, the repository is already cloned under /data/retail_db.
It has scripts to create tables with primary keys. Those scripts were generated from MySQL tables and refactored for Postgres.
Script to create tables: create_db_tables_pg.sql
Load data into tables: load_db_tables_pg.sql
Here are the steps you need to perform to take care of this exercise.
Create tables
Load data
All the tables have surrogate primary keys. Here are the details.
orders.order_id
order_items.order_item_id
customers.customer_id
products.product_id
categories.category_id
departments.department_id
Get the maximum value from all surrogate primary key fields.
Create sequences for all surrogate primary key fields using maximum value. Make sure to use standard naming conventions for sequences.
Ensure sequences are mapped to the surrogate primary key fields.
Create foreign key constraints based on this information.
orders.order_customer_id to customers.customer_id
order_items.order_item_order_id to orders.order_id
order_items.order_item_product_id to products.product_id
products.product_category_id to categories.category_id
categories.category_department_id to departments.department_id
Insert a few records into departments to ensure that sequence-generated numbers are used for department_id.
Here are the commands to launch psql and run scripts to create tables as well as load data into the tables.
psql -U itversity_retail_user \
-h localhost \
-p 5432 \
-d itversity_retail_db \
-W
\i /data/retail_db/create_db_tables_pg.sql
\i /data/retail_db/load_db_tables_pg.sql
We use this approach of creating tables, loading data, and then adding constraints as well as resetting sequences for large-volume data migrations from one database to another.
Here are the commands or queries you need to come up with to solve this problem.
Exercise 1¶
Queries to get maximum values from surrogate primary keys.
Exercise 2¶
Commands to add sequences with START WITH pointing to the maximum value of the corresponding surrogate primary key field. Make sure to use meaningful names for sequences: TABLENAME_SURROGATEFIELD_seq (example: users_user_id_seq for users.user_id).
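For one table, the steps might look like this. The START WITH value of 68884 is a placeholder assumption; use the maximum value you actually found plus one, and repeat for the other five tables following the same naming convention.

```sql
SELECT max(order_id) FROM orders;

-- START WITH value is an assumption based on max(order_id) + 1
CREATE SEQUENCE orders_order_id_seq
    START WITH 68884;
```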
Exercise 3¶
Commands to alter sequences to bind them to corresponding surrogate primary key fields.
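A sketch of the binding for one sequence; OWNED BY ties the sequence's life cycle to the column, and the DEFAULT makes new inserts draw from it.

```sql
ALTER SEQUENCE orders_order_id_seq
    OWNED BY orders.order_id;

ALTER TABLE orders
    ALTER COLUMN order_id
    SET DEFAULT nextval('orders_order_id_seq');
```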
Exercise 4¶
Add Foreign Key constraints to the tables.
Validate whether the tables have data violating foreign key constraints (Hint: you can use a left outer join to find rows in the child table that are not in the parent table).
Alter tables to add foreign keys as specified.
Here are the relationships for your reference.
orders.order_customer_id to customers.customer_id
order_items.order_item_order_id to orders.order_id
order_items.order_item_product_id to products.product_id
Solution should contain the following:
Commands to add foreign keys to the tables.
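For one of the relationships, the validation and the constraint might look like this; repeat for the other relationships listed above.

```sql
-- Anti join: any orphaned child rows would show up in this count
SELECT count(1)
FROM orders o
    LEFT JOIN customers c
        ON o.order_customer_id = c.customer_id
WHERE c.customer_id IS NULL;

-- Add the constraint once the validation returns 0
ALTER TABLE orders
    ADD FOREIGN KEY (order_customer_id)
    REFERENCES customers (customer_id);
```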
Exercise 5¶
Queries to validate whether constraints are created or not. You can come up with queries against information_schema tables such as columns, sequences etc.
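One assumed way to check is to query information_schema.table_constraints and information_schema.sequences.

```sql
SELECT table_name, constraint_name, constraint_type
FROM information_schema.table_constraints
WHERE table_name IN ('orders', 'order_items', 'customers',
                     'products', 'categories', 'departments')
ORDER BY table_name, constraint_type;

SELECT sequence_name FROM information_schema.sequences;
```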
Partitioning Tables¶
Here is the exercise to get comfortable with partitioning. We will be using range partitioning.
Use retail database. Make sure orders table already exists.
Exercise 1¶
Create table orders_part with the same columns as orders.
Partition the table by month using range partitioning on order_date.
Add 14 partitions - 13 based on the data and 1 default. Here is the naming convention.
Default - orders_part_default
Partition for 2014 January - orders_part_201401
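A sketch of the DDL; LIKE copies the column definitions from orders, and only one of the 13 monthly partitions is shown.

```sql
CREATE TABLE orders_part (LIKE orders)
PARTITION BY RANGE (order_date);

CREATE TABLE orders_part_default
    PARTITION OF orders_part DEFAULT;

-- Repeat for each month present in the data
CREATE TABLE orders_part_201401
    PARTITION OF orders_part
    FOR VALUES FROM ('2014-01-01') TO ('2014-02-01');
```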
Exercise 2¶
Let us load and validate data in the partitioned table.
Load the data from orders into orders_part.
Get the count on orders_part as well as on all the 14 partitions. You should get 0 for the default partition, and all the records should be distributed across the other 13 partitions.
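The load and the validation can be sketched as follows; counting each partition directly by name is one way to confirm the distribution.

```sql
INSERT INTO orders_part
SELECT * FROM orders;

SELECT count(1) FROM orders_part;
SELECT count(1) FROM orders_part_default;  -- expected to be 0
SELECT count(1) FROM orders_part_201401;   -- repeat for each partition
```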
Pre-Defined Functions¶
Here are the exercises to reinforce our understanding of Pre-Defined Functions.
We will use users table as well as other tables we got as part of retail database.
Information will be provided with each exercise.
%load_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
%%sql
DROP TABLE IF EXISTS users
%%sql
CREATE TABLE users(
user_id SERIAL PRIMARY KEY,
user_first_name VARCHAR(30),
user_last_name VARCHAR(30),
user_email_id VARCHAR(50),
user_gender VARCHAR(1),
user_unique_id VARCHAR(15),
user_phone_no VARCHAR(20),
user_dob DATE,
created_ts TIMESTAMP
)
%%sql
insert into users (
user_first_name, user_last_name, user_email_id, user_gender,
user_unique_id, user_phone_no, user_dob, created_ts
) VALUES
('Giuseppe', 'Bode', 'gbode0@imgur.com', 'M', '88833-8759',
'+86 (764) 443-1967', '1973-05-31', '2018-04-15 12:13:38'),
('Lexy', 'Gisbey', 'lgisbey1@mail.ru', 'F', '262501-029',
'+86 (751) 160-3742', '2003-05-31', '2020-12-29 06:44:09'),
('Karel', 'Claringbold', 'kclaringbold2@yale.edu', 'F', '391-33-2823',
'+62 (445) 471-2682', '1985-11-28', '2018-11-19 00:04:08'),
('Marv', 'Tanswill', 'mtanswill3@dedecms.com', 'F', '1195413-80',
'+62 (497) 736-6802', '1998-05-24', '2018-11-19 16:29:43'),
('Gertie', 'Espinoza', 'gespinoza4@nationalgeographic.com', 'M', '471-24-6869',
'+249 (687) 506-2960', '1997-10-30', '2020-01-25 21:31:10'),
('Saleem', 'Danneil', 'sdanneil5@guardian.co.uk', 'F', '192374-933',
'+63 (810) 321-0331', '1992-03-08', '2020-11-07 19:01:14'),
('Rickert', 'O''Shiels', 'roshiels6@wikispaces.com', 'M', '749-27-47-52',
'+86 (184) 759-3933', '1972-11-01', '2018-03-20 10:53:24'),
('Cybil', 'Lissimore', 'clissimore7@pinterest.com', 'M', '461-75-4198',
'+54 (613) 939-6976', '1978-03-03', '2019-12-09 14:08:30'),
('Melita', 'Rimington', 'mrimington8@mozilla.org', 'F', '892-36-676-2',
'+48 (322) 829-8638', '1995-12-15', '2018-04-03 04:21:33'),
('Benetta', 'Nana', 'bnana9@google.com', 'M', '197-54-1646',
'+420 (934) 611-0020', '1971-12-07', '2018-10-17 21:02:51'),
('Gregorius', 'Gullane', 'ggullanea@prnewswire.com', 'F', '232-55-52-58',
'+62 (780) 859-1578', '1973-09-18', '2020-01-14 23:38:53'),
('Una', 'Glayzer', 'uglayzerb@pinterest.com', 'M', '898-84-336-6',
'+380 (840) 437-3981', '1983-05-26', '2019-09-17 03:24:21'),
('Jamie', 'Vosper', 'jvosperc@umich.edu', 'M', '247-95-68-44',
'+81 (205) 723-1942', '1972-03-18', '2020-07-23 16:39:33'),
('Calley', 'Tilson', 'ctilsond@issuu.com', 'F', '415-48-894-3',
'+229 (698) 777-4904', '1987-06-12', '2020-06-05 12:10:50'),
('Peadar', 'Gregorowicz', 'pgregorowicze@omniture.com', 'M', '403-39-5-869',
'+7 (267) 853-3262', '1996-09-21', '2018-05-29 23:51:31'),
('Jeanie', 'Webling', 'jweblingf@booking.com', 'F', '399-83-05-03',
'+351 (684) 413-0550', '1994-12-27', '2018-02-09 01:31:11'),
('Yankee', 'Jelf', 'yjelfg@wufoo.com', 'F', '607-99-0411',
'+1 (864) 112-7432', '1988-11-13', '2019-09-16 16:09:12'),
('Blair', 'Aumerle', 'baumerleh@toplist.cz', 'F', '430-01-578-5',
'+7 (393) 232-1860', '1979-11-09', '2018-10-28 19:25:35'),
('Pavlov', 'Steljes', 'psteljesi@macromedia.com', 'F', '571-09-6181',
'+598 (877) 881-3236', '1991-06-24', '2020-09-18 05:34:31'),
('Darn', 'Hadeke', 'dhadekej@last.fm', 'M', '478-32-02-87',
'+370 (347) 110-4270', '1984-09-04', '2018-02-10 12:56:00'),
('Wendell', 'Spanton', 'wspantonk@de.vu', 'F', null,
'+84 (301) 762-1316', '1973-07-24', '2018-01-30 01:20:11'),
('Carlo', 'Yearby', 'cyearbyl@comcast.net', 'F', null,
'+55 (288) 623-4067', '1974-11-11', '2018-06-24 03:18:40'),
('Sheila', 'Evitts', 'sevittsm@webmd.com', null, '830-40-5287',
null, '1977-03-01', '2020-07-20 09:59:41'),
('Sianna', 'Lowdham', 'slowdhamn@stanford.edu', null, '778-0845',
null, '1985-12-23', '2018-06-29 02:42:49'),
('Phylys', 'Aslie', 'paslieo@qq.com', 'M', '368-44-4478',
'+86 (765) 152-8654', '1984-03-22', '2019-10-01 01:34:28')
Exercise 1¶
Get the number of users created per year.
Use users table for this exercise.
Output should contain 4 digit year and count.
Use date specific functions to get the year using created_ts.
Make sure you define aliases to the columns as created_year and user_count respectively.
Data should be sorted in ascending order by created_year.
When you run the query in the Jupyter environment, integer values might be displayed with decimals. It is fine to display the results with decimal points.
Here is the sample output.
created_year | user_count |
---|---|
2018 | 13 |
2019 | 4 |
2020 | 8 |
Exercise 2¶
Get the day name of the birth days for all the users born in the month of May.
Use users table for this exercise.
Output should contain user_id, user_dob, user_email_id and user_day_of_birth.
Use date specific functions to get the month using user_dob.
user_day_of_birth should be the full day name with the first character in upper case, such as Tuesday.
Data should be sorted by day within the month of May.
user_id | user_dob | user_email_id | user_day_of_birth |
---|---|---|---|
4 | 1998-05-24 | mtanswill3@dedecms.com | Sunday |
12 | 1983-05-26 | uglayzerb@pinterest.com | Thursday |
1 | 1973-05-31 | gbode0@imgur.com | Thursday |
2 | 2003-05-31 | lgisbey1@mail.ru | Saturday |
Exercise 3¶
Get the names and email ids of users added in year 2019.
Use users table for this exercise.
Output should contain user_id, user_name, user_email_id, created_ts, created_year.
Use date specific functions to get the year using created_ts.
user_name is a derived column by concatenating user_first_name and user_last_name with space in between.
user_name should have values in upper case.
Data should be sorted in ascending order by user_name.
user_id | user_name | user_email_id | created_ts | created_year |
---|---|---|---|---|
8 | CYBIL LISSIMORE | clissimore7@pinterest.com | 2019-12-09 14:08:30 | 2019.0 |
25 | PHYLYS ASLIE | paslieo@qq.com | 2019-10-01 01:34:28 | 2019.0 |
12 | UNA GLAYZER | uglayzerb@pinterest.com | 2019-09-17 03:24:21 | 2019.0 |
17 | YANKEE JELF | yjelfg@wufoo.com | 2019-09-16 16:09:12 | 2019.0 |
Exercise 4¶
Get the number of users by gender.
Use users table for this exercise.
Output should contain gender and user_count.
For males the output should display Male and for females the output should display Female.
If gender is not specified, then it should display Not Specified.
Data should be sorted in descending order by user_count.
user_gender | user_count |
---|---|
Female | 13 |
Male | 10 |
Not Specified | 2 |
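One possible shape using CASE; the mapping of M and F follows the exercise description, and NULL falls through to the ELSE branch.

```sql
SELECT CASE user_gender
           WHEN 'M' THEN 'Male'
           WHEN 'F' THEN 'Female'
           ELSE 'Not Specified'
       END AS user_gender,
       count(1) AS user_count
FROM users
GROUP BY 1
ORDER BY user_count DESC;
```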
Exercise 5¶
Get the last 4 digits of the unique ids.
Use users table for this exercise.
Output should contain user_id, user_unique_id and user_unique_id_last4.
Unique ids may be null.
Unique ids contain numbers and hyphens and are of different lengths.
We need to get the last 4 digits, discarding hyphens, only when the number of digits is at least 9.
If the unique id is null, then you should display Not Specified.
After discarding hyphens, if the unique id has fewer than 9 digits, then you should display Invalid Unique Id.
Data should be sorted by user_id. You might see None or null for user_unique_id where there is no unique id for the user.
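A sketch using translate to discard hyphens and right to pick the last 4 digits; this is one possible approach, not the only one.

```sql
SELECT user_id, user_unique_id,
       CASE
           WHEN user_unique_id IS NULL THEN 'Not Specified'
           WHEN length(translate(user_unique_id, '-', '')) < 9
               THEN 'Invalid Unique Id'
           ELSE right(translate(user_unique_id, '-', ''), 4)
       END AS user_unique_id_last4
FROM users
ORDER BY user_id;
```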
user_id | user_unique_id | user_unique_id_last4 |
---|---|---|
1 | 88833-8759 | 8759 |
2 | 262501-029 | 1029 |
3 | 391-33-2823 | 2823 |
4 | 1195413-80 | 1380 |
5 | 471-24-6869 | 6869 |
6 | 192374-933 | 4933 |
7 | 749-27-47-52 | 4752 |
8 | 461-75-4198 | 4198 |
9 | 892-36-676-2 | 6762 |
10 | 197-54-1646 | 1646 |
11 | 232-55-52-58 | 5258 |
12 | 898-84-336-6 | 3366 |
13 | 247-95-68-44 | 6844 |
14 | 415-48-894-3 | 8943 |
15 | 403-39-5-869 | 5869 |
16 | 399-83-05-03 | 0503 |
17 | 607-99-0411 | 0411 |
18 | 430-01-578-5 | 5785 |
19 | 571-09-6181 | 6181 |
20 | 478-32-02-87 | 0287 |
21 | Not Specified | |
22 | Not Specified | |
23 | 830-40-5287 | 5287 |
24 | 778-0845 | Invalid Unique Id |
25 | 368-44-4478 | 4478 |
Exercise 6¶
Get the count of users based on the country code.
Use users table for this exercise.
Output should contain country code and count.
There should be no + in the country code. It should only contain digits.
Data should be sorted as numbers by country code.
We should discard user_phone_no with null values.
Here is the desired output:
country_code | user_count |
---|---|
1 | 1 |
7 | 2 |
48 | 1 |
54 | 1 |
55 | 1 |
62 | 3 |
63 | 1 |
81 | 1 |
84 | 1 |
86 | 4 |
229 | 1 |
249 | 1 |
351 | 1 |
370 | 1 |
380 | 1 |
420 | 1 |
598 | 1 |
Exercise 7¶
Let us validate if we have invalid order_item_subtotal as part of order_items table.
The order_items table has 6 fields.
order_item_id
order_item_order_id
order_item_product_id
order_item_quantity
order_item_subtotal
order_item_product_price
order_item_subtotal is nothing but the product of order_item_quantity and order_item_product_price, computed for each item.
You need to get the count of order_items where order_item_subtotal is not equal to the product of order_item_quantity and order_item_product_price.
There can be issues related to rounding off. Make sure they are taken care of using an appropriate function.
Output should be 0 as there are no such records.
count |
---|
0 |
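Casting to numeric and rounding both sides is one assumed way to neutralize floating point noise in the comparison.

```sql
SELECT count(1)
FROM order_items
WHERE round(order_item_subtotal::numeric, 2)
      != round((order_item_quantity * order_item_product_price)::numeric, 2);
```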
Exercise 8¶
Get number of orders placed on weekdays and weekends in the month of January 2014.
The orders table has 4 fields.
order_id
order_date
order_customer_id
order_status
Use order date to determine the day on which orders are placed.
Output should contain 2 columns - day_type and order_count.
day_type should have 2 values Week days and Weekend days.
Here is the desired output.
day_type | order_count |
---|---|
Weekend days | 1505 |
Week days | 4403 |
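A possible shape; treating Saturday and Sunday as the weekend is an assumption, and to_char with the Dy format is one way to get the day.

```sql
SELECT CASE
           WHEN to_char(order_date, 'Dy') IN ('Sat', 'Sun')
               THEN 'Weekend days'
           ELSE 'Week days'
       END AS day_type,
       count(1) AS order_count
FROM orders
WHERE to_char(order_date, 'yyyy-MM') = '2014-01'
GROUP BY 1;
```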
Analytics Functions¶
Let us take care of the exercises related to analytics functions. We will be using HR database for the same.
Get all the employees who are making more than the average salary within their department.
Get the cumulative salary for one of the departments along with the department name.
Get the top 3 paid employees within each department by salary (use dense_rank).
Get the top 3 products sold in the month of 2014 January by revenue.
Get the top 3 products in each category sold in the month of 2014 January by revenue.
Prepare HR Database¶
Here are the steps to prepare HR database.
Connect to HR DB using psql or SQL Workbench. Here is the sample psql command.
psql -h localhost \
-p 5432 \
-d itversity_hr_db \
-U itversity_hr_user \
-W
Run scripts to create tables and load the data. You can also drop the tables if they already exist.
\i /data/hr_db/drop_tables_pg.sql
\i /data/hr_db/create_tables_pg.sql
\i /data/hr_db/load_data_pg.sql
Validate to ensure that data is available in the tables by running these queries.
%load_ext sql
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
%sql SELECT * FROM employees LIMIT 10
* postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
10 rows affected.
employee_id | first_name | last_name | email | phone_number | hire_date | job_id | salary | commission_pct | manager_id | department_id |
---|---|---|---|---|---|---|---|---|---|---|
100 | Steven | King | SKING | 515.123.4567 | 1987-06-17 | AD_PRES | 24000.00 | None | None | 90 |
101 | Neena | Kochhar | NKOCHHAR | 515.123.4568 | 1989-09-21 | AD_VP | 17000.00 | None | 100 | 90 |
102 | Lex | De Haan | LDEHAAN | 515.123.4569 | 1993-01-13 | AD_VP | 17000.00 | None | 100 | 90 |
103 | Alexander | Hunold | AHUNOLD | 590.423.4567 | 1990-01-03 | IT_PROG | 9000.00 | None | 102 | 60 |
104 | Bruce | Ernst | BERNST | 590.423.4568 | 1991-05-21 | IT_PROG | 6000.00 | None | 103 | 60 |
105 | David | Austin | DAUSTIN | 590.423.4569 | 1997-06-25 | IT_PROG | 4800.00 | None | 103 | 60 |
106 | Valli | Pataballa | VPATABAL | 590.423.4560 | 1998-02-05 | IT_PROG | 4800.00 | None | 103 | 60 |
107 | Diana | Lorentz | DLORENTZ | 590.423.5567 | 1999-02-07 | IT_PROG | 4200.00 | None | 103 | 60 |
108 | Nancy | Greenberg | NGREENBE | 515.124.4569 | 1994-08-17 | FI_MGR | 12000.00 | None | 101 | 100 |
109 | Daniel | Faviet | DFAVIET | 515.124.4169 | 1994-08-16 | FI_ACCOUNT | 9000.00 | None | 108 | 100 |
%%sql
SELECT * FROM departments
ORDER BY manager_id NULLS LAST
LIMIT 10
* postgresql://itversity_hr_user:***@localhost:5432/itversity_hr_db
10 rows affected.
department_id | department_name | manager_id | location_id |
---|---|---|---|
90 | Executive | 100 | 1700 |
60 | IT | 103 | 1400 |
100 | Finance | 108 | 1700 |
30 | Purchasing | 114 | 1700 |
50 | Shipping | 121 | 1500 |
80 | Sales | 145 | 2500 |
10 | Administration | 200 | 1700 |
20 | Marketing | 201 | 1800 |
40 | Human Resources | 203 | 2400 |
70 | Public Relations | 204 | 2700 |
Prepare Retail Database¶
Make sure to drop and recreate the tables before taking care of the exercises related to retail database.
Ensure that we have the required database and user for retail data. We might provide the database as part of our labs. Here are the instructions to use psql for setting up the required database (if required) and tables.
psql -U postgres -h localhost -p 5432 -W
CREATE DATABASE itversity_retail_db;
CREATE USER itversity_retail_user WITH ENCRYPTED PASSWORD 'retail_password';
GRANT ALL ON DATABASE itversity_retail_db TO itversity_retail_user;
Create tables using the script provided. You can either use psql or SQL Workbench.
psql -U itversity_retail_user \
-h localhost \
-p 5432 \
-d itversity_retail_db \
-W
You can drop the existing tables.
DROP TABLE IF EXISTS order_items CASCADE;
DROP TABLE IF EXISTS orders CASCADE;
DROP TABLE IF EXISTS customers CASCADE;
DROP TABLE IF EXISTS products CASCADE;
DROP TABLE IF EXISTS categories CASCADE;
DROP TABLE IF EXISTS departments CASCADE;
Once the tables are dropped, you can run the below script to create the tables for the purpose of the exercises.
\i /data/retail_db/create_db_tables_pg.sql
Load the data using the script provided.
\i /data/retail_db/load_db_tables_pg.sql
Run queries to validate that we have data in all the tables.
%load_ext sql
The sql extension is already loaded. To reload it, use:
%reload_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
%sql SELECT * FROM orders LIMIT 10
10 rows affected.
order_id | order_date | order_customer_id | order_status |
---|---|---|---|
1 | 2013-07-25 00:00:00 | 11599 | CLOSED |
2 | 2013-07-25 00:00:00 | 256 | PENDING_PAYMENT |
3 | 2013-07-25 00:00:00 | 12111 | COMPLETE |
4 | 2013-07-25 00:00:00 | 8827 | CLOSED |
5 | 2013-07-25 00:00:00 | 11318 | COMPLETE |
6 | 2013-07-25 00:00:00 | 7130 | COMPLETE |
7 | 2013-07-25 00:00:00 | 4530 | COMPLETE |
8 | 2013-07-25 00:00:00 | 2911 | PROCESSING |
9 | 2013-07-25 00:00:00 | 5657 | PENDING_PAYMENT |
10 | 2013-07-25 00:00:00 | 5648 | PENDING_PAYMENT |
%sql SELECT count(1) FROM orders
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count |
---|
68883 |
%sql SELECT * FROM order_items LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
order_item_id | order_item_order_id | order_item_product_id | order_item_quantity | order_item_subtotal | order_item_product_price |
---|---|---|---|---|---|
1 | 1 | 957 | 1 | 299.98 | 299.98 |
2 | 2 | 1073 | 1 | 199.99 | 199.99 |
3 | 2 | 502 | 5 | 250.0 | 50.0 |
4 | 2 | 403 | 1 | 129.99 | 129.99 |
5 | 4 | 897 | 2 | 49.98 | 24.99 |
6 | 4 | 365 | 5 | 299.95 | 59.99 |
7 | 4 | 502 | 3 | 150.0 | 50.0 |
8 | 4 | 1014 | 4 | 199.92 | 49.98 |
9 | 5 | 957 | 1 | 299.98 | 299.98 |
10 | 5 | 365 | 5 | 299.95 | 59.99 |
%sql SELECT count(1) FROM order_items
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count |
---|
172198 |
%sql SELECT * FROM products LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
product_id | product_category_id | product_name | product_description | product_price | product_image |
---|---|---|---|---|---|
1 | 2 | Quest Q64 10 FT. x 10 FT. Slant Leg Instant U | 59.98 | http://images.acmesports.sports/Quest+Q64+10+FT.+x+10+FT.+Slant+Leg+Instant+Up+Canopy | |
2 | 2 | Under Armour Men's Highlight MC Football Clea | 129.99 | http://images.acmesports.sports/Under+Armour+Men%27s+Highlight+MC+Football+Cleat | |
3 | 2 | Under Armour Men's Renegade D Mid Football Cl | 89.99 | http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat | |
4 | 2 | Under Armour Men's Renegade D Mid Football Cl | 89.99 | http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat | |
5 | 2 | Riddell Youth Revolution Speed Custom Footbal | 199.99 | http://images.acmesports.sports/Riddell+Youth+Revolution+Speed+Custom+Football+Helmet | |
6 | 2 | Jordan Men's VI Retro TD Football Cleat | 134.99 | http://images.acmesports.sports/Jordan+Men%27s+VI+Retro+TD+Football+Cleat | |
7 | 2 | Schutt Youth Recruit Hybrid Custom Football H | 99.99 | http://images.acmesports.sports/Schutt+Youth+Recruit+Hybrid+Custom+Football+Helmet+2014 | |
8 | 2 | Nike Men's Vapor Carbon Elite TD Football Cle | 129.99 | http://images.acmesports.sports/Nike+Men%27s+Vapor+Carbon+Elite+TD+Football+Cleat | |
9 | 2 | Nike Adult Vapor Jet 3.0 Receiver Gloves | 50.0 | http://images.acmesports.sports/Nike+Adult+Vapor+Jet+3.0+Receiver+Gloves | |
10 | 2 | Under Armour Men's Highlight MC Football Clea | 129.99 | http://images.acmesports.sports/Under+Armour+Men%27s+Highlight+MC+Football+Cleat |
%sql SELECT count(1) FROM products
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count |
---|
1345 |
Exercise 1¶
Get all the employees who are making more than the average salary within each department.
Use HR database employees and departments tables for this problem.
Compute the average salary expense for each department and get the details of employees who are making more than the average salary.
Make sure average salary expense per department is rounded off to 2 decimals.
Output should contain employee_id, department_name, salary and avg_salary_expense (derived field).
Data should be sorted in ascending order by department_id and descending order by salary.
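A sketch using avg as a window function; the join on department_id is assumed from the HR schema shown earlier.

```sql
SELECT employee_id, department_name, salary, avg_salary_expense
FROM (
    SELECT e.employee_id, e.department_id, d.department_name, e.salary,
           round(avg(e.salary) OVER (
               PARTITION BY e.department_id
           ), 2) AS avg_salary_expense
    FROM employees e
        JOIN departments d
            ON e.department_id = d.department_id
) q
WHERE salary > avg_salary_expense
ORDER BY department_id, salary DESC;
```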
employee_id | department_name | salary | avg_salary_expense |
---|---|---|---|
201 | Marketing | 13000.00 | 9500.00 |
114 | Purchasing | 11000.00 | 4150.00 |
121 | Shipping | 8200.00 | 3475.56 |
120 | Shipping | 8000.00 | 3475.56 |
122 | Shipping | 7900.00 | 3475.56 |
123 | Shipping | 6500.00 | 3475.56 |
124 | Shipping | 5800.00 | 3475.56 |
184 | Shipping | 4200.00 | 3475.56 |
185 | Shipping | 4100.00 | 3475.56 |
192 | Shipping | 4000.00 | 3475.56 |
193 | Shipping | 3900.00 | 3475.56 |
188 | Shipping | 3800.00 | 3475.56 |
137 | Shipping | 3600.00 | 3475.56 |
189 | Shipping | 3600.00 | 3475.56 |
141 | Shipping | 3500.00 | 3475.56 |
103 | IT | 9000.00 | 5760.00 |
104 | IT | 6000.00 | 5760.00 |
145 | Sales | 14000.00 | 8955.88 |
146 | Sales | 13500.00 | 8955.88 |
147 | Sales | 12000.00 | 8955.88 |
168 | Sales | 11500.00 | 8955.88 |
148 | Sales | 11000.00 | 8955.88 |
174 | Sales | 11000.00 | 8955.88 |
149 | Sales | 10500.00 | 8955.88 |
162 | Sales | 10500.00 | 8955.88 |
156 | Sales | 10000.00 | 8955.88 |
150 | Sales | 10000.00 | 8955.88 |
169 | Sales | 10000.00 | 8955.88 |
170 | Sales | 9600.00 | 8955.88 |
163 | Sales | 9500.00 | 8955.88 |
151 | Sales | 9500.00 | 8955.88 |
157 | Sales | 9500.00 | 8955.88 |
158 | Sales | 9000.00 | 8955.88 |
152 | Sales | 9000.00 | 8955.88 |
100 | Executive | 24000.00 | 19333.33 |
108 | Finance | 12000.00 | 8600.00 |
109 | Finance | 9000.00 | 8600.00 |
205 | Accounting | 12000.00 | 10150.00 |
%load_ext sql
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@localhost:5432/itversity_hr_db
Exercise 2¶
Get the cumulative salary within each department for the Finance and IT departments along with the department name.
Use HR database employees and departments tables for this problem.
Compute the cumulative salary expense for the Finance as well as IT departments within the respective departments.
Make sure cumulative salary expense per department is rounded off to 2 decimals.
Output should contain employee_id, department_name, salary and cum_salary_expense (derived field).
Data should be sorted in ascending order by department_name and then salary.
| employee_id | department_name | salary | cum_salary_expense |
|---|---|---|---|
| 113 | Finance | 6900.00 | 6900.00 |
| 111 | Finance | 7700.00 | 14600.00 |
| 112 | Finance | 7800.00 | 22400.00 |
| 110 | Finance | 8200.00 | 30600.00 |
| 109 | Finance | 9000.00 | 39600.00 |
| 108 | Finance | 12000.00 | 51600.00 |
| 107 | IT | 4200.00 | 4200.00 |
| 106 | IT | 4800.00 | 9000.00 |
| 105 | IT | 4800.00 | 13800.00 |
| 104 | IT | 6000.00 | 19800.00 |
| 103 | IT | 9000.00 | 28800.00 |
Exercise 3¶
Get the top 3 paid employees within each department by salary (use dense_rank).
Use the HR database employees and departments tables for this problem.
Highest paid employee should be ranked first.
Output should contain employee_id, department_id, department_name, salary and employee_rank (derived field).
Data should be sorted in ascending order by department_id and then in descending order by salary.
| employee_id | department_id | department_name | salary | employee_rank |
|---|---|---|---|---|
| 200 | 10 | Administration | 4400.00 | 1 |
| 201 | 20 | Marketing | 13000.00 | 1 |
| 202 | 20 | Marketing | 6000.00 | 2 |
| 114 | 30 | Purchasing | 11000.00 | 1 |
| 115 | 30 | Purchasing | 3100.00 | 2 |
| 116 | 30 | Purchasing | 2900.00 | 3 |
| 203 | 40 | Human Resources | 6500.00 | 1 |
| 121 | 50 | Shipping | 8200.00 | 1 |
| 120 | 50 | Shipping | 8000.00 | 2 |
| 122 | 50 | Shipping | 7900.00 | 3 |
| 103 | 60 | IT | 9000.00 | 1 |
| 104 | 60 | IT | 6000.00 | 2 |
| 105 | 60 | IT | 4800.00 | 3 |
| 106 | 60 | IT | 4800.00 | 3 |
| 204 | 70 | Public Relations | 10000.00 | 1 |
| 145 | 80 | Sales | 14000.00 | 1 |
| 146 | 80 | Sales | 13500.00 | 2 |
| 147 | 80 | Sales | 12000.00 | 3 |
| 100 | 90 | Executive | 24000.00 | 1 |
| 101 | 90 | Executive | 17000.00 | 2 |
| 102 | 90 | Executive | 17000.00 | 2 |
| 108 | 100 | Finance | 12000.00 | 1 |
| 109 | 100 | Finance | 9000.00 | 2 |
| 110 | 100 | Finance | 8200.00 | 3 |
| 205 | 110 | Accounting | 12000.00 | 1 |
| 206 | 110 | Accounting | 8300.00 | 2 |
Exercise 4¶
Get the top 3 products sold in January 2014 by revenue.
Use retail database tables such as orders, order_items and products.
Highest revenue generating product should come at top.
Consider only those orders which are either in COMPLETE or CLOSED status.
Output should contain product_id, product_name, revenue and product_rank (revenue and product_rank are derived fields).
Data should be sorted in descending order by revenue.
| product_id | product_name | revenue | product_rank |
|---|---|---|---|
| 1004 | Field & Stream Sportsman 16 Gun Fire Safe | 250787.46 | 1 |
| 365 | Perfect Fitness Perfect Rip Deck | 151474.75 | 2 |
| 957 | Diamondback Women’s Serene Classic Comfort Bi | 148190.12 | 3 |
Exercise 5¶
Get the top 3 products by revenue sold in January 2014 under the selected categories: Cardio Equipment and Strength Training.
Use retail database tables such as orders, order_items, products as well as categories.
Highest revenue generating product should come at top.
Consider only those orders which are either in COMPLETE or CLOSED status.
Output should contain category_id, category_name, product_id, product_name, revenue and product_rank (revenue and product_rank are derived fields).
Data should be sorted in ascending order by category_id and descending order by revenue.
| category_id | category_name | product_id | product_name | revenue | product_rank |
|---|---|---|---|---|---|
| 9 | Cardio Equipment | 191 | Nike Men’s Free 5.0+ Running Shoe | 132286.77 | 1 |
| 9 | Cardio Equipment | 172 | Nike Women’s Tempo Shorts | 870.00 | 2 |
| 10 | Strength Training | 208 | SOLE E35 Elliptical | 1999.99 | 1 |
| 10 | Strength Training | 203 | GoPro HERO3+ Black Edition Camera | 1199.97 | 2 |
| 10 | Strength Training | 216 | Yakima DoubleDown Ace Hitch Mount 4-Bike Rack | 189.00 | 3 |