Advanced DML OperationsΒΆ
As we gain enough knowledge related to writing queries, let us explore some advanced DML Operations.
We can insert query results into a table using
INSERT
withSELECT
.As long as columns specified for table in
INSERT
statement and columns projected inSELECT
clause match, it works.We can also use query results for
UPDATE
as well asDELETE
.
%load_ext sql
The sql extension is already loaded. To reload it, use:
%reload_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
Note
Creating customer order metrics table to demonstrate advanced DML Operations. We will also add primary key to this table. We will be storing number of orders placed and revenue generated for each customer in a given month.
%%sql
DROP TABLE IF EXISTS customer_order_metrics_mthly
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.
[]
%%sql
CREATE TABLE customer_order_metrics_mthly (
customer_id INT,
order_month CHAR(7),
order_count INT,
order_revenue FLOAT
)
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.
[]
%%sql
ALTER TABLE customer_order_metrics_mthly
ADD PRIMARY KEY (order_month, customer_id)
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.
[]
Note
Here is the query to get monthly customer orders metrics. First we will be inserting customer_id, order_month and order_count into the table.
Warning
If the below query is run multiple times, every time data in both orders and order_items need to be processed. As the data volumes grow the query uses considerable amount of resources. It will be better if we can pre-aggregate the data.
%%sql
SELECT o.order_customer_id,
to_char(o.order_date, 'yyyy-MM') AS order_month,
count(1) AS order_count,
round(sum(order_item_subtotal)::numeric, 2) AS order_revenue
FROM orders o
JOIN order_items oi
ON o.order_id = oi.order_item_order_id
GROUP BY o.order_customer_id,
to_char(o.order_date, 'yyyy-MM')
ORDER BY order_month,
order_count DESC
LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
order_customer_id | order_month | order_count | order_revenue |
---|---|---|---|
4257 | 2013-07 | 10 | 2059.75 |
5293 | 2013-07 | 10 | 2781.73 |
9103 | 2013-07 | 9 | 1587.85 |
7473 | 2013-07 | 9 | 1244.90 |
2071 | 2013-07 | 9 | 1629.84 |
32 | 2013-07 | 9 | 2009.75 |
488 | 2013-07 | 9 | 1365.82 |
7073 | 2013-07 | 9 | 1377.83 |
8709 | 2013-07 | 8 | 1349.87 |
1498 | 2013-07 | 8 | 1619.88 |
Warning
Here are the number of records that need to be processed every time. Also it involves expensive join.
%%sql
SELECT count(1)
FROM orders o
JOIN order_items oi
ON o.order_id = oi.order_item_order_id
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count |
---|
172198 |
Note
Let us first insert the data into the table with out revenue. We will update the revenue later as an example for updating using query results.
%%sql
INSERT INTO customer_order_metrics_mthly
SELECT o.order_customer_id,
to_char(o.order_date, 'yyyy-MM') AS order_month,
count(1) order_count,
NULL
FROM orders o
JOIN order_items oi
ON o.order_id = oi.order_item_order_id
GROUP BY o.order_customer_id,
to_char(o.order_date, 'yyyy-MM')
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
48059 rows affected.
[]
%%sql
SELECT * FROM customer_order_metrics_mthly
ORDER BY order_month,
customer_id
LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
customer_id | order_month | order_count | order_revenue |
---|---|---|---|
12 | 2013-07 | 2 | None |
16 | 2013-07 | 1 | None |
17 | 2013-07 | 2 | None |
19 | 2013-07 | 3 | None |
32 | 2013-07 | 9 | None |
45 | 2013-07 | 4 | None |
48 | 2013-07 | 4 | None |
54 | 2013-07 | 2 | None |
58 | 2013-07 | 4 | None |
64 | 2013-07 | 2 | None |
Note
Updating order_revenue along with count. This is expensive operation, but we will be running only once.
%%sql
UPDATE customer_order_metrics_mthly comd
SET
(order_count, order_revenue) = (
SELECT count(1),
round(sum(order_item_subtotal)::numeric, 2)
FROM orders o
JOIN order_items oi
ON o.order_id = oi.order_item_order_id
WHERE o.order_customer_id = comd.customer_id
AND to_char(o.order_date, 'yyyy-MM') = comd.order_month
AND to_char(o.order_date, 'yyyy-MM') = '2013-08'
AND comd.order_month = '2013-08'
GROUP BY o.order_customer_id,
to_char(o.order_date, 'yyyy-MM')
)
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.order_customer_id = comd.customer_id
AND to_char(o.order_date, 'yyyy-MM') = comd.order_month
AND to_char(o.order_date, 'yyyy-MM') = '2013-08'
) AND comd.order_month = '2013-08'
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
3935 rows affected.
[]
Note
As data is pre processed and loaded into the table, queries similar to below ones against customer_order_metrics_mthly will run much faster.
We need to process lesser amount of data with out expensive join.
%%sql
SELECT * FROM customer_order_metrics_mthly
WHERE order_month = '2013-08'
ORDER BY order_month,
customer_id
LIMIT 10
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
customer_id | order_month | order_count | order_revenue |
---|---|---|---|
2 | 2013-08 | 5 | 769.82 |
13 | 2013-08 | 5 | 1065.93 |
14 | 2013-08 | 3 | 459.97 |
18 | 2013-08 | 1 | 129.99 |
20 | 2013-08 | 2 | 739.91 |
22 | 2013-08 | 5 | 769.96 |
24 | 2013-08 | 2 | 399.91 |
25 | 2013-08 | 1 | 129.99 |
33 | 2013-08 | 3 | 929.92 |
34 | 2013-08 | 4 | 789.92 |
Note
As an example for delete using query, we will delete all the dormant customers from customers table. Dormant customers are those customers who never placed any order. For this we will create back up customers table as I do not want to play with customers.
%%sql
DROP TABLE IF EXISTS customers_backup
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.
[]
%%sql
CREATE TABLE customers_backup
AS
SELECT * FROM customers
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
12435 rows affected.
[]
%%sql
SELECT count(1) FROM customers_backup c
LEFT OUTER JOIN orders o
ON c.customer_id = o.order_customer_id
WHERE o.order_customer_id IS NULL
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count |
---|
30 |
%%sql
SELECT count(1) FROM customers_backup c
WHERE NOT EXISTS (
SELECT 1 FROM orders o
WHERE c.customer_id = o.order_customer_id
)
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count |
---|
30 |
Note
We need to use nested sub queries as part of the delete with NOT EXISTS
or NOT IN
as demonstrated below. We cannot use direct joins as part of the DELETE
.
%%sql
DELETE FROM customers_backup c
WHERE NOT EXISTS (
SELECT 1 FROM orders o
WHERE c.customer_id = o.order_customer_id
)
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
30 rows affected.
[]
%%sql
SELECT count(1) FROM customers_backup
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count |
---|
12405 |
%%sql
DELETE FROM customers_backup c
WHERE customer_id NOT IN (
SELECT order_customer_id FROM orders o
)
* postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
0 rows affected.
[]