Advanced DML OperationsΒΆ

As we gain enough knowledge related to writing queries, let us explore some advanced DML Operations.

  • We can insert query results into a table using INSERT with SELECT.

  • As long as columns specified for table in INSERT statement and columns projected in SELECT clause match, it works.

  • We can also use query results for UPDATE as well as DELETE.

%load_ext sql
The sql extension is already loaded. To reload it, use:
  %reload_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

Note

Creating customer order metrics table to demonstrate advanced DML Operations. We will also add primary key to this table. We will be storing number of orders placed and revenue generated for each customer in a given month.

%%sql

DROP TABLE IF EXISTS customer_order_metrics_mthly
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.
[]
%%sql

CREATE TABLE customer_order_metrics_mthly (
    customer_id INT,
    order_month CHAR(7),
    order_count INT,
    order_revenue FLOAT
)
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.
[]
%%sql

ALTER TABLE customer_order_metrics_mthly
    ADD PRIMARY KEY (order_month, customer_id)
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.
[]

Note

Here is the query to get monthly customer orders metrics. First we will be inserting customer_id, order_month and order_count into the table.

Warning

If the below query is run multiple times, every time data in both orders and order_items need to be processed. As the data volumes grow the query uses considerable amount of resources. It will be better if we can pre-aggregate the data.

%%sql

SELECT o.order_customer_id,
    to_char(o.order_date, 'yyyy-MM') AS order_month,
    count(1) AS order_count,
    round(sum(order_item_subtotal)::numeric, 2) AS order_revenue
FROM orders o 
    JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
GROUP BY o.order_customer_id,
    to_char(o.order_date, 'yyyy-MM')
ORDER BY order_month,
    order_count DESC
LIMIT 10
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
order_customer_id order_month order_count order_revenue
4257 2013-07 10 2059.75
5293 2013-07 10 2781.73
9103 2013-07 9 1587.85
7473 2013-07 9 1244.90
2071 2013-07 9 1629.84
32 2013-07 9 2009.75
488 2013-07 9 1365.82
7073 2013-07 9 1377.83
8709 2013-07 8 1349.87
1498 2013-07 8 1619.88

Warning

Here are the number of records that need to be processed every time. Also it involves expensive join.

%%sql

SELECT count(1)
FROM orders o 
    JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count
172198

Note

Let us first insert the data into the table with out revenue. We will update the revenue later as an example for updating using query results.

%%sql

INSERT INTO customer_order_metrics_mthly
SELECT o.order_customer_id,
    to_char(o.order_date, 'yyyy-MM') AS order_month,
    count(1) order_count,
    NULL
FROM orders o 
    JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
GROUP BY o.order_customer_id,
    to_char(o.order_date, 'yyyy-MM')
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
48059 rows affected.
[]
%%sql

SELECT * FROM customer_order_metrics_mthly
ORDER BY order_month,
    customer_id
LIMIT 10
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
customer_id order_month order_count order_revenue
12 2013-07 2 None
16 2013-07 1 None
17 2013-07 2 None
19 2013-07 3 None
32 2013-07 9 None
45 2013-07 4 None
48 2013-07 4 None
54 2013-07 2 None
58 2013-07 4 None
64 2013-07 2 None

Note

Updating order_revenue along with count. This is expensive operation, but we will be running only once.

%%sql

UPDATE customer_order_metrics_mthly comd
SET 
    (order_count, order_revenue) = (
        SELECT count(1),
            round(sum(order_item_subtotal)::numeric, 2)
        FROM orders o 
            JOIN order_items oi
                ON o.order_id = oi.order_item_order_id
        WHERE o.order_customer_id = comd.customer_id
            AND to_char(o.order_date, 'yyyy-MM') = comd.order_month
            AND to_char(o.order_date, 'yyyy-MM') = '2013-08'
            AND comd.order_month = '2013-08'
        GROUP BY o.order_customer_id,
            to_char(o.order_date, 'yyyy-MM')
    )
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.order_customer_id = comd.customer_id
        AND to_char(o.order_date, 'yyyy-MM') = comd.order_month
        AND to_char(o.order_date, 'yyyy-MM') = '2013-08'
) AND comd.order_month = '2013-08'
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
3935 rows affected.
[]

Note

As data is pre processed and loaded into the table, queries similar to below ones against customer_order_metrics_mthly will run much faster.

We need to process lesser amount of data with out expensive join.

%%sql

SELECT * FROM customer_order_metrics_mthly
WHERE order_month = '2013-08'
ORDER BY order_month,
    customer_id
LIMIT 10
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.
customer_id order_month order_count order_revenue
2 2013-08 5 769.82
13 2013-08 5 1065.93
14 2013-08 3 459.97
18 2013-08 1 129.99
20 2013-08 2 739.91
22 2013-08 5 769.96
24 2013-08 2 399.91
25 2013-08 1 129.99
33 2013-08 3 929.92
34 2013-08 4 789.92

Note

As an example for delete using query, we will delete all the dormant customers from customers table. Dormant customers are those customers who never placed any order. For this we will create back up customers table as I do not want to play with customers.

%%sql

DROP TABLE IF EXISTS customers_backup
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.
[]
%%sql

CREATE TABLE customers_backup
AS
SELECT * FROM customers
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
12435 rows affected.
[]
%%sql

SELECT count(1) FROM customers_backup c
    LEFT OUTER JOIN orders o
        ON c.customer_id = o.order_customer_id
WHERE o.order_customer_id IS NULL
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count
30
%%sql

SELECT count(1) FROM customers_backup c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o
    WHERE c.customer_id = o.order_customer_id
)
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count
30

Note

We need to use nested sub queries as part of the delete with NOT EXISTS or NOT IN as demonstrated below. We cannot use direct joins as part of the DELETE.

%%sql

DELETE FROM customers_backup c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o
    WHERE c.customer_id = o.order_customer_id
)
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
30 rows affected.
[]
%%sql

SELECT count(1) FROM customers_backup
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.
count
12405
%%sql

DELETE FROM customers_backup c
WHERE customer_id NOT IN (
    SELECT order_customer_id FROM orders o
)
 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
0 rows affected.
[]