Criteria for Indexing¶
Let us go through some of the criteria to create indexes on tables.
Indexes are required to enforce constraints such as Primary Key, Unique etc. Indexes will be automatically created, when we define a column(s) Primary Key or Unique.
Too many indexes on a given table, can slow down the performance of inserts, updates and deletes on that table. Hence, you need to make sure to strike right balance by creating indexes only when they are required.
Thorough analysis need to be done about how the queries will hit the table from the application.
We might have to create indexes on foreign key columns of the child table.
When we have tables with multiple parents, we need to be due diligent about how the index should be created.
Shall we create 2 indexes?
Shall we create 1 index with both the columns pointing to 2 tables?
If we want to create 1 index with both the columns what should be the order?
Here are some of the scenarios from the application perspective based upon which we can consider creating indexes.
Customer checking all his orders.
We need to get the data from orders using customer id and hence we need to add index on orders.order_customer_id.
Customer checking order details for a given order which include order_item_subtotal as well as product names.
We need to join orders, order_items as well as products.
order_items is child table for both orders and products.
We can create composite index on order_items.order_item_order_id and order_items.order_item_product_id.
Customer care executive to check all the order details placed by customer using at least first 3 characters of customer’s first name.
We can consider creating index on customers.customer_fname using upper or lower. You can also consider adding customer_id to the index along with customer_fname.
Also to get all the order details for a given customer, we have to ensure that there is an index on orders.order_customer_id.
%load_ext sql
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db
%%sql
DROP INDEX order_items_order_id_idx
%%sql
SELECT min(customer_id), max(customer_id), count(1)
FROM customers
import psycopg2
%%time
from random import randrange
connection = psycopg2.connect(
host='localhost',
port='5432',
database='itversity_retail_db',
user='itversity_retail_user',
password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1)
FROM orders o
WHERE order_customer_id = %s
'''
ctr = 0
while True:
if ctr == 2000:
break
customer_id = randrange(10950, 12435)
cursor.execute(query, (customer_id,))
ctr += 1
cursor.close()
connection.close()
%%sql
CREATE INDEX orders_customer_id_idx
ON orders(order_customer_id)
%%time
from random import randrange
connection = psycopg2.connect(
host='localhost',
port='5432',
database='itversity_retail_db',
user='itversity_retail_user',
password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1)
FROM orders o
WHERE order_customer_id = %s
'''
ctr = 0
while True:
if ctr == 2000:
break
customer_id = randrange(10950, 12435)
cursor.execute(query, (customer_id,))
ctr += 1
cursor.close()
connection.close()
%%time
from random import randrange
connection = psycopg2.connect(
host='localhost',
port='5432',
database='itversity_retail_db',
user='itversity_retail_user',
password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1)
FROM orders o
JOIN order_items oi
ON o.order_id = oi.order_item_order_id
JOIN products p
ON p.product_id = oi.order_item_product_id
WHERE order_id = %s
'''
ctr = 0
while True:
if ctr == 2000:
break
order_id = randrange(1, 68883)
cursor.execute(query, (order_id,))
ctr += 1
cursor.close()
connection.close()
%%sql
CREATE INDEX order_items_oid_pid_idx
ON order_items(order_item_order_id, order_item_product_id);
%%time
from random import randrange
connection = psycopg2.connect(
host='localhost',
port='5432',
database='itversity_retail_db',
user='itversity_retail_user',
password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1)
FROM orders o
JOIN order_items oi
ON o.order_id = oi.order_item_order_id
JOIN products p
ON p.product_id = oi.order_item_product_id
WHERE order_id = %s
'''
ctr = 0
while True:
if ctr == 2000:
break
order_id = randrange(1, 68883)
cursor.execute(query, (order_id,))
ctr += 1
cursor.close()
connection.close()
Note
As our products table only have handful of records there will not be significant difference in performance between the 2 approaches.
Index on order_items.order_item_order_id
Index on order_items.order_item_order_id, order_items.order_item_product_id
Howeever if you create index using product id as driving field then the performance will not be as good as above 2 approaches.
%%sql
DROP INDEX order_items_oid_pid_idx
%%sql
CREATE INDEX order_items_pid_oid_idx
ON order_items(order_item_product_id, order_item_order_id);
%%time
from random import randrange
connection = psycopg2.connect(
host='localhost',
port='5432',
database='itversity_retail_db',
user='itversity_retail_user',
password='retail_password'
)
cursor = connection.cursor()
query = '''SELECT count(1)
FROM orders o
JOIN order_items oi
ON o.order_id = oi.order_item_order_id
JOIN products p
ON p.product_id = oi.order_item_product_id
WHERE order_id = %s
'''
ctr = 0
while True:
if ctr == 2000:
break
order_id = randrange(1, 68883)
cursor.execute(query, (order_id,))
ctr += 1
cursor.close()
connection.close()
Note
Here are the indexes to tune the performance of comparing with at least first 3 characters of customer first name.
%%sql
DROP INDEX IF EXISTS orders_customer_id_idx
%%sql
DROP INDEX IF EXISTS customers_customer_fname_idx
Explain plan for query with out indexes.
EXPLAIN
SELECT *
FROM orders o JOIN customers c
ON o.order_customer_id = c.customer_id
WHERE upper(c.customer_fname) = upper('mar');
QUERY PLAN
-------------------------------------------------------------------------
Hash Join (cost=42.38..1437.09 rows=40 width=99)
Hash Cond: (o.order_customer_id = c.customer_id)
-> Seq Scan on orders o (cost=0.00..1213.83 rows=68883 width=26)
-> Hash (cost=42.29..42.29 rows=7 width=73)
-> Seq Scan on customers c (cost=0.00..42.29 rows=7 width=73)
Filter: (upper((customer_fname)::text) = 'MAR'::text)
(6 rows)
%%sql
CREATE INDEX customers_customer_fname_idx
ON customers(upper(customer_fname))
%%sql
CREATE INDEX orders_customer_id_idx
ON orders(order_customer_id)
Explain plan for query with indexes. Check the cost, it is significantly low when compared to the plan generated for the same query with out indexes.
EXPLAIN
SELECT *
FROM orders o JOIN customers c
ON o.order_customer_id = c.customer_id
WHERE upper(c.customer_fname) = upper('mar');
QUERY PLAN
-------------------------------------------------------------------------------------------------
Nested Loop (cost=8.67..204.43 rows=40 width=99)
-> Bitmap Heap Scan on customers c (cost=4.33..18.58 rows=7 width=73)
Recheck Cond: (upper((customer_fname)::text) = 'MAR'::text)
-> Bitmap Index Scan on customers_customer_fname_idx (cost=0.00..4.33 rows=7 width=0)
Index Cond: (upper((customer_fname)::text) = 'MAR'::text)
-> Bitmap Heap Scan on orders o (cost=4.34..26.49 rows=6 width=26)
Recheck Cond: (order_customer_id = c.customer_id)
-> Bitmap Index Scan on orders_customer_id_idx (cost=0.00..4.34 rows=6 width=0)
Index Cond: (order_customer_id = c.customer_id)
(9 rows)