{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analytic Functions - Filtering\n",
"\n",
"Let us go through the solution for getting top 5 daily products based up on the revenue. In that process we will understand how to apply filtering on top of the derived values using analytic functions."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [
{
"data": {
"text/html": [
"VIDEO \n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%HTML\n",
"VIDEO "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The sql extension is already loaded. To reload it, use:\n",
" %reload_ext sql\n"
]
}
],
"source": [
"%load_ext sql"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db\n"
]
}
],
"source": [
"%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Order of execution of SQL\n",
"\n",
"Let us review the order of execution of SQL. First let us review the order of writing the query.\n",
"\n",
"1. **SELECT**\n",
"2. **FROM**\n",
"3. **JOIN** or **OUTER JOIN** with **ON**\n",
"4. **WHERE**\n",
"5. **GROUP BY** and optionally **HAVING**\n",
"6. **ORDER BY**\n",
"\n",
"Let us come up with a query which will compute daily revenue using COMPLETE or CLOSED orders and also sorted by order_date."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db\n",
"10 rows affected.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" order_date \n",
" revenue \n",
" \n",
" \n",
" 2013-07-25 00:00:00 \n",
" 31547.23 \n",
" \n",
" \n",
" 2013-07-26 00:00:00 \n",
" 54713.23 \n",
" \n",
" \n",
" 2013-07-27 00:00:00 \n",
" 48411.48 \n",
" \n",
" \n",
" 2013-07-28 00:00:00 \n",
" 35672.03 \n",
" \n",
" \n",
" 2013-07-29 00:00:00 \n",
" 54579.70 \n",
" \n",
" \n",
" 2013-07-30 00:00:00 \n",
" 49329.29 \n",
" \n",
" \n",
" 2013-07-31 00:00:00 \n",
" 59212.49 \n",
" \n",
" \n",
" 2013-08-01 00:00:00 \n",
" 49160.08 \n",
" \n",
" \n",
" 2013-08-02 00:00:00 \n",
" 50688.58 \n",
" \n",
" \n",
" 2013-08-03 00:00:00 \n",
" 43416.74 \n",
" \n",
"
"
],
"text/plain": [
"[(datetime.datetime(2013, 7, 25, 0, 0), Decimal('31547.23')),\n",
" (datetime.datetime(2013, 7, 26, 0, 0), Decimal('54713.23')),\n",
" (datetime.datetime(2013, 7, 27, 0, 0), Decimal('48411.48')),\n",
" (datetime.datetime(2013, 7, 28, 0, 0), Decimal('35672.03')),\n",
" (datetime.datetime(2013, 7, 29, 0, 0), Decimal('54579.70')),\n",
" (datetime.datetime(2013, 7, 30, 0, 0), Decimal('49329.29')),\n",
" (datetime.datetime(2013, 7, 31, 0, 0), Decimal('59212.49')),\n",
" (datetime.datetime(2013, 8, 1, 0, 0), Decimal('49160.08')),\n",
" (datetime.datetime(2013, 8, 2, 0, 0), Decimal('50688.58')),\n",
" (datetime.datetime(2013, 8, 3, 0, 0), Decimal('43416.74'))]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"\n",
"SELECT o.order_date,\n",
" round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue\n",
"FROM orders o JOIN order_items oi\n",
"ON o.order_id = oi.order_item_order_id\n",
"WHERE o.order_status IN ('COMPLETE', 'CLOSED')\n",
"GROUP BY o.order_date\n",
"ORDER BY o.order_date\n",
"LIMIT 10"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db\n",
"10 rows affected.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" order_date \n",
" revenue \n",
" \n",
" \n",
" 2013-07-26 00:00:00 \n",
" 54713.23 \n",
" \n",
" \n",
" 2013-07-29 00:00:00 \n",
" 54579.70 \n",
" \n",
" \n",
" 2013-07-31 00:00:00 \n",
" 59212.49 \n",
" \n",
" \n",
" 2013-08-02 00:00:00 \n",
" 50688.58 \n",
" \n",
" \n",
" 2013-08-06 00:00:00 \n",
" 57843.89 \n",
" \n",
" \n",
" 2013-08-12 00:00:00 \n",
" 59014.74 \n",
" \n",
" \n",
" 2013-08-17 00:00:00 \n",
" 63226.83 \n",
" \n",
" \n",
" 2013-08-24 00:00:00 \n",
" 52650.15 \n",
" \n",
" \n",
" 2013-09-05 00:00:00 \n",
" 59942.43 \n",
" \n",
" \n",
" 2013-09-06 00:00:00 \n",
" 61976.10 \n",
" \n",
"
"
],
"text/plain": [
"[(datetime.datetime(2013, 7, 26, 0, 0), Decimal('54713.23')),\n",
" (datetime.datetime(2013, 7, 29, 0, 0), Decimal('54579.70')),\n",
" (datetime.datetime(2013, 7, 31, 0, 0), Decimal('59212.49')),\n",
" (datetime.datetime(2013, 8, 2, 0, 0), Decimal('50688.58')),\n",
" (datetime.datetime(2013, 8, 6, 0, 0), Decimal('57843.89')),\n",
" (datetime.datetime(2013, 8, 12, 0, 0), Decimal('59014.74')),\n",
" (datetime.datetime(2013, 8, 17, 0, 0), Decimal('63226.83')),\n",
" (datetime.datetime(2013, 8, 24, 0, 0), Decimal('52650.15')),\n",
" (datetime.datetime(2013, 9, 5, 0, 0), Decimal('59942.43')),\n",
" (datetime.datetime(2013, 9, 6, 0, 0), Decimal('61976.10'))]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"\n",
"SELECT o.order_date,\n",
" round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue\n",
"FROM orders o JOIN order_items oi\n",
"ON o.order_id = oi.order_item_order_id\n",
"WHERE o.order_status IN ('COMPLETE', 'CLOSED')\n",
"GROUP BY o.order_date\n",
" HAVING round(sum(oi.order_item_subtotal)::numeric, 2) >= 50000\n",
"ORDER BY order_date\n",
"LIMIT 10"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However order of execution is typically as follows.\n",
"\n",
"1. **FROM**\n",
"2. **JOIN** or **OUTER JOIN** with **ON**\n",
"3. **WHERE**\n",
"4. **GROUP BY** and optionally **HAVING**\n",
"5. **SELECT**\n",
"6. **ORDER BY**\n",
"\n",
"As **SELECT** is executed before **ORDER BY** clause, we will not be able to refer the aliases defined in **SELECT** caluse in other clauses except for **ORDER BY** in most of the traditional databases including Postgresql."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{error}\n",
"This will fail as revenue which is an alias defined in **SELECT** cannot be used in **WHERE**.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db\n",
"(psycopg2.errors.UndefinedColumn) column \"revenue\" does not exist\n",
"LINE 5: AND revenue >= 50000\n",
" ^\n",
"\n",
"[SQL: SELECT o.order_date, round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue\n",
"FROM orders o JOIN order_items oi\n",
"ON o.order_id = oi.order_item_order_id\n",
"WHERE o.order_status IN ('COMPLETE', 'CLOSED')\n",
" AND revenue >= 50000\n",
"GROUP BY o.order_date\n",
"ORDER BY order_date\n",
"LIMIT 10]\n",
"(Background on this error at: http://sqlalche.me/e/13/f405)\n"
]
}
],
"source": [
"%%sql\n",
"\n",
"SELECT o.order_date,\n",
" round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue\n",
"FROM orders o JOIN order_items oi\n",
"ON o.order_id = oi.order_item_order_id\n",
"WHERE o.order_status IN ('COMPLETE', 'CLOSED')\n",
" AND revenue >= 50000\n",
"GROUP BY o.order_date\n",
"ORDER BY order_date\n",
"LIMIT 10"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{note}\n",
"This will also fail as we cannot use aggregate functions in `WHERE` clause.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db\n",
"(psycopg2.errors.GroupingError) aggregate functions are not allowed in WHERE\n",
"LINE 5: AND round(sum(oi.order_item_subtotal)::numeric, 2) >= 50...\n",
" ^\n",
"\n",
"[SQL: SELECT o.order_date, round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue\n",
"FROM orders o JOIN order_items oi\n",
"ON o.order_id = oi.order_item_order_id\n",
"WHERE o.order_status IN ('COMPLETE', 'CLOSED')\n",
" AND round(sum(oi.order_item_subtotal)::numeric, 2) >= 50000\n",
"GROUP BY o.order_date\n",
"ORDER BY order_date\n",
"LIMIT 10]\n",
"(Background on this error at: http://sqlalche.me/e/13/f405)\n"
]
}
],
"source": [
"%%sql\n",
"\n",
"SELECT o.order_date,\n",
" round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue\n",
"FROM orders o JOIN order_items oi\n",
"ON o.order_id = oi.order_item_order_id\n",
"WHERE o.order_status IN ('COMPLETE', 'CLOSED')\n",
" AND round(sum(oi.order_item_subtotal)::numeric, 2) >= 50000\n",
"GROUP BY o.order_date\n",
"ORDER BY order_date\n",
"LIMIT 10"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db\n",
"10 rows affected.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" order_date \n",
" revenue \n",
" \n",
" \n",
" 2013-07-25 00:00:00 \n",
" 31547.23 \n",
" \n",
" \n",
" 2013-07-26 00:00:00 \n",
" 54713.23 \n",
" \n",
" \n",
" 2013-07-27 00:00:00 \n",
" 48411.48 \n",
" \n",
" \n",
" 2013-07-28 00:00:00 \n",
" 35672.03 \n",
" \n",
" \n",
" 2013-07-29 00:00:00 \n",
" 54579.70 \n",
" \n",
" \n",
" 2013-07-30 00:00:00 \n",
" 49329.29 \n",
" \n",
" \n",
" 2013-07-31 00:00:00 \n",
" 59212.49 \n",
" \n",
" \n",
" 2013-08-01 00:00:00 \n",
" 49160.08 \n",
" \n",
" \n",
" 2013-08-02 00:00:00 \n",
" 50688.58 \n",
" \n",
" \n",
" 2013-08-03 00:00:00 \n",
" 43416.74 \n",
" \n",
"
"
],
"text/plain": [
"[(datetime.datetime(2013, 7, 25, 0, 0), Decimal('31547.23')),\n",
" (datetime.datetime(2013, 7, 26, 0, 0), Decimal('54713.23')),\n",
" (datetime.datetime(2013, 7, 27, 0, 0), Decimal('48411.48')),\n",
" (datetime.datetime(2013, 7, 28, 0, 0), Decimal('35672.03')),\n",
" (datetime.datetime(2013, 7, 29, 0, 0), Decimal('54579.70')),\n",
" (datetime.datetime(2013, 7, 30, 0, 0), Decimal('49329.29')),\n",
" (datetime.datetime(2013, 7, 31, 0, 0), Decimal('59212.49')),\n",
" (datetime.datetime(2013, 8, 1, 0, 0), Decimal('49160.08')),\n",
" (datetime.datetime(2013, 8, 2, 0, 0), Decimal('50688.58')),\n",
" (datetime.datetime(2013, 8, 3, 0, 0), Decimal('43416.74'))]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"\n",
"SELECT o.order_date,\n",
" round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue\n",
"FROM orders o JOIN order_items oi\n",
"ON o.order_id = oi.order_item_order_id\n",
"WHERE o.order_status IN ('COMPLETE', 'CLOSED')\n",
"GROUP BY o.order_date\n",
"ORDER BY order_date,\n",
" revenue DESC\n",
"LIMIT 10"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db\n",
"10 rows affected.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" order_date \n",
" revenue \n",
" \n",
" \n",
" 2013-07-26 00:00:00 \n",
" 54713.23 \n",
" \n",
" \n",
" 2013-07-29 00:00:00 \n",
" 54579.70 \n",
" \n",
" \n",
" 2013-07-31 00:00:00 \n",
" 59212.49 \n",
" \n",
" \n",
" 2013-08-02 00:00:00 \n",
" 50688.58 \n",
" \n",
" \n",
" 2013-08-06 00:00:00 \n",
" 57843.89 \n",
" \n",
" \n",
" 2013-08-12 00:00:00 \n",
" 59014.74 \n",
" \n",
" \n",
" 2013-08-17 00:00:00 \n",
" 63226.83 \n",
" \n",
" \n",
" 2013-08-24 00:00:00 \n",
" 52650.15 \n",
" \n",
" \n",
" 2013-09-05 00:00:00 \n",
" 59942.43 \n",
" \n",
" \n",
" 2013-09-06 00:00:00 \n",
" 61976.10 \n",
" \n",
"
"
],
"text/plain": [
"[(datetime.datetime(2013, 7, 26, 0, 0), Decimal('54713.23')),\n",
" (datetime.datetime(2013, 7, 29, 0, 0), Decimal('54579.70')),\n",
" (datetime.datetime(2013, 7, 31, 0, 0), Decimal('59212.49')),\n",
" (datetime.datetime(2013, 8, 2, 0, 0), Decimal('50688.58')),\n",
" (datetime.datetime(2013, 8, 6, 0, 0), Decimal('57843.89')),\n",
" (datetime.datetime(2013, 8, 12, 0, 0), Decimal('59014.74')),\n",
" (datetime.datetime(2013, 8, 17, 0, 0), Decimal('63226.83')),\n",
" (datetime.datetime(2013, 8, 24, 0, 0), Decimal('52650.15')),\n",
" (datetime.datetime(2013, 9, 5, 0, 0), Decimal('59942.43')),\n",
" (datetime.datetime(2013, 9, 6, 0, 0), Decimal('61976.10'))]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"\n",
"SELECT o.order_date,\n",
" round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue\n",
"FROM orders o JOIN order_items oi\n",
"ON o.order_id = oi.order_item_order_id\n",
"WHERE o.order_status IN ('COMPLETE', 'CLOSED')\n",
"GROUP BY o.order_date\n",
" HAVING round(sum(oi.order_item_subtotal)::numeric, 2) >= 50000\n",
"ORDER BY order_date\n",
"LIMIT 10"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{error}\n",
"This one will also fail as we are trying to use alias `drnk` from `SELECT` clause in `WHERE` clause.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db\n",
"(psycopg2.errors.UndefinedColumn) column \"drnk\" does not exist\n",
"LINE 6: WHERE drnk <= 5\n",
" ^\n",
"\n",
"[SQL: SELECT t.*, dense_rank() OVER (\n",
" PARTITION BY order_date\n",
" ORDER BY revenue DESC\n",
") AS drnk\n",
"FROM daily_product_revenue t\n",
"WHERE drnk <= 5]\n",
"(Background on this error at: http://sqlalche.me/e/13/f405)\n"
]
}
],
"source": [
"%%sql\n",
"\n",
"\n",
"SELECT t.*,\n",
"dense_rank() OVER (\n",
" PARTITION BY order_date\n",
" ORDER BY revenue DESC\n",
") AS drnk\n",
"FROM daily_product_revenue t\n",
"WHERE drnk <= 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Overview of Sub Queries\n",
"\n",
"Let us recap about Sub Queries.\n",
"\n",
"* We typically have Sub Queries in **FROM** Clause.\n",
"* We need to provide alias to the Sub Queries in **FROM** Clause in Postgresql.\n",
"* We use sub queries quite often over queries using Analytics/Windowing Functions"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db\n",
"1 rows affected.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" current_date \n",
" \n",
" \n",
" 2020-12-01 \n",
" \n",
"
"
],
"text/plain": [
"[(datetime.date(2020, 12, 1),)]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"\n",
"SELECT * FROM (SELECT current_date) AS q"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us see few more examples with respect to Sub Queries."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db\n",
"10 rows affected.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" order_date \n",
" order_count \n",
" \n",
" \n",
" 2013-07-25 00:00:00 \n",
" 143 \n",
" \n",
" \n",
" 2013-07-26 00:00:00 \n",
" 269 \n",
" \n",
" \n",
" 2013-07-27 00:00:00 \n",
" 202 \n",
" \n",
" \n",
" 2013-07-28 00:00:00 \n",
" 187 \n",
" \n",
" \n",
" 2013-07-29 00:00:00 \n",
" 253 \n",
" \n",
" \n",
" 2013-07-30 00:00:00 \n",
" 227 \n",
" \n",
" \n",
" 2013-07-31 00:00:00 \n",
" 252 \n",
" \n",
" \n",
" 2013-08-01 00:00:00 \n",
" 246 \n",
" \n",
" \n",
" 2013-08-02 00:00:00 \n",
" 224 \n",
" \n",
" \n",
" 2013-08-03 00:00:00 \n",
" 183 \n",
" \n",
"
"
],
"text/plain": [
"[(datetime.datetime(2013, 7, 25, 0, 0), 143),\n",
" (datetime.datetime(2013, 7, 26, 0, 0), 269),\n",
" (datetime.datetime(2013, 7, 27, 0, 0), 202),\n",
" (datetime.datetime(2013, 7, 28, 0, 0), 187),\n",
" (datetime.datetime(2013, 7, 29, 0, 0), 253),\n",
" (datetime.datetime(2013, 7, 30, 0, 0), 227),\n",
" (datetime.datetime(2013, 7, 31, 0, 0), 252),\n",
" (datetime.datetime(2013, 8, 1, 0, 0), 246),\n",
" (datetime.datetime(2013, 8, 2, 0, 0), 224),\n",
" (datetime.datetime(2013, 8, 3, 0, 0), 183)]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"\n",
"SELECT * FROM (\n",
" SELECT order_date, count(1) AS order_count\n",
" FROM orders\n",
" GROUP BY order_date\n",
") AS q\n",
"ORDER BY order_date\n",
"LIMIT 10"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db\n",
"10 rows affected.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" order_date \n",
" order_count \n",
" \n",
" \n",
" 2013-07-26 00:00:00 \n",
" 269 \n",
" \n",
" \n",
" 2013-07-27 00:00:00 \n",
" 202 \n",
" \n",
" \n",
" 2013-07-28 00:00:00 \n",
" 187 \n",
" \n",
" \n",
" 2013-07-29 00:00:00 \n",
" 253 \n",
" \n",
" \n",
" 2013-07-30 00:00:00 \n",
" 227 \n",
" \n",
" \n",
" 2013-07-31 00:00:00 \n",
" 252 \n",
" \n",
" \n",
" 2013-08-01 00:00:00 \n",
" 246 \n",
" \n",
" \n",
" 2013-08-02 00:00:00 \n",
" 224 \n",
" \n",
" \n",
" 2013-08-03 00:00:00 \n",
" 183 \n",
" \n",
" \n",
" 2013-08-04 00:00:00 \n",
" 187 \n",
" \n",
"
"
],
"text/plain": [
"[(datetime.datetime(2013, 7, 26, 0, 0), 269),\n",
" (datetime.datetime(2013, 7, 27, 0, 0), 202),\n",
" (datetime.datetime(2013, 7, 28, 0, 0), 187),\n",
" (datetime.datetime(2013, 7, 29, 0, 0), 253),\n",
" (datetime.datetime(2013, 7, 30, 0, 0), 227),\n",
" (datetime.datetime(2013, 7, 31, 0, 0), 252),\n",
" (datetime.datetime(2013, 8, 1, 0, 0), 246),\n",
" (datetime.datetime(2013, 8, 2, 0, 0), 224),\n",
" (datetime.datetime(2013, 8, 3, 0, 0), 183),\n",
" (datetime.datetime(2013, 8, 4, 0, 0), 187)]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"\n",
"SELECT * FROM (\n",
" SELECT order_date, count(1) AS order_count\n",
" FROM orders\n",
" GROUP BY order_date\n",
") q\n",
"WHERE q.order_count > 150\n",
"ORDER BY order_date\n",
"LIMIT 10"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{note}\n",
"Above query is an example for sub queries. We can achieve using HAVING clause (no need to have sub query to filter)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Filtering - Analytic Function Results\n",
"\n",
"Let us understand how to filter on top of results of Analytic Functions.\n",
"\n",
"* We can use Analytic Functions only in **SELECT** Clause.\n",
"* If we have to filter based on Analytic Function results, then we need to use Sub Queries.\n",
"* Once the query is added as subquery, we can apply filter using aliases of the Analytic Functions.\n",
"\n",
"Here is the example where we can filter data based on Analytic Functions."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db\n",
"(psycopg2.errors.UndefinedColumn) column \"drnk\" does not exist\n",
"LINE 6: WHERE drnk <= 5\n",
" ^\n",
"\n",
"[SQL: SELECT t.*, dense_rank() OVER (\n",
" PARTITION BY order_date\n",
" ORDER BY revenue DESC\n",
") AS drnk\n",
"FROM daily_product_revenue t\n",
"WHERE drnk <= 5]\n",
"(Background on this error at: http://sqlalche.me/e/13/f405)\n"
]
}
],
"source": [
"%%sql\n",
"\n",
"SELECT t.*,\n",
"dense_rank() OVER (\n",
" PARTITION BY order_date\n",
" ORDER BY revenue DESC\n",
") AS drnk\n",
"FROM daily_product_revenue t\n",
"WHERE drnk <= 5"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db\n",
"10 rows affected.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" order_date \n",
" order_item_product_id \n",
" revenue \n",
" drnk \n",
" \n",
" \n",
" 2013-07-25 00:00:00 \n",
" 1004 \n",
" 5599.72 \n",
" 1 \n",
" \n",
" \n",
" 2013-07-25 00:00:00 \n",
" 191 \n",
" 5099.49 \n",
" 2 \n",
" \n",
" \n",
" 2013-07-25 00:00:00 \n",
" 957 \n",
" 4499.70 \n",
" 3 \n",
" \n",
" \n",
" 2013-07-25 00:00:00 \n",
" 365 \n",
" 3359.44 \n",
" 4 \n",
" \n",
" \n",
" 2013-07-25 00:00:00 \n",
" 1073 \n",
" 2999.85 \n",
" 5 \n",
" \n",
" \n",
" 2013-07-26 00:00:00 \n",
" 1004 \n",
" 10799.46 \n",
" 1 \n",
" \n",
" \n",
" 2013-07-26 00:00:00 \n",
" 365 \n",
" 7978.67 \n",
" 2 \n",
" \n",
" \n",
" 2013-07-26 00:00:00 \n",
" 957 \n",
" 6899.54 \n",
" 3 \n",
" \n",
" \n",
" 2013-07-26 00:00:00 \n",
" 191 \n",
" 6799.32 \n",
" 4 \n",
" \n",
" \n",
" 2013-07-26 00:00:00 \n",
" 1014 \n",
" 4798.08 \n",
" 5 \n",
" \n",
"
"
],
"text/plain": [
"[(datetime.datetime(2013, 7, 25, 0, 0), 1004, Decimal('5599.72'), 1),\n",
" (datetime.datetime(2013, 7, 25, 0, 0), 191, Decimal('5099.49'), 2),\n",
" (datetime.datetime(2013, 7, 25, 0, 0), 957, Decimal('4499.70'), 3),\n",
" (datetime.datetime(2013, 7, 25, 0, 0), 365, Decimal('3359.44'), 4),\n",
" (datetime.datetime(2013, 7, 25, 0, 0), 1073, Decimal('2999.85'), 5),\n",
" (datetime.datetime(2013, 7, 26, 0, 0), 1004, Decimal('10799.46'), 1),\n",
" (datetime.datetime(2013, 7, 26, 0, 0), 365, Decimal('7978.67'), 2),\n",
" (datetime.datetime(2013, 7, 26, 0, 0), 957, Decimal('6899.54'), 3),\n",
" (datetime.datetime(2013, 7, 26, 0, 0), 191, Decimal('6799.32'), 4),\n",
" (datetime.datetime(2013, 7, 26, 0, 0), 1014, Decimal('4798.08'), 5)]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"\n",
"SELECT * FROM (\n",
" SELECT t.*,\n",
" dense_rank() OVER (\n",
" PARTITION BY order_date\n",
" ORDER BY revenue DESC\n",
" ) AS drnk\n",
" FROM daily_product_revenue t\n",
") q\n",
"WHERE q.drnk <= 5\n",
"ORDER BY q.order_date, q.revenue DESC\n",
"LIMIT 10"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "",
"name": ""
},
"language_info": {
"name": ""
}
},
"nbformat": 4,
"nbformat_minor": 4
}