{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Extending Ibis Part 2: Adding a New Reduction Expression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook will show you how to add a new *reduction* operation `last_date` to the existing backend SQLite.\n",
    "\n",
    "A reduction operation is a function that maps $N$ rows to 1 row, for example the `sum` function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Description\n",
    "\n",
    "We're going to add a **`last_date`** function to ibis. `last_date` simply returns the latest date of a list of dates."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Define the Operation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's define the `last_date` operation as a function that takes any date column as input and returns a date:\n",
    "\n",
    "```python\n",
    "import datetime\n",
    "import typing\n",
    "\n",
    "def last_date(dates: typing.List[datetime.date]) -> datetime.date:\n",
    "    \"\"\"Latest date\"\"\"\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ibis.expr.datatypes as dt\n",
    "import ibis.expr.rules as rlz\n",
    "\n",
    "from ibis.expr.operations import Reduction, Arg\n",
    "\n",
    "\n",
    "class LastDate(Reduction):\n",
    "    arg = Arg(rlz.column(rlz.date))\n",
    "    where = Arg(rlz.boolean, default=None)\n",
    "    output_type = rlz.scalar_like('arg')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We just defined a `LastDate` class that takes one date column as input, and returns a scalar output of the same type as the input. This matches both the requirements of a reduction and the spepcifics of the function that we want to implement.\n",
    "\n",
    "**Note**: It is very important that you write the correct argument rules and output type here. The expression *will not work* otherwise."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Define the API"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because every reduction in ibis has the ability to filter out values during aggregation (a typical feature in databases and analytics tools), to make an expression out of ``LastDate`` we need to pass an additional argument: `where` to our `LastDate` constructor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ibis.expr.types import DateColumn  # not DateValue! reductions are only valid on columns\n",
    "\n",
    "\n",
    "def last_date(date_column, where=None):\n",
    "    return LastDate(date_column, where=where).to_expr()\n",
    "\n",
    "\n",
    "DateColumn.last_date = last_date"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Interlude: Create some expressions using `last_date`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ibis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "people = ibis.table([('name', 'string'), ('country', 'string'), ('date_of_birth', 'date')], name='people')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "people.date_of_birth.last_date()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "people.date_of_birth.last_date(people.country == 'Indonesia')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Turn the Expression into SQL"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sqlalchemy as sa\n",
    "\n",
    "\n",
    "@ibis.sqlite.add_operation(LastDate)\n",
    "def _last_date(translator, expr):\n",
    "    # pull out the arguments to the expression\n",
    "    arg, where = expr.op().args\n",
    "    \n",
    "    # compile the argument\n",
    "    compiled_arg = translator.translate(arg)\n",
    "    \n",
    "    # call the appropriate SQLite function (`max` for the latest/maximum date)\n",
    "    agg = sa.func.max(compiled_arg)\n",
    "    \n",
    "    # handle a non-None filter clause\n",
    "    if where is not None:\n",
    "        return agg.filter(translator.translate(where))\n",
    "    return agg"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4: Putting it all Together"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import pathlib\n",
    "import ibis\n",
    "\n",
    "db_fname = str(pathlib.Path().resolve().parent.parent / 'tutorial' / 'data' / 'geography.db')\n",
    "\n",
    "con = ibis.sqlite.connect(db_fname)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create and execute a `bitwise_and` expression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "independence = con.table('independence')\n",
    "independence"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Last country to gain independence in our database:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "expr = independence.independence_date.last_date()\n",
    "expr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sql_expr = expr.compile()\n",
    "print(sql_expr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "expr.execute()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Last country to gain independence from the Spanish Empire, using the `where` parameter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "expr = independence.independence_date.last_date(where=independence.independence_from == 'Spanish Empire')\n",
    "expr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = expr.execute()\n",
    "result"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
