{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# CHAPTER 11 Time Series（时间序列）\n",
        "\n",
        "时间序列指能在任何能在时间上观测到的数据。很多时间序列是有固定频率（fixed frequency）的，意思是数据点会遵照某种规律定期出现，比如每 15 秒，每 5 分钟，或每个月。时间序列也可能是不规律的（irregular），没有一个固定的时间规律。如何参照时间序列数据取决于我们要做什么样的应用，我们可能会遇到下面这些：\n",
        "\n",
        "- Timestamps（时间戳），具体的某一个时刻\n",
        "- Fixed periods（固定的时期），比如 2007 年的一月，或者 2010 年整整一年\n",
        "- Intervals of time（时间间隔），通常有一个开始和结束的时间戳。Periods（时期）可能被看做是 Intervals（间隔）的一种特殊形式。\n",
        "- Experiment or elapsed time（实验或经过的时间）；每一个时间戳都是看做是一个特定的开始时间（例如，在放入烤箱后，曲奇饼的直径在每一秒的变化程度）\n",
        "\n",
        "这一章主要涉及前三个类型。\n",
        "\n",
        "> pandas 也支持基于 timedeltas 的 index，本书不会对 timedelta index 做介绍，感兴趣的可以查看 pandas 的文档。\n",
        "\n",
        "\n",
        "# 11.1 Date and Time Data Types and Tools（日期和时间数据类型及其工具）\n",
        "\n",
        "python 有标准包用来表示时间和日期数据。datetime, time, calendar，这些模块经常被使用。datetime.datetime 类型，或简单写为 datetime，被广泛使用：\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 27,
      "metadata": {
        "collapsed": true
      },
      "outputs": [],
      "source": [
        "import pandas as pd"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "collapsed": true
      },
      "outputs": [],
      "source": [
        "from datetime import datetime"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "now = datetime.now()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "datetime.datetime(2017, 12, 1, 12, 12, 0, 375896)"
            ]
          },
          "execution_count": 4,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "now"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "(2017, 12, 1)"
            ]
          },
          "execution_count": 5,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "now.year, now.month, now.day"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "datetime 能保存日期和时间到微妙级别。timedelta 表示两个不同的 datetime 对象之间的时间上的不同："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "datetime.timedelta(926, 56700)"
            ]
          },
          "execution_count": 6,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)\n",
        "delta"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "926"
            ]
          },
          "execution_count": 7,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "delta.days"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "56700"
            ]
          },
          "execution_count": 8,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "delta.seconds"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "我们可以在一个 datetime 对象上，添加或减少一个或多个 timedelta，这样可以产生新的变化后的对象："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "collapsed": true
      },
      "outputs": [],
      "source": [
        "from datetime import timedelta"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "collapsed": true
      },
      "outputs": [],
      "source": [
        "start = datetime(2011, 1, 7)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "datetime.datetime(2011, 1, 19, 0, 0)"
            ]
          },
          "execution_count": 11,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "start + timedelta(12)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "datetime.datetime(2010, 12, 14, 0, 0)"
            ]
          },
          "execution_count": 12,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "start - 2 * timedelta(12)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "下表汇总了一些 datetime 模块中的数据类型：\n",
        "\n",
        "![](http://oydgk2hgw.bkt.clouddn.com/pydata-book/wqo6m.png)\n",
        "\n",
        "# 1 Converting Between String and Datetime（字符串与时间的转换）\n",
        "\n",
        "我们可以对 datetime 对象，以及 pandas 的 Timestamp 对象进行格式化，这部分之后会介绍，使用 str 或 strftime 方法，传入一个特定的时间格式就能进行转换："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "collapsed": true
      },
      "outputs": [],
      "source": [
        "stamp = datetime(2011, 1, 3)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "'2011-01-03 00:00:00'"
            ]
          },
          "execution_count": 14,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "str(stamp)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "'2011-01-03'"
            ]
          },
          "execution_count": 15,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "stamp.strftime('%Y-%m-%d')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "下表是关于日期时间类型的格式：\n",
        "\n",
        "![](http://oydgk2hgw.bkt.clouddn.com/pydata-book/r98dw.png)\n",
        "\n",
        "![](http://oydgk2hgw.bkt.clouddn.com/pydata-book/bc9e8.png)\n",
        "\n",
        "我们可以利用上面的 format codes（格式码；时间日期格式）把字符串转换为日期，这要用到 datetime.strptime:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "collapsed": true
      },
      "outputs": [],
      "source": [
        "value = '2011-01-03'"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "datetime.datetime(2011, 1, 3, 0, 0)"
            ]
          },
          "execution_count": 17,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "datetime.strptime(value, '%Y-%m-%d')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {
        "collapsed": true
      },
      "outputs": [],
      "source": [
        "datestrs = ['7/6/2011', '8/6/2011']"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]"
            ]
          },
          "execution_count": 19,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "对于一个一直的时间格式，使用 datetime.strptime 来解析日期是很好的方法。但是，如果每次都要写格式的话很烦人，尤其是对于一些比较常见的格式。在这种情况下，我们可以使用第三方库 dateutil 中的 parser.parse 方法（这个库会在安装 pandas 的时候自动安装）："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {
        "collapsed": true
      },
      "outputs": [],
      "source": [
        "from dateutil.parser import parse"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 21,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "datetime.datetime(2011, 1, 3, 0, 0)"
            ]
          },
          "execution_count": 21,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "parse('2011-01-03')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "dateutil 能够解析很多常见的时间表示格式："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "datetime.datetime(1997, 1, 31, 22, 45)"
            ]
          },
          "execution_count": 22,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "parse('Jan 31, 1997 10:45 PM')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "在国际上，日在月之前是很常见的（译者：美国是把月放在日前面的），所以我们可以设置 dayfirst=True 来指明最前面的是否是日："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "datetime.datetime(2011, 12, 6, 0, 0)"
            ]
          },
          "execution_count": 23,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "parse('6/12/2011', dayfirst=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "pandas 通常可以用于处理由日期组成的数组，不论是否是 DataFrame 中的行索引或列。to_datetime 方法能解析很多不同种类的日期表示。标准的日期格式，比如 ISO 8601，能被快速解析："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 24,
      "metadata": {
        "collapsed": true
      },
      "outputs": [],
      "source": [
        "datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 28,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)"
            ]
          },
          "execution_count": 28,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "pd.to_datetime(datestrs)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "还能处理一些应该被判断为缺失的值（比如 None, 空字符串之类的）："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 29,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)"
            ]
          },
          "execution_count": 29,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "idx = pd.to_datetime(datestrs + [None])\n",
        "idx"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 30,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "NaT"
            ]
          },
          "execution_count": 30,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "idx[2]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 31,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "array([False, False,  True], dtype=bool)"
            ]
          },
          "execution_count": 31,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "pd.isnull(idx)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Nat(Not a Time) 在 pandas 中，用于表示时间戳为空值（null value）。\n",
        "\n",
        "> dateutil.parse 是一个很有用但不完美的工具。它可能会把一些字符串识别为日期，例如，'42'就会被解析为 2042 年加上今天的日期。\n",
        "\n",
        "datetime 对象还有一些关于地区格式（locale-specific formatting）的选项，用于处理不同国家或不同语言的问题。例如，月份的缩写在德国和法国，与英语是不同的。下表列出一些相关的选项：\n",
        "\n",
        "![](http://oydgk2hgw.bkt.clouddn.com/pydata-book/gp2fy.png)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.11.0 (main, Oct 26 2022, 19:06:18) [Clang 14.0.0 (clang-1400.0.29.202)]"
    },
    "vscode": {
      "interpreter": {
        "hash": "5c7b89af1651d0b8571dde13640ecdccf7d5a6204171d6ab33e7c296e100e08a"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}