Metadata-Version: 2.1
Name: pansql
Version: 0.0.1rc1
Summary: sqldf for pandas
Home-page: https://github.com/hrshdhgd/pansql/
Author: Harshad Hegde
Author-email: hhegde@lbl.gov
License: MIT
Classifier: License :: OSI Approved :: MIT License
License-File: LICENSE.txt
License-File: AUTHORS.md

**DISCLAIMER**
==============

This project is not maintained. It is merely a fork of
```yhat/pandasql`` <https://github.com/yhat/pandasql>`__ and all credit
goes to the group. This fork just resolves an issue of compatibility
with SQLAlchemy v2.x.x. `A PR was
requested <https://github.com/yhat/pandasql/pull/104>`__ for this to be
included in the main ``pandasql`` project but it seems to be dormant.
This sparked the creation of this fork.

pansql
======

``pansql`` allows you to query ``pandas`` DataFrames using SQL syntax.
It works similarly to ``sqldf`` in R. ``pansql`` seeks to provide a more
familiar way of manipulating and cleaning data for people new to Python
or ``pandas``.

Installation
^^^^^^^^^^^^

::

   $ pip install -U pansql

Basics
^^^^^^

The main function used in pansql is ``sqldf``. ``sqldf`` accepts 2
parametrs - a sql query string - a set of session/environment variables
(``locals()`` or ``globals()``)

Specifying ``locals()`` or ``globals()`` can get tedious. You can define
a short helper function to fix this.

::

   from pansql import sqldf
   pysqldf = lambda q: sqldf(q, globals())

Querying
^^^^^^^^

``pansql`` uses `SQLite syntax <http://www.sqlite.org/lang.html>`__. Any
``pandas`` dataframes will be automatically detected by ``pansql``. You
can query them as you would any regular SQL table.

::

   $ python
   >>> from pansql import sqldf, load_meat, load_births
   >>> pysqldf = lambda q: sqldf(q, globals())
   >>> meat = load_meat()
   >>> births = load_births()
   >>> print pysqldf("SELECT * FROM meat LIMIT 10;").head()
                     date  beef  veal  pork  lamb_and_mutton broilers other_chicken turkey
   0  1944-01-01 00:00:00   751    85  1280               89     None          None   None
   1  1944-02-01 00:00:00   713    77  1169               72     None          None   None
   2  1944-03-01 00:00:00   741    90  1128               75     None          None   None
   3  1944-04-01 00:00:00   650    89   978               66     None          None   None
   4  1944-05-01 00:00:00   681   106  1029               78     None          None   None

joins and aggregations are also supported

::

   >>> q = """SELECT
           m.date, m.beef, b.births
        FROM
           meats m
        INNER JOIN
           births b
              ON m.date = b.date;"""
   >>> joined = pyqldf(q)
   >>> print joined.head()
                       date    beef  births
   403  2012-07-01 00:00:00  2200.8  368450
   404  2012-08-01 00:00:00  2367.5  359554
   405  2012-09-01 00:00:00  2016.0  361922
   406  2012-10-01 00:00:00  2343.7  347625
   407  2012-11-01 00:00:00  2206.6  320195

   >>> q = "select
              strftime('%Y', date) as year
              , SUM(beef) as beef_total
              FROM
                 meat
              GROUP BY
                 year;"
   >>> print pysqldf(q).head()
      year  beef_total
   0  1944        8801
   1  1945        9936
   2  1946        9010
   3  1947       10096
   4  1948        8766

More information and code samples available in the
`examples <https://github.com/yhat/pandasql/blob/master/examples/demo.py>`__
folder or on `our
blog <http://blog.yhathq.com/posts/pandasql-sql-for-pandas-dataframes.html>`__.

|Analytics|

.. |Analytics| image:: https://ga-beacon.appspot.com/UA-46996803-1/pandasql/README.md
   :target: https://github.com/yhat/pandasql
