Metadata-Version: 2.1
Name: antarctic
Version: 0.0.3
Summary: Storing Pandas Data in a MongoDB database
Home-page: https://github.com/tschm/antarctic
Author: Thomas Schmelzer
Author-email: thomas.schmelzer@gmail.com
License: MIT
Description: # Antarctic
        [![CI](https://github.com/tschm/antarctic/workflows/CI/badge.svg)](https://github.com/tschm/antarctic/actions/)
        [![Release](https://github.com/tschm/antarctic/workflows/Release/badge.svg)](https://github.com/tschm/antarctic/actions/)
        
        Project to persist Pandas data structures in a MongoDB database. 
        
        ## Installation
        ```bash
        pip install antarctic
        ```
        
        ## Usage
        Unlike the popular arctic project (which I admire), this project is built on top of MongoEngine (see https://pypi.org/project/mongoengine/).
        MongoEngine is an ORM for MongoDB, which stores documents.
        We introduce two new fields here: one for a Pandas Series and one for a Pandas DataFrame.
        
        ```python
        from mongoengine import Document, connect
        from antarctic.PandasFields import SeriesField, FrameField
        
        # Connect to your existing MongoDB (here mongomock, a popular library that mocks MongoDB, is used instead)
        client = connect(db="test", host="mongomock://localhost")
        
        # Define the blueprint for a portfolio document
        class Portfolio(Document):
            nav = SeriesField()
            prices = FrameField()
        ```
        
        The portfolio object works exactly the way you would expect:
        
        ```python
        
        import pandas as pd
        
        p = Portfolio()
        p.nav = pd.Series(...)
        p.prices = pd.DataFrame(...)
        p.save()
        
        print(p.nav)
        print(p.prices)
        ```
        
        Behind the scenes, we convert both Series and Frame objects into JSON documents and
        store them in the MongoDB database.
        
        We don't apply any clever conversion into compressed bytestreams. Performance is not our main concern here.
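        
        To illustrate the idea of a plain JSON round trip, here is a minimal sketch using only pandas and the standard library.
        It is not the code used inside antarctic: the helper names `series_to_json` and `series_from_json` are hypothetical,
        and the choice of `orient="split"` is an assumption made for this example.
        
        ```python
        import json
        
        import pandas as pd
        
        
        def series_to_json(series: pd.Series) -> str:
            # Hypothetical helper: orient="split" keeps name, index and data as separate JSON fields
            return series.to_json(orient="split")
        
        
        def series_from_json(text: str) -> pd.Series:
            # Hypothetical helper: rebuild the Series from the parsed JSON payload
            payload = json.loads(text)
            return pd.Series(payload["data"], index=payload["index"], name=payload.get("name"))
        
        
        nav = pd.Series([100.0, 101.5, 99.8], index=["2020-01-01", "2020-01-02", "2020-01-03"], name="nav")
        restored = series_from_json(series_to_json(nav))
        print(restored)
        ```
        
        The index is kept as plain strings in this sketch; a real field would probably also want to restore timestamps and dtypes.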
        
        ### Database?
        
        Storing JSON or bytestream representations of Pandas objects does not exactly make a database. Appending is rather expensive, as one has
        to extract the original Pandas object, append to it, and convert the new object back into a JSON or bytestream representation.
        Clever sharding can mitigate such effects, but at the end of the day you shouldn't update such objects too often. Practitioners often
        use a small database for recording (e.g. over the last 24h) and update the MongoDB database once a day. Reading the Pandas objects
        out of such a construction is extremely fast.
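        
        The cost of appending can be seen in a small sketch of the read-modify-write cycle. This is an illustration only,
        assuming a Series stored as a JSON string in pandas' `orient="split"` layout; the function name `append_to_stored`
        is hypothetical and not part of antarctic.
        
        ```python
        import json
        
        import pandas as pd
        
        
        def append_to_stored(stored_json: str, new_points: pd.Series) -> str:
            # 1. Extract the full original object from its JSON representation
            payload = json.loads(stored_json)
            stored = pd.Series(payload["data"], index=payload["index"], name=payload.get("name"))
            # 2. Append the new observations
            combined = pd.concat([stored, new_points])
            # 3. Serialize the whole object again, not just the new rows
            return combined.to_json(orient="split")
        
        
        stored_json = pd.Series([1.0, 2.0], index=["2020-01-01", "2020-01-02"], name="nav").to_json(orient="split")
        new_points = pd.Series([3.0], index=["2020-01-03"], name="nav")
        updated_json = append_to_stored(stored_json, new_points)
        print(updated_json)
        ```
        
        Every append touches the entire stored object, which is why batching updates (for example once a day) tends to be the cheaper pattern.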
        
        Also note that one could, in theory, build this on top of pyarrow and support both R and Python.
Platform: UNKNOWN
Description-Content-Type: text/markdown
