Metadata-Version: 2.1
Name: flyvec
Version: 0.0.9
Summary: A biologically inspired method to create sparse, binary word vectors
Home-page: https://github.com/bhoov/flyvec/tree/master/
Author: Benjamin Hoover
Author-email: benhoover34@gmail.com
License: Apache Software License 2.0
Description: # FlyVec
        > Flybrain-inspired Sparse Binary Word Embeddings
        
        
        Code based on the ICLR 2021 paper [Can a Fruit Fly Learn Word Embeddings?](https://openreview.net/forum?id=xfmSoxdxFCG). A work in progress.
        
        ## Install
        
        `pip install flyvec`
        
        ## How to use
        
        ```
        import numpy as np
        from flyvec import FlyVec
        
        model = FlyVec.load()
        embed_info = model.get_sparse_embedding("market")
        ```
        
            Loading Tokenizer...
            No phraser specified. Proceeding without phrases
            Loading synapses...
        
        
        FlyVec uses a simple, word-based tokenizer to isolate concepts. The provided model uses a tokenizer with a vocabulary of about 40,000 words, all lower-cased, with special tokens for numbers (`<NUM>`) and unknown words (`<UNK>`). See `Tokenizer` for details.
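        
        To make the tokenization scheme concrete, here is a minimal, self-contained sketch of a word-based tokenizer that lower-cases its input and maps numbers and out-of-vocabulary words to special tokens. It is an illustration only, not the `Tokenizer` class shipped with FlyVec: the `simple_tokenize` function, the tiny `VOCAB` set, and the regular expression are all assumptions made for this example.
        
        ```python
        import re
        
        # Hypothetical toy vocabulary; the provided model uses about 40,000 words.
        VOCAB = {"supreme", "court", "dismissed", "the", "criminal", "charges"}
        
        def simple_tokenize(text, vocab=VOCAB):
            """Lower-case the text, split it into words, and replace
            numbers with <NUM> and unknown words with <UNK>."""
            tokens = []
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                if word.isdigit():
                    tokens.append("<NUM>")   # any run of digits
                elif word in vocab:
                    tokens.append(word)      # known word, kept as-is
                else:
                    tokens.append("<UNK>")   # out-of-vocabulary word
            return tokens
        
        print(simple_tokenize("The Supreme Court dismissed 3 charges yesterday."))
        ```
        
        In this sketch punctuation is simply discarded; how the real tokenizer treats punctuation and unknown words may differ.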
        
        ```
        # Batch generate word embeddings
        sentence = "Supreme Court dismissed the criminal charges."
        tokens = model.tokenize(sentence)
        embedding_info = [model.get_sparse_embedding(t) for t in tokens]
        embeddings = np.array([e['embedding'] for e in embedding_info])
        print("TOKENS: ", [e['token'] for e in embedding_info])
        print("EMBEDDINGS: ", embeddings)
        ```
        
            TOKENS:  ['supreme', 'court', 'dismissed', 'the', 'criminal', 'charges']
            EMBEDDINGS:  [[0 1 0 ... 0 0 0]
             [0 0 0 ... 0 0 0]
             [0 0 0 ... 0 1 0]
             [0 0 0 ... 0 0 0]
             [0 0 0 ... 0 1 0]
             [0 0 0 ... 0 1 0]]
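        
        Because the embeddings are binary, two of them can be compared by counting the positions where both are active. The sketch below shows one common way to do this with NumPy, using overlap and Jaccard similarity. The helper names `binary_overlap` and `jaccard` and the short example vectors are made up for illustration; this is not necessarily the comparison used in the paper or provided by the library.
        
        ```python
        import numpy as np
        
        def binary_overlap(a, b):
            """Number of positions where both binary vectors are 1."""
            return int(np.sum(np.logical_and(a, b)))
        
        def jaccard(a, b):
            """Overlap divided by the number of positions active in either vector."""
            union = int(np.sum(np.logical_or(a, b)))
            return binary_overlap(a, b) / union if union else 0.0
        
        # Made-up 8-dimensional vectors standing in for real embeddings.
        u = np.array([0, 1, 0, 1, 1, 0, 0, 1])
        v = np.array([0, 1, 0, 0, 1, 0, 1, 1])
        
        print("overlap:", binary_overlap(u, v))
        print("jaccard:", jaccard(u, v))
        ```
        
        The same functions can be applied to any two rows of the `embeddings` array built above.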
        
        
Keywords: GloVE Word2Vec Wordvector NLP Bioinspired AI ML sparse binary
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6
Description-Content-Type: text/markdown
