Metadata-Version: 2.4
Name: embedding-flow
Version: 0.1.4
Summary: Pipeline to transform text chunks into embeddings and load to Qdrant
Author: facuvega
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: sentence-transformers
Requires-Dist: transformers
Requires-Dist: pandas
Requires-Dist: pyarrow
Requires-Dist: qdrant-client>=1.7.0
Requires-Dist: python-dotenv
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Provides-Extra: cpu
Requires-Dist: torch>=2.0.0; extra == "cpu"
Provides-Extra: cuda
Requires-Dist: torch>=2.0.0; extra == "cuda"
Dynamic: license-file

# embedding-flow

Library that transforms text chunks into 768-dimensional embeddings and loads them into Qdrant.

## Installation

```bash
# Install the CPU-only torch build first (recommended; avoids downloading the CUDA packages)
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install embedding-flow

# Or run both steps as a single command
pip install torch --index-url https://download.pytorch.org/whl/cpu && pip install embedding-flow
```

## Usage

```python
from embedding_flow import embedding_flow

# Takes the path to the parquet file with the chunks and loads their embeddings into Qdrant
embedding_flow("/path/to/chunks.parquet")
```

## Environment variables

```bash
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION=embeddings_collection
VECTOR_SIZE=768
```
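
As a minimal sketch of how these three variables might be read in Python: the `QdrantSettings` class and `load_settings` function below are hypothetical names used for illustration, not part of the `embedding_flow` API, and treating the values listed above as fallback defaults is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class QdrantSettings:
    """Hypothetical container for the three variables listed above."""
    url: str
    collection: str
    vector_size: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> QdrantSettings:
    """Read the settings from `env`, or from os.environ when `env` is None.

    Assumption: the values shown in this README act as defaults when a
    variable is not set.
    """
    source = os.environ if env is None else env
    return QdrantSettings(
        url=source.get("QDRANT_URL", "http://localhost:6333"),
        collection=source.get("QDRANT_COLLECTION", "embeddings_collection"),
        # Environment values are strings, so the size is converted explicitly.
        vector_size=int(source.get("VECTOR_SIZE", "768")),
    )


settings = load_settings({"QDRANT_COLLECTION": "my_chunks"})
print(settings)
```

If the variables live in a `.env` file, calling `load_dotenv()` from `python-dotenv` (already a dependency) before `load_settings()` copies them into `os.environ`.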

## Flow

1. Reads the chunks from a parquet file
2. Generates 768-dimensional embeddings with `all-mpnet-base-v2`
3. Loads the embeddings into Qdrant (local Docker instance)
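
The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's actual implementation: the `run_flow` and `batched` names, the `text` column name, the batch size, cosine distance, and random UUID point ids are all hypothetical choices.

```python
import uuid
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def run_flow(
    parquet_path: str,
    url: str = "http://localhost:6333",
    collection: str = "embeddings_collection",
    vector_size: int = 768,
    text_column: str = "text",  # assumption: the column that holds the chunk text
    batch_size: int = 64,       # assumption: arbitrary batch size
) -> int:
    """Embed every chunk in the parquet file and upsert it into Qdrant."""
    # Heavy dependencies are imported lazily so `batched` works without them.
    import pandas as pd
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams
    from sentence_transformers import SentenceTransformer

    # 1. Read the chunks from parquet.
    texts = pd.read_parquet(parquet_path)[text_column].astype(str).tolist()

    # 2. Load the embedding model (768-dimensional output).
    model = SentenceTransformer("all-mpnet-base-v2")

    # 3. Connect to Qdrant and create the collection if it is missing.
    client = QdrantClient(url=url)
    existing = {c.name for c in client.get_collections().collections}
    if collection not in existing:
        client.create_collection(
            collection_name=collection,
            # Assumption: cosine distance; the README does not name a metric.
            vectors_config=VectorParams(size=vector_size, distance=Distance.COSINE),
        )

    total = 0
    for batch in batched(texts, batch_size):
        vectors = model.encode(batch)  # array of shape (len(batch), 768)
        points = [
            PointStruct(id=str(uuid.uuid4()), vector=vector.tolist(), payload={"text": text})
            for text, vector in zip(batch, vectors)
        ]
        client.upsert(collection_name=collection, points=points)
        total += len(points)
    return total
```

Calling `run_flow("/path/to/chunks.parquet")` needs a reachable Qdrant instance and downloads the model on first use. Batching keeps memory bounded for large parquet files.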

## License

MIT

