Metadata-Version: 2.4
Name: llm-aggregator
Version: 0.1.11
Summary: A model aggregator service for multiple LLM backends.
Project-URL: Homepage, https://github.com/Wuodan/llm-aggregator
Project-URL: Repository, https://github.com/Wuodan/llm-aggregator
Project-URL: Issues, https://github.com/Wuodan/llm-aggregator/issues
License: Apache-2.0
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: aiohttp
Requires-Dist: apscheduler
Requires-Dist: extract2md
Requires-Dist: fastapi
Requires-Dist: httpx
Requires-Dist: psutil
Requires-Dist: pydantic
Requires-Dist: pydantic-settings
Requires-Dist: pyyaml
Requires-Dist: uvicorn
Description-Content-Type: text/markdown

# LLM Aggregator

LLM Aggregator keeps a live list of every model exposed by your local OpenAI-compatible servers.

---

## Features

- Polls models from configured LLM provider servers (`/v1/models`).
- Enriches model information with a helper LLM.
- Optionally hands model information from external websites to the helper LLM.
- Ships with a minimal UI showing providers, models, and host RAM.
- The built-in UI can easily be replaced.

---

## Web Interface

The built-in UI shows a single table plus a small RAM widget, so you immediately see what is running:

<!-- pyml disable line-length -->

| Model       | Base URL                     | Types     | Family    | Context | Quant  | Params | Summary                        |
|-------------|------------------------------|-----------|-----------|---------|--------|--------|--------------------------------|
| llama3.1:8b | `http://10.7.2.100:11434/v1` | llm       | Llama 3.1 | 8K      | Q4_K_M | 8B     | General chat tuned for balance |
| qwen2.5:14b | `http://10.7.2.100:8080/v1`  | llm,embed | Qwen 2.5  | 32K     | Q5_0   | 14B    | Multilingual reasoning focused |

<!-- pyml enable line-length -->

Columns:

- `Model` – identifier reported by the provider.
- `Base URL` – where the model is served.
- `Types` – capabilities (LLM, VLM, embedder, etc.).
- `Family` – base architecture inferred by the helper LLM.
- `Context` – approximate context window in tokens.
- `Quant` – quantization hinted by the model name or docs.
- `Params` – estimated parameter count.
- `Summary` – one-line description generated by the helper LLM.

---

## Installation

### Prerequisites

- Python 3.10 or higher
- LLM servers (Ollama, llama.cpp, nexa, etc.) with OpenAI-compatible APIs

### Install from PyPI

```bash
pip install llm-aggregator
```

---

## Usage

Set the `LLM_AGGREGATOR_CONFIG` environment variable to point at your [config.yaml](config.yaml) and the service will
load it on startup.

### Starting the Service

```bash
export LLM_AGGREGATOR_CONFIG=/path/to/config.yaml
llm-aggregator
```

Or run directly:

```bash
export LLM_AGGREGATOR_CONFIG=/path/to/config.yaml
python -m llm_aggregator
```

By default, the web interface will be available at `http://localhost:8888`.

---

## Configuration

All runtime behavior is controlled through the YAML file pointed to by the `LLM_AGGREGATOR_CONFIG` environment variable.
Use [config.yaml](config.yaml) as a reference template.

### UI modes

Use `static_enabled` and `custom_static_path` to set one of three modes:

- `static_enabled: true` (default) serves the built-in UI.
- `static_enabled: true` and `custom_static_path: /path/to/dir` serves your files instead of the built-in UI.
- `static_enabled: false` serves no UI at all. Provide your own UI using the REST endpoints.
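
The three modes above can be sketched as a small decision function. This is an illustration only, not the service's actual code: the function name `ui_mode` and the returned labels are hypothetical, and it assumes that `custom_static_path` is ignored when `static_enabled` is `false`.

```python
from typing import Optional


def ui_mode(static_enabled: bool = True, custom_static_path: Optional[str] = None) -> str:
    """Return which of the three UI modes a pair of settings selects.

    Hypothetical helper: the labels "none", "custom", and "built-in" are
    made up for this sketch and are not identifiers used by the service.
    """
    if not static_enabled:
        # No UI is served; clients use the REST endpoints directly.
        # Assumption: a custom path has no effect in this case.
        return "none"
    if custom_static_path:
        # Files from this directory are served instead of the built-in UI.
        return "custom"
    # Default: the bundled UI is served.
    return "built-in"


print(ui_mode())
print(ui_mode(custom_static_path="/path/to/dir"))
print(ui_mode(static_enabled=False))
```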

### Configuration Options

- **host / port** – Where the FastAPI server and static frontend bind.
- **log_level** – Logging verbosity (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`). Defaults to `INFO` if omitted.
- **log_format** – Optional `logging` format string. When omitted, the service leaves existing logging configuration
  untouched.
- **logger_overrides** – Map of logger names to override their logging level
  (e.g., `httpx: WARNING`).
- **brain** – Settings for the enrichment LLM:
  - `base_url` – HTTP endpoint of the enrichment provider.
  - `id` – Model identifier passed to the provider.
  - `api_key` – Optional API key.
  - `max_batch_size` – Number of models to enrich at once (defaults to 1).
  - `temperature` – Sampling temperature used for enrichment calls (default: `0.2`).
- **providers** – Map of provider name to an OpenAI-compatible backend to query:
  - `base_url` – Public URL returned via the REST API.
  - `internal_base_url` – Optional internal URL used for server-to-server calls; defaults to `base_url` when omitted.
  - `api_key` – Optional API key for that provider.
  - `files_size_gatherer` – Optional block to report on-disk model size:
    - `path` – Script or executable invoked as `<path> <base_path> <full_model_name>`.
    - `base_path` – Filesystem root passed to the script.
    - `timeout_seconds` – Optional per-provider timeout (default: 15s).
- **model_info_sources** – Optional external websites from which model information is fetched for enrichment.
  Each entry requires a human-readable `name` (shown to the LLM) and a `url_template` that contains `{model_id}`.
- **time** – Background scheduling knobs (all in seconds):
  - `fetch_models_interval`
  - `fetch_models_timeout`
  - `enrich_models_timeout`
  - `enrich_idle_sleep`
  - `website_markdown_cache_ttl` – TTL for cached markdown scraped from external sources.
- **ui** – Optional static UI:
  - `static_enabled` – When `true`, the static web frontend is served at `/index.html` and its assets at `/static`.
  - `custom_static_path` – Optional directory to replace the bundled UI; must contain a readable `index.html` and
    asset files.
- **brain_prompts** – LLM instructions kept separate so the block can live at the end of the YAML:
  - `system` – System message injected ahead of every enrichment request.
  - `user` – Main user instruction describing the enrichment JSON contract.
  - `model_info_prefix_template` – Optional prefix template applied to fetched markdown snippets; receives `{model_id}`
    and `{provider_label}` placeholders.
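
To make the placeholder options concrete, here is a minimal sketch of how a `url_template` and a `model_info_prefix_template` might be filled in. It assumes plain textual substitution of `{model_id}` and `{provider_label}`; the service may expand them differently. The helper names, the `example.com` URL, the prefix wording, and the `ollama` label are all hypothetical.

```python
def build_source_url(url_template: str, model_id: str) -> str:
    """Fill the `{model_id}` placeholder of a `url_template` (hypothetical helper)."""
    if "{model_id}" not in url_template:
        # The option description requires the placeholder to be present.
        raise ValueError("url_template must contain {model_id}")
    return url_template.replace("{model_id}", model_id)


def build_prefix(prefix_template: str, model_id: str, provider_label: str) -> str:
    """Fill both placeholders of a prefix template (hypothetical helper)."""
    filled = prefix_template.replace("{model_id}", model_id)
    return filled.replace("{provider_label}", provider_label)


# Made-up values for illustration only.
print(build_source_url("https://example.com/models/{model_id}", "llama3.1:8b"))
print(build_prefix("Notes for {model_id} (served by {provider_label}):", "llama3.1:8b", "ollama"))
```

Plain replacement is used here rather than `str.format` so that any other braces in a template pass through unchanged.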

---

## REST API

- `GET /v1/models` – OpenAI `ListModelsResponse` plus a `meta` object on each `data` item with the enriched
  metadata. Example:

  ```json
  {
    "object": "list",
    "data": [
      {
        "id": "llama3.1:8b",
        "object": "model",
        "created": 1,
        "owned_by": "ollama",
        "meta": {
          "base_url": "http://127.0.0.1:11434/v1",
          "types": ["llm"],
          "model_family": "Llama 3.1",
          "context_size": "8K",
          "quant": "Q4_K_M",
          "param": "8B",
          "size": 481406976,
          "summary": "General chat tuned for balance"
        }
      }
    ]
  }
  ```

- `GET /api/stats` – Returns an array of recent RAM usage percentages sampled for
  the [Chart.js](https://www.chartjs.org/) widget in the UI. Example:

  ```json
  [57.5,57.6,57.6]
  ```

- `POST /api/clear` – Takes an empty request body; clears the model cache and restarts model information collection.
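
As a rough illustration of consuming these endpoints, the sketch below reads a `/v1/models` payload and prints one line per model. It uses only the Python standard library. The function names are hypothetical, the default address is taken from the Usage section, and the sample payload mirrors the example response above. Treating `meta` fields as possibly missing is an assumption, not documented behavior.

```python
import json
from urllib.request import urlopen


def fetch_models(base_url: str = "http://localhost:8888") -> dict:
    """GET /v1/models from a running aggregator and decode the JSON body."""
    with urlopen(f"{base_url}/v1/models", timeout=10) as response:
        return json.load(response)


def summarize_models(payload: dict) -> list[str]:
    """Turn a /v1/models payload into one text line per model."""
    lines = []
    for item in payload.get("data", []):
        # Assumption: `meta` or its fields may be absent before enrichment finishes.
        meta = item.get("meta") or {}
        types = ",".join(meta.get("types", []))
        context = meta.get("context_size", "?")
        summary = meta.get("summary", "")
        lines.append(f"{item['id']} [{types}] {context} {summary}".rstrip())
    return lines


# Sample payload copied from the example response; no server is needed to run this.
sample = {
    "object": "list",
    "data": [
        {
            "id": "llama3.1:8b",
            "object": "model",
            "meta": {
                "types": ["llm"],
                "context_size": "8K",
                "summary": "General chat tuned for balance",
            },
        }
    ],
}
for line in summarize_models(sample):
    print(line)
```

Against a running instance, `summarize_models(fetch_models())` would produce the same kind of listing from live data.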
