Metadata-Version: 2.4
Name: table-categorization-agent
Version: 0.1.0
Summary: LLM-driven agent for categorizing data tables based on domain schemas
Author-email: StepFn AI <rajesh@stepfunction.ai>
License: MIT
Project-URL: Homepage, https://github.com/stepfnAI/table-categorization-agent
Project-URL: Repository, https://github.com/stepfnAI/table-categorization-agent
Project-URL: Issues, https://github.com/stepfnAI/table-categorization-agent/issues
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: sfn-blueprint>=0.1.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: pyarrow>=12.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"

# Table Categorization Agent

An LLM-driven agent that analyzes data tables and categorizes them based on domain schemas. This agent identifies entities, attributes, and relationships from data and maps them to domain concepts.

## Features

- **LLM-Driven Analysis**: Uses a large language model to interpret table descriptions and content.
- **Domain Schema Integration**: Maps tables to domain entities using JSON schema definitions.
- **Metadata-based Categorization**: Categorizes tables based on their descriptions and metadata.

## Installation

### Prerequisites

- Git access to the `table_categorization_agent` and `sfn_blueprint` repositories (both are cloned in the steps below)
- [uv](https://docs.astral.sh/uv/getting-started/installation/) – package & environment manager  
  Please refer to the official installation guide for the most up-to-date instructions.  
  For quick setup on macOS/Linux, you can currently use:  
  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```
- OpenAI API key


### Step-by-Step Installation

1. **Clone the main repository, install its dependencies, and activate the virtual environment:**
   ```bash
   git clone https://github.com/stepfnAI/table_categorization_agent.git
   cd table_categorization_agent
   git switch dev
   uv sync
   source .venv/bin/activate
   cd ../
   ```

2. **Install `sfn_blueprint` in editable mode into the same environment:**
   ```bash
   git clone https://github.com/stepfnAI/sfn_blueprint.git
   cd sfn_blueprint
   git switch dev
   uv pip install -e .
   ```

3. **Configure environment:**
   ```bash
   export OPENAI_API_KEY="your-api-key-here"
   ```
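
Because the agent needs an OpenAI API key, it can help to fail fast when the variable is missing rather than discover it on the first LLM call. The following is a minimal sketch; `require_api_key` is a hypothetical helper and not part of the package.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key, or raise a clear error if it is missing or blank.

    Hypothetical helper for illustration; the agent itself may read the
    variable differently.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the agent.")
    return key


# Check a stand-in mapping so the example runs without a real key.
print(require_api_key({"OPENAI_API_KEY": "your-api-key-here"}))
```

Calling `require_api_key()` with no arguments checks the real process environment.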

## Testing

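The agent uses `pytest` for testing. `pytest` is declared under the `dev` extra, so make sure that extra is installed in your environment.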

### Running All Tests

```bash
# Run all tests
pytest
```

### Running a Single Test File

To run a specific test file, provide the path to the file:

```bash
# Example: Run the main API test
pytest tests/test_agent_new_api.py
```

```bash
# Example: Run the new feature test
pytest tests/test_agent_new_feature.py
```

All tests live in the `tests/` directory. The agent's API is covered mainly by `tests/test_agent_new_api.py` and `tests/test_agent_new_feature.py`.

## Quick Start

For a quick demonstration, run the example script from the project root. It executes the agent with predefined metadata and prints the result.

```bash
python example/basic_usage.py
```


Here's how to use the agent to categorize tables based on their descriptions:

```python
from table_categorization_agent import TableCategorizationAgent

# 1. Initialize the agent
agent = TableCategorizationAgent()

# 2. Define descriptions for your tables
table_descriptions = {
    "table1": "This table stores borrower profile information, including personal details, contact information, identification numbers, and credit-related attributes. Each row represents a unique borrower record used for managing borrower data.",
    "table2": "This table records loan modification details, including references to borrower and loan IDs, modification attributes, updated terms, and approval information. Each row represents a specific modification event for a loan.",
    "table3": "This table captures loan payment transactions, including breakdowns of principal, interest, insurance, and tax components. Each row represents a transaction with associated loan reference, payment details, and status."
}

# 3. Define the task
# "domain_schema" must point to a domain schema JSON file (see "Domain Schema Format")
task_data = {
    "domain_schema": "path/to/your/domain_schema.json",
    "tables_metadata": table_descriptions
}

# 4. Execute the task
# This calls an LLM, so make sure OPENAI_API_KEY is set first.
output = agent.execute_task(task_data)

# 5. Print the result
print(output)
```
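
Since `execute_task` triggers an LLM call, it may be worth validating the inputs beforehand. The sketch below assembles the same `task_data` dictionary after checking that the schema file exists, parses as JSON, and that no description is blank. `build_task_data` is a hypothetical helper, and the checks are assumptions about what makes a sensible input rather than rules the agent enforces.

```python
import json
import tempfile
from pathlib import Path


def build_task_data(schema_path, table_descriptions):
    """Validate inputs and return the dict to pass to execute_task.

    Hypothetical helper for illustration; not part of the package.
    """
    path = Path(schema_path)
    if not path.is_file():
        raise FileNotFoundError(f"Domain schema not found: {path}")
    with path.open(encoding="utf-8") as fh:
        json.load(fh)  # raises json.JSONDecodeError if the file is not valid JSON

    if not table_descriptions:
        raise ValueError("At least one table description is required.")
    blank = [name for name, text in table_descriptions.items() if not str(text).strip()]
    if blank:
        raise ValueError(f"Empty description for: {', '.join(blank)}")

    return {"domain_schema": str(path), "tables_metadata": dict(table_descriptions)}


# Demonstration with a throwaway schema file; no LLM call is made here.
with tempfile.TemporaryDirectory() as tmp:
    schema_file = Path(tmp) / "domain_schema.json"
    schema_file.write_text(json.dumps({"entities": []}), encoding="utf-8")
    task_data = build_task_data(schema_file, {"table1": "Borrower profile records."})
    print(sorted(task_data))
```

The returned dictionary has the same two keys as the `task_data` shown above, so it can be passed to `agent.execute_task` unchanged.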

## Domain Schema Format

The agent works with domain schemas in JSON format that define entities, attributes, and relationships:

```json
{
  "entities": [
    {
      "iri": ":Borrower_Profile",
      "label": "Borrower Profile",
      "attributes": [
        {
          "iri": ":BorrowerId",
          "label": "BorrowerId",
          "range": [":UUID"]
        }
      ]
    }
  ]
}
```
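
To check a schema file against the layout above before handing it to the agent, a short script can walk the `entities` list and report each entity's attributes. This is a sketch based only on the fields shown in the example (`iri`, `label`, `attributes`); treating `iri` and `label` as required is an assumption, and `summarize_schema` is a hypothetical helper.

```python
import json


def summarize_schema(schema):
    """Map each entity label to the list of its attribute labels.

    Assumes every entity and attribute carries "iri" and "label",
    as in the example schema; the agent may accept other layouts.
    """
    summary = {}
    for entity in schema.get("entities", []):
        for key in ("iri", "label"):
            if key not in entity:
                raise ValueError(f"Entity is missing required key: {key}")
        attributes = entity.get("attributes", [])
        for attr in attributes:
            if "iri" not in attr or "label" not in attr:
                raise ValueError(f"Attribute of {entity['label']} lacks iri or label")
        summary[entity["label"]] = [attr["label"] for attr in attributes]
    return summary


example = json.loads("""
{
  "entities": [
    {
      "iri": ":Borrower_Profile",
      "label": "Borrower Profile",
      "attributes": [
        {"iri": ":BorrowerId", "label": "BorrowerId", "range": [":UUID"]}
      ]
    }
  ]
}
""")

print(summarize_schema(example))  # {'Borrower Profile': ['BorrowerId']}
```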


## Prompt Management

All prompts used by this agent are centralized in `src/table_categorization_agent/constants.py` for easy review and modification.

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request

## License

MIT License. See the `LICENSE` file for details.

## Support

For support and questions:
- Email: rajesh@stepfunction.ai
- Issues: https://github.com/stepfnAI/table-categorization-agent/issues
