Metadata-Version: 2.4
Name: masai_framework
Version: 0.4.0
Summary: Multi-Agent System Framework for AI Agents with Vanilla SDK Wrappers and Custom LangGraph
Author-email: PILER <mrpolymathematica@gmail.com>
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: openai>=2.6.0
Requires-Dist: google-generativeai==0.8.5
Requires-Dist: pydantic>=2.12.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: requests>=2.32.0
Requires-Dist: tenacity>=9.1.0
Requires-Dist: tiktoken>=0.12.0
Requires-Dist: colorama>=0.4.6
Requires-Dist: json-repair>=0.39.1
Requires-Dist: aiofiles>=25.1.0
Requires-Dist: tqdm>=4.67.0
Requires-Dist: numpy==1.26.4
Requires-Dist: pandas>=2.3.3
Requires-Dist: redis>=6.4.0
Requires-Dist: networkx>=3.4.2
Provides-Extra: tools
Dynamic: license-file

# MAS-AI Framework: Complete Implementation Guide

+ PLEASE STAR THE PROJECT IF YOU LIKE IT BEFORE CLONING. A LOT OF PEOPLE CLONE WITHOUT STARRING.

# MAS AI: Multi-Agent System Framework

|                                     |
|:-----------------------------------:|
|![MAS AI](/MAS/Logo/image.png) |

## MAS AI is a powerful framework for building scalable, intelligent multi-agent systems with advanced memory management and flexible collaboration patterns, built on a custom LangGraph implementation.

- Each MAS AI agent is built from specialized components that work together to achieve a goal.

- Combine such agents into Multi-Agent Systems (MAS) to achieve more complex goals.

- Combine Multi-Agent Systems into an Orchestrated Multi-Agent Network (OMAN) to achieve even more complex goals.


## Featured

- ### [MAS AI was featured by the SageFlow community as part of their newsletter. Check them out.](https://sageflow.ai/)

|                                     |
|:-----------------------------------:|
|![SageFlow Shoutout](/MAS/Logo/sageflow_masai.jpg)|

## Table of Contents
1. [Introduction](#introduction)
2. [Temperature & Reasoning Models](#temperature--reasoning-models)
3. [Framework Architecture](#framework-architecture)
4. [Custom LangGraph Implementation](#custom-langgraph-implementation)
5. [Agent Components](#agent-components)
6. [Memory System](#memory-system)
7. [Multi-Agent Workflows](#multi-agent-workflows)
8. [Parameter Reference](#parameter-reference)
9. [Use Cases & Best Practices](#use-cases--best-practices)
10. [Advanced Features](#advanced-features)
11. [Installation & Setup](#installation--setup)

---

## Introduction

**MAS-AI** (Multi-Agent System AI) is a modular framework for building intelligent agent systems with a **custom LangGraph implementation**. It provides:

- **Custom LangGraph Engine**: Built-in workflow execution engine with zero external dependencies
- **Modular Agent Architecture**: Router, Evaluator, Reflector, Planner components
- **Hierarchical Memory System**: Short-term, component-shared, long-term, and vector store memory
- **Multiple Collaboration Patterns**: Sequential, Hierarchical, Decentralized workflows
- **LLM Flexibility**: Native support for OpenAI and Google Gemini via vanilla SDK wrappers
- **Intelligent Temperature Handling**: Automatic mapping to reasoning parameters (reasoning_effort, thinkingBudget, etc.)
- **Advanced Reasoning Models**: Native support for GPT-5, o-series, Gemini 2.5 with automatic parameter adaptation
- **Tool Integration**: LangChain-compatible tools with Redis caching support

### Why MAS-AI?

MAS-AI stands apart from conventional multi-agent frameworks by offering:

1. **Explicit Node Separation**: Unlike monolithic LLM systems, MAS-AI distributes responsibilities across specialized components (Router, Evaluator, Reflector, Planner)
2. **State-Machine Orchestration**: LangGraph-based state machine allows dynamic transitions between nodes based on satisfaction criteria
3. **Granular Memory Integration**: Multi-layered memory system (short-term, component-shared, long-term, vector store)
4. **Optimized for Complex Workflows**: Particularly well-suited for research-intensive tasks, multi-step decision processes, and tool-augmented execution

---

## Temperature & Reasoning Models

MAS-AI provides **intelligent temperature handling** that automatically adapts to reasoning models from different providers. The framework transparently maps standard temperature values to provider-specific reasoning parameters.

### 🎯 Automatic Parameter Mapping

When you set a temperature value, MAS-AI automatically detects the model type and uses the appropriate reasoning parameter:

| Temperature | OpenAI Reasoning | Gemini Thinking | Standard Models |
|-------------|------------------|-----------------|-----------------|
| **0.0 - 0.3** | `reasoning_effort: "low"` | `thinkingBudget: "low"` | `temperature: 0.0-0.3` |
| **0.4 - 0.7** | `reasoning_effort: "medium"` | `thinkingBudget: "medium"` | `temperature: 0.4-0.7` |
| **0.8 - 1.0** | `reasoning_effort: "high"` | `thinkingBudget: "high"` | `temperature: 0.8-1.0` |
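
As a mental model, this mapping reduces to a small helper like the sketch below. The function `map_temperature` and its model-detection heuristics are illustrative only, not MAS-AI's internal API:

```python
# Illustrative sketch of the temperature mapping above; not MAS-AI's internal code.
def map_temperature(temperature: float, model_name: str, category: str) -> dict:
    band = "low" if temperature <= 0.3 else "medium" if temperature <= 0.7 else "high"
    if category == "openai" and model_name.startswith(("gpt-5", "o1", "o3", "o4")):
        # Reasoning models reject `temperature`; use a discrete effort level instead
        return {"reasoning_effort": band}
    if category == "gemini" and ("2.5" in model_name or "thinking" in model_name):
        # Thinking models accept temperature AND an optional thinking budget
        return {"temperature": temperature, "thinkingBudget": band}
    return {"temperature": temperature}  # standard models pass temperature through
```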

### 🧠 Supported Reasoning Models

#### OpenAI Reasoning Models
- **GPT-5 Series**: `gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `gpt-5-thinking*`
- **O-Series**: `o1`, `o1-mini`, `o1-preview`, `o3`, `o3-mini`, `o4-mini`
- **GPT-4.1 Series**: `gpt-4.1`, `gpt-4.1-nano`

**Note**: These models do NOT support the `temperature` parameter. MAS-AI automatically uses `reasoning_effort` instead.

#### Gemini Thinking Models
- **Gemini 2.5**: `gemini-2.5-flash`, `gemini-2.5-pro` (built-in thinking)
- **Gemini 2.0**: `gemini-2.0-flash-thinking-exp` (experimental)

**Note**: These models support both `temperature` AND optional `thinkingBudget` for enhanced reasoning control.

### 💡 Usage Example

```python
from masai.AgentManager.AgentManager import AgentManager, AgentDetails

# Per-component models (router/evaluator/reflector) are set in model_config.json
manager = AgentManager(model_config_path="model_config.json")

# Configure with temperature - MAS-AI handles the rest!
manager.create_agent(
    agent_name="research_agent",
    tools=[],
    agent_details=AgentDetails(capabilities=["research"]),
    temperature=0.2  # Automatically maps to "low" reasoning effort
)

# For creative tasks, use higher temperature
manager.create_agent(
    agent_name="creative_agent",
    tools=[],
    agent_details=AgentDetails(capabilities=["creative writing"]),
    temperature=0.9  # Automatically maps to "high" reasoning effort
)
```

### 🎨 Temperature Guidelines

| Use Case | Recommended Temperature | Reasoning Level |
|----------|------------------------|-----------------|
| **Factual Queries** | 0.0 - 0.3 | Low/Focused - Deterministic, precise reasoning |
| **General Tasks** | 0.4 - 0.7 | Medium/Balanced - Balanced exploration and precision |
| **Creative Tasks** | 0.8 - 1.0 | High/Exploratory - Creative, diverse reasoning paths |

### 🔧 Model Configuration

Configure different models for different components in `model_config.json`:

#### Example 1: Production Configuration with Safety Settings (Recommended)

```json
{
    "all": {
        "router": {
            "model_name": "gemini-2.5-pro",
            "category": "gemini",

            "temperature": 0.2,
            "max_output_tokens": 2048,
            "top_p": 0.95,
            "top_k": 20,
            "thinking_budget": -1,

            "// CRITICAL: Safety settings prevent finish_reason=1 errors": "",
            "// Use BLOCK_NONE for maximum reliability in production": "",
            "safety_settings": [
                {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"}
            ]
        },
        "evaluator": {
            "model_name": "gemini-2.5-flash",
            "category": "gemini",

            "temperature": 0.3,
            "max_output_tokens": 1024,
            "thinking_budget": -1,

            "safety_settings": [
                {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
                {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"},
                {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH"},
                {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"}
            ]
        },
        "reflector": {
            "model_name": "gpt-5",
            "category": "openai",

            "// Note: temperature ignored for reasoning models": "",
            "max_output_tokens": 2048,
            "reasoning_effort": "medium",

            "presence_penalty": 0.0,
            "frequency_penalty": 0.0,
            "seed": 42
        }
    }
}
```

#### Example 2: Mixed Provider Configuration

```json
{
    "all": {
        "router": {
            "model_name": "o1-mini",
            "category": "openai",
            "reasoning_effort": "low",
            "max_output_tokens": 2048
        },
        "evaluator": {
            "model_name": "gemini-2.5-flash",
            "category": "gemini",
            "temperature": 0.3,
            "safety_settings": [
                {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"}
            ]
        },
        "reflector": {
            "model_name": "gpt-5",
            "category": "openai",
            "reasoning_effort": "high",
            "max_output_tokens": 4096
        }
    }
}
```

**Key Benefits:**
- ✅ **Transparent**: Use standard temperature values everywhere
- ✅ **Automatic**: Framework handles provider-specific parameters
- ✅ **Flexible**: Mix different reasoning models in one agent
- ✅ **Optimal**: Each provider uses its best reasoning mechanism

---

### 🚨 Solving Gemini finish_reason=1 Errors

**Problem**: Gemini API returns `finish_reason=1` (SAFETY) when content is blocked by safety filters, causing workflow failures.

**Root Cause**: Default safety thresholds are too strict for production use, blocking legitimate content.

**Solution**: Set all safety categories to `BLOCK_NONE` threshold for maximum reliability.

#### Quick Fix

Add this to your `model_config.json` for all Gemini models:

```json
{
    "all": {
        "router": {
            "model_name": "gemini-2.5-pro",
            "category": "gemini",
            "safety_settings": [
                {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"}
            ]
        }
    }
}
```

#### Safety Threshold Options

| Threshold | Behavior | Use Case |
|-----------|----------|----------|
| `BLOCK_NONE` | No blocking (maximum reliability) | **Production systems** |
| `BLOCK_ONLY_HIGH` | Block only high-confidence harmful content | Balanced approach |
| `BLOCK_MEDIUM_AND_ABOVE` | Block medium+ confidence | Conservative |
| `BLOCK_LOW_AND_ABOVE` | Block low+ confidence (default) | Very conservative |

#### Runtime Override with config_dict

```python
manager.create_agent(
    agent_name="production_agent",
    tools=tools,
    agent_details=agent_details,
    config_dict={
        "router_safety_settings": [
            {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
            {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
            {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
            {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"}
        ],
        "evaluator_safety_settings": [
            {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
            {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"},
            {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH"},
            {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"}
        ]
    }
)
```

**Recommendation**: Use `BLOCK_NONE` for production systems to prevent unexpected workflow failures. Implement content filtering at the application level if needed.
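
If you do disable API-level blocking, a simple post-hoc filter is one option for application-level control. The sketch below is purely illustrative; the blocklist and function are hypothetical and not part of MAS-AI:

```python
# Hypothetical application-level filter applied to agent output after the fact.
BLOCKED_TERMS = {"example_banned_phrase"}  # your policy here

def filter_response(text: str) -> str:
    if any(term in text.lower() for term in BLOCKED_TERMS):
        return "[response withheld by application policy]"
    return text
```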

---

## Framework Architecture

### Agent Control Flow

The MASAI framework follows a sophisticated control flow pattern where agents process queries through multiple specialized nodes. The detailed workflow diagrams are shown in the architecture sections below.

**Key Control Flow Rules:**

1. **Entry Point**: Router (default) or Planner (if planning enabled)
2. **Conditional Edges**: ALL nodes use `checkroutingcondition()` to decide next step
3. **Routing Logic**:
   - `"continue"` → from the router/planner: execute the selected tool; from `execute_tool`: move to the evaluator; from the evaluator/reflector: execute the tool again (see the sketch after this list)
   - `"reflection"` → Reflect and improve
   - `"end"` → Terminate workflow
4. **Return Direct**: Tools with `return_direct=True` cause `execute_tool` to go directly to `END`
5. **Loop Control**: Continues until `satisfied=True` and `current_tool=""` or max iterations reached
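
The sketch below illustrates the kind of decision `checkroutingcondition()` makes, using the `State` fields documented in the Agent State Machine section; the actual implementation is internal to the framework:

```python
# Illustrative routing decision; field names follow the State structure shown later.
def checkroutingcondition(state: dict) -> str:
    if state["satisfied"] and not state["current_tool"]:
        return "end"            # satisfied with no pending tool: terminate
    if state["reflection_counter"] >= 3:
        return "end"            # reflection budget exhausted: force an answer
    if state["tool_loop_counter"] >= 3:
        return "reflection"     # repeated tool loops: reflect and adjust
    return "continue"           # otherwise keep executing the workflow
```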

### Core Components

```
┌─────────────────────────────────────────────────────────────┐
│                    MAS-AI FRAMEWORK                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────────────────────────────────────────────┐  │
│  │           AGENT MANAGER                              │  │
│  │  - Creates and manages agents                        │  │
│  │  - Configures tools and memory                       │  │
│  │  - Handles model configuration                       │  │
│  └──────────────────────────────────────────────────────┘  │
│                          │                                  │
│                          ▼                                  │
│  ┌──────────────────────────────────────────────────────┐  │
│  │           SINGULAR AGENT                             │  │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐     │  │
│  │  │   Router   │→ │ Evaluator  │→ │ Reflector  │     │  │
│  │  └────────────┘  └────────────┘  └────────────┘     │  │
│  │         OR                                           │  │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐     │  │
│  │  │  Planner   │→ │  Executor  │→ │ Reflector  │     │  │
│  │  └────────────┘  └────────────┘  └────────────┘     │  │
│  └──────────────────────────────────────────────────────┘  │
│                          │                                  │
│                          ▼                                  │
│  ┌──────────────────────────────────────────────────────┐  │
│  │      MULTI-AGENT SYSTEM (MAS)                        │  │
│  │  - Sequential: Fixed pipeline                        │  │
│  │  - Hierarchical: Supervisor-based                    │  │
│  │  - Decentralized: Peer-to-peer                       │  │
│  └──────────────────────────────────────────────────────┘  │
│                          │                                  │
│                          ▼                                  │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  ORCHESTRATED MULTI-AGENT NETWORK (OMAN)             │  │
│  │  - Coordinates multiple MAS instances                │  │
│  │  - Network-level memory and routing                  │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

### Component Reference Table

| Component | Purpose | Import Path |
|-----------|---------|-------------|
| **AgentManager** | Central registry for creating and managing agents | `from masai.AgentManager.AgentManager import AgentManager, AgentDetails` |
| **Agent** | Singular agent with Router-Evaluator-Reflector architecture | `from masai.Agents.singular_agent import Agent` |
| **BaseAgent** | Base class with common agent functionality | `from masai.Agents.base_agent import BaseAgent` |
| **MASGenerativeModel** | LLM wrapper with memory and context management | `from masai.GenerativeModel.generativeModels import MASGenerativeModel` |
| **BaseGenerativeModel** | Simple LLM wrapper without agent architecture | `from masai.GenerativeModel.baseGenerativeModel.basegenerativeModel import BaseGenerativeModel` |
| **MultiAgentSystem** | Coordinates multiple agents in workflows | `from masai.MultiAgents.MultiAgent import MultiAgentSystem, SupervisorConfig` |
| **TaskManager** | Manages concurrent tasks in hierarchical MAS | `from masai.MultiAgents.TaskManager import TaskManager` |
| **OMAN** | Orchestrates multiple MAS instances | `from masai.OMAN.oman import OrchestratedMultiAgentNetwork` |
| **InMemoryDocStore** | Vector store for semantic search | `from masai.Memory.InMemoryStore import InMemoryDocStore` |
| **ToolCache** | Redis-based caching for tools | `from masai.Tools.utilities.cache import ToolCache` |
| **Config** | Global configuration parameters | `from masai.Config import config` |

### Framework Levels Explained

#### Level 1: Singular Agent
- **Single agent** with internal components (Router, Evaluator, Reflector, optional Planner)
- Handles queries independently using tools
- Can delegate to other agents in decentralized mode
- **Use when**: Single domain, tool-heavy tasks, simple queries

#### Level 2: Multi-Agent System (MAS)
- **Multiple agents** working together in coordinated workflows
- Three workflow types: Sequential, Hierarchical, Decentralized
- Shared memory and context across agents
- **Use when**: Multi-domain tasks, complex workflows, quality control needed

#### Level 3: Orchestrated Multi-Agent Network (OMAN)
- **Multiple MAS instances** coordinated by OMAN supervisor
- Each MAS specializes in different domains
- Network-level routing and memory
- **Use when**: Enterprise-scale, multiple specialized systems, cross-domain coordination

---

## Custom LangGraph Implementation

**MAS-AI** now includes a **custom LangGraph implementation** that provides all the functionality of the original LangGraph library while being fully integrated into the MASAI framework. This eliminates external dependencies and gives you complete control over the workflow execution engine.

### Key Features

- **🔧 Zero External Dependencies**: No need to install the external `langgraph` package
- **⚡ Optimized Performance**: Custom implementation tailored for MASAI's specific needs
- **🎯 Full Compatibility**: Drop-in replacement for existing LangGraph functionality
- **🔍 Enhanced Debugging**: Better error messages and debugging capabilities
- **🚀 Async-First Design**: Built from the ground up for async/await patterns

### Core Components

#### StateGraph
The main graph building class that allows you to create complex workflows:

```python
from masai.langgraph.graph import StateGraph, END, START

# Create a new state graph
graph = StateGraph(YourStateType)

# Add nodes (functions that process state)
graph.add_node("router", router_function)
graph.add_node("evaluator", evaluator_function)

# Add edges between nodes
graph.add_edge("router", "evaluator")

# Add conditional edges based on state
graph.add_conditional_edges("evaluator", condition_function, {
    "continue": "router",
    "end": END
})

# Set entry point and compile
graph.set_entry_point("router")
compiled_graph = graph.compile()
```
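
For a fully self-contained toy workflow (the state type and node functions below are invented for illustration), the same API looks like this:

```python
import asyncio
from typing import TypedDict

from masai.langgraph.graph import StateGraph, END

class CounterState(TypedDict):
    count: int

def increment(state: CounterState) -> CounterState:
    return {"count": state["count"] + 1}

def done_yet(state: CounterState) -> str:
    # Condition function returns a routing label for the mapping below
    return "end" if state["count"] >= 3 else "continue"

graph = StateGraph(CounterState)
graph.add_node("increment", increment)
graph.add_conditional_edges("increment", done_yet, {"continue": "increment", "end": END})
graph.set_entry_point("increment")

result = asyncio.run(graph.compile().ainvoke({"count": 0}))
print(result)  # {'count': 3}
```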

#### CompiledStateGraph
The executable graph that runs your workflow:

```python
# Execute the entire workflow
final_state = await compiled_graph.ainvoke(initial_state)

# Stream state updates in real-time
async for update in compiled_graph.astream(initial_state):
    print(f"Node update: {update}")

# Visualize the graph structure
graph_viz = compiled_graph.get_graph()
mermaid_diagram = graph_viz.get_mermaid_text()
print(mermaid_diagram)

# Generate PNG diagram (for display method compatibility)
png_data = graph_viz.draw_mermaid_png()
```
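
Note that `ainvoke` and `astream` are coroutines, so the snippet above must run inside an async context, e.g.:

```python
import asyncio

async def main():
    final_state = await compiled_graph.ainvoke(initial_state)
    print(final_state)

asyncio.run(main())
```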

### Migration from External LangGraph

If you're migrating from the external LangGraph library, simply update your imports:

```python
# Old imports
from langgraph.graph import StateGraph, END, START
from langgraph.graph.state import CompiledStateGraph

# New imports (MASAI custom implementation)
from masai.langgraph.graph import StateGraph, END, START
from masai.langgraph.graph.state import CompiledStateGraph
```

**No other code changes required!** The API is 100% compatible.

### Graph Visualization

The custom LangGraph implementation includes powerful visualization capabilities:

```python
# Create and compile your graph
compiled_graph = graph.compile()

# Get the visualization object
graph_viz = compiled_graph.get_graph()

# Generate Mermaid diagram text
mermaid_text = graph_viz.get_mermaid_text()
print(mermaid_text)

# Generate PNG data (for compatibility with MASAI display method)
png_data = graph_viz.draw_mermaid_png()

# Save Mermaid diagram to file
graph_viz.save_mermaid_text("my_workflow.mmd")
```

**MASAI Agent Display Method:**
```python
# Works exactly as before with custom implementation
agent = Agent(...)
agent.display()  # Saves PNG diagram to MAS/Database/mermaid/
```

### Advanced Features

- **Graph Visualization**: Built-in Mermaid diagram generation with `get_graph()` and `draw_mermaid_png()`
- **Async Condition Functions**: Support for both sync and async condition functions
- **Flexible State Management**: Works with any TypedDict state structure
- **Error Handling**: Comprehensive error messages for debugging
- **Execution Statistics**: Built-in metrics for performance monitoring
- **Recursion Protection**: Configurable limits to prevent infinite loops
- **Display Method Compatibility**: Full compatibility with MASAI's `display()` method for agent visualization

---

## Agent Components

MAS AI introduces two agent architectures optimized for different use cases:

### 1. Router, Reflector, Evaluator
A reactive architecture for dynamic task routing and output validation:

```mermaid
graph TD
    START --> router
    router -->|continue| execute_tool
    router -->|reflection| reflection
    router -->|end| END
    execute_tool -->|continue| evaluator
    execute_tool -->|reflection| reflection
    execute_tool -->|end| END
    evaluator -->|continue| execute_tool
    evaluator -->|reflection| reflection
    evaluator -->|end| END
    reflection -->|continue| execute_tool
    reflection -->|reflection| reflection
    reflection -->|end| END
    evaluator[evaluator]
    reflection[reflection]
    execute_tool[execute_tool]
    router[router]
    START([START])
    END([END])
```

- **Router:** Analyzes queries and directs them to appropriate processing components
- **Evaluator:** Reviews outputs to ensure quality and relevance
- **Reflector:** Updates memory and improves routing strategies based on outcomes

### 2. Planner, Executor, Reflector

```mermaid
graph TD
    START --> planner
    planner -->|continue| execute_tool
    planner -->|reflection| reflection
    planner -->|end| END
    execute_tool -->|continue| evaluator
    execute_tool -->|reflection| reflection
    execute_tool -->|end| END
    evaluator -->|continue| execute_tool
    evaluator -->|reflection| reflection
    evaluator -->|end| END
    reflection -->|continue| execute_tool
    reflection -->|reflection| reflection
    reflection -->|end| END
    evaluator[evaluator]
    reflection[reflection]
    execute_tool[execute_tool]
    planner[planner]
    START([START])
    END([END])
```

A proactive architecture for task planning and dependency management:

- **Planner:** Breaks queries into structured task plans
- **Executor:** Assigns tasks to appropriate components or agents
- **Reflector:** Assesses results and adjusts plans as needed


### 1. Router-Evaluator-Reflector Architecture

**Purpose**: Reactive architecture for dynamic task routing and validation

#### Router
- **Function**: Analyzes queries and routes to appropriate tools/agents
- **Input**: User query + chat history + context
- **Output**: Tool selection OR agent delegation OR direct answer
- **Model Config**: `model_config.json → router`

#### Evaluator
- **Function**: Validates tool outputs and agent responses
- **Input**: Tool output + original query + context
- **Output**: Satisfaction status (satisfied/not_satisfied) + reasoning
- **Model Config**: `model_config.json → evaluator`

#### Reflector
- **Function**: Updates memory and refines strategies
- **Input**: Conversation history + outcomes
- **Output**: Updated memory + insights
- **Model Config**: `model_config.json → reflector`

**Workflow**:
```
Query → Router → Tool/Agent → Evaluator → [Satisfied?]
                                              ↓ No
                                          Reflector → Router (retry)
                                              ↓ Yes
                                          Final Answer
```

### 2. Planner-Executor-Reflector Architecture

**Purpose**: Proactive architecture for complex task decomposition

#### Planner
- **Function**: Decomposes queries into structured task plans
- **Input**: User query + context
- **Output**: Task list with dependencies
- **Model Config**: `model_config.json → planner`

#### Executor
- **Function**: Executes tasks using tools/agents
- **Input**: Task plan + tools
- **Output**: Task results
- **Model Config**: Uses router model

#### Reflector
- **Function**: Evaluates results and adjusts plans
- **Input**: Task results + original plan
- **Output**: Re-planning decisions
- **Model Config**: `model_config.json → reflector`

**Workflow**:
```
Query → Planner → Task List → Executor → Results → Reflector
                                                      ↓
                                              [Complete?]
                                                ↓ No
                                            Planner (re-plan)
                                                ↓ Yes
                                            Final Answer
```

### 3. Agent State Machine

MAS-AI agents use a **LangGraph state machine** to manage workflow execution. Understanding the state is crucial for debugging and optimization.

#### State Structure

```python
class State(TypedDict):
    messages: List[Dict[str, str]]           # Chat history
    current_tool: str                        # Currently selected tool
    tool_input: Any                          # Input for the tool
    tool_output: Any                         # Output from the tool
    answer: str                              # Current answer
    satisfied: bool                          # Satisfaction flag
    reasoning: str                           # LLM reasoning
    delegate_to_agent: Optional[str]         # Agent to delegate to
    current_node: str                        # Current node in workflow
    previous_node: Optional[str]             # Previous node
    plan: Optional[dict]                     # Task plan (if using planner)
    passed_from: Optional[str]               # Source of delegation
    reflection_counter: int                  # Number of reflections
    tool_loop_counter: int                   # Number of tool loops
```

#### State Transitions

```
┌─────────────────────────────────────────────────────────────┐
│                  AGENT STATE MACHINE                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  START                                                      │
│    ↓                                                        │
│  [plan=True?] ──Yes──→ PLANNER ──→ EXECUTOR                │
│    ↓ No                              ↓                      │
│  ROUTER ─────────────────────────────┘                      │
│    ↓                                                        │
│  [tool selected?]                                           │
│    ↓ Yes                                                    │
│  EXECUTE_TOOL                                               │
│    ↓                                                        │
│  EVALUATOR                                                  │
│    ↓                                                        │
│  [satisfied=True?]                                          │
│    ↓ No                                                     │
│  [tool_loop_counter > MAX?] ──Yes──→ REFLECTOR             │
│    ↓ No                                ↓                    │
│  ROUTER (retry)                        ↓                    │
│    ↓ Yes                               ↓                    │
│  [delegate_to_agent?] ──Yes──→ DELEGATE_TO_AGENT           │
│    ↓ No                                ↓                    │
│  END (Final Answer)                    END                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

#### Loop Prevention Mechanisms

**1. Tool Loop Counter**
- Tracks consecutive tool uses
- Max limit: `config.max_tool_loops` (default: 3)
- When exceeded: Triggers warning prompt and forces reflection

**2. Reflection Counter**
- Tracks number of reflection cycles
- Max limit: `config.MAX_REFLECTION_COUNT` (default: 3)
- When exceeded: Forces final answer or delegation

**3. Recursion Limit**
- Overall workflow recursion limit
- Max limit: `config.MAX_RECURSION_LIMIT` (default: 100)
- Prevents infinite loops in complex workflows

#### Configuration Parameters

```python
from masai.Config import config

# Modify global config
config.max_tool_loops = 5              # Default: 3
config.MAX_REFLECTION_COUNT = 5        # Default: 3
config.MAX_RECURSION_LIMIT = 150       # Default: 100
config.stream_mode = "updates"         # LangGraph stream mode
config.truncated_response_length = 500 # Logging truncation
```

---

## Memory System

### Memory Hierarchy

```
┌─────────────────────────────────────────────────────────────┐
│                    MEMORY HIERARCHY                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Level 1: AGENT SHORT-TERM MEMORY                          │
│  ├─ Chat history (last N messages)                         │
│  ├─ Current context                                        │
│  └─ Parameter: memory_order (default: 5)                   │
│                                                             │
│  Level 2: COMPONENT MEMORY                                 │
│  ├─ Component short-term (per Router/Evaluator/etc.)       │
│  ├─ Component shared (between components)                  │
│  └─ Component long-term (summarized history)               │
│      └─ Parameter: long_context_order (default: 10)        │
│                                                             │
│  Level 3: MULTI-AGENT SYSTEM MEMORY                        │
│  ├─ Shared across all agents in MAS                        │
│  └─ Parameter: shared_memory_order (default: 3)            │
│                                                             │
│  Level 4: NETWORK MEMORY                                   │
│  ├─ Spans all MAS instances in OMAN                        │
│  └─ Parameter: network_memory_order                        │
│                                                             │
│  Level 5: EXTENDED MEMORY STORE                            │
│  ├─ Vector store for semantic search                       │
│  ├─ InMemoryDocStore (sentence-transformers)               │
│  └─ Parameter: in_memory_store, top_k                      │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

### Memory Parameters

| Parameter | Scope | Default | Purpose |
|-----------|-------|---------|---------|
| `memory` | Agent | `True` | Enable/disable memory |
| `memory_order` | Agent | `5` | Number of recent messages to keep |
| `long_context` | Agent | `False` | Enable long-term memory summarization |
| `long_context_order` | Agent | `10` | Number of old messages to summarize |
| `shared_memory_order` | MAS | `3` | Shared memory size across agents |
| `network_memory_order` | OMAN | N/A | Network-level memory size |
| `in_memory_store` | Agent | `None` | Vector store instance |
| `top_k` | Agent | `3` | Number of vector search results |
| `chat_log` | Agent | `None` | File path to save chat history |
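
Agent-level memory parameters are passed to `create_agent`. An illustrative configuration, assuming `tools` and `agent_details` are already defined:

```python
# Illustrative memory configuration; tools/agent_details assumed defined above.
manager.create_agent(
    agent_name="support_agent",
    tools=tools,
    agent_details=agent_details,
    memory_order=8,          # keep the 8 most recent messages verbatim
    long_context=True,       # summarize older messages instead of dropping them
    long_context_order=15,   # retain up to 15 summaries
    top_k=3,                 # vector-store results per query (with in_memory_store)
)
```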

### Memory Flow Detailed

#### 1. Chat History Management

```
┌─────────────────────────────────────────────────────────────┐
│              CHAT HISTORY LIFECYCLE                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  New Message                                                │
│    ↓                                                        │
│  Append to chat_history[]                                   │
│    ↓                                                        │
│  [len(chat_history) > memory_order?]                        │
│    ↓ Yes                                                    │
│  [long_context=True?]                                       │
│    ↓ Yes                          ↓ No                      │
│  Summarize old messages      Truncate to memory_order/2     │
│    ↓                              ↓                         │
│  Add to context_summaries[]  [chat_log set?]                │
│    ↓                              ↓ Yes                     │
│  [len(summaries) > long_context_order?]  Save to file      │
│    ↓ Yes                          ↓                         │
│  [LTIMStore set?]            Keep recent messages           │
│    ↓ Yes                                                    │
│  Move old summaries to vector store                         │
│    ↓                                                        │
│  Keep recent summaries                                      │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

**Key Points**:
- `chat_history` stores raw messages as `{'role': 'user/assistant', 'content': '...'}`
- When `len(chat_history) > memory_order`, old messages are processed
- If `long_context=True`, old messages are summarized using a separate LLM
- Summaries are stored in `context_summaries[]` as LangChain `Document` objects
- When summaries exceed `long_context_order`, oldest are moved to `LTIMStore` (if configured)
- If `chat_log` is set, truncated messages are saved to file before removal
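
In rough pseudocode, the lifecycle above looks like the following; helper names such as `summarize_with_llm` and `save_to_file` are illustrative, not framework APIs:

```python
# Rough sketch of the chat-history lifecycle; all helper names are illustrative.
def on_new_message(self, message: dict) -> None:
    self.chat_history.append(message)
    if len(self.chat_history) <= self.memory_order:
        return
    if self.long_context:
        old, self.chat_history = (self.chat_history[:-self.memory_order],
                                  self.chat_history[-self.memory_order:])
        self.context_summaries.append(summarize_with_llm(old))  # separate LLM
        if len(self.context_summaries) > self.long_context_order and self.ltim_store:
            # oldest summaries overflow into the vector store for semantic recall
            self.ltim_store.add(self.context_summaries.pop(0))
    else:
        keep = self.memory_order // 2
        old, self.chat_history = self.chat_history[:-keep], self.chat_history[-keep:]
        if self.chat_log:
            save_to_file(self.chat_log, old)  # persist truncated messages
```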

#### 2. Context Summaries

**Purpose**: Compress old conversations while retaining key information

**Process**:
1. When `chat_history` exceeds `memory_order`, oldest messages are selected
2. A separate LLM (GenerativeModel with `temperature=0.5`) summarizes them
3. Summary prompt focuses on: main topics, key information, specific keywords, conclusions
4. Summary is stored as a `Document` with `page_content` field
5. Summaries are included in prompts under `<EXTENDED CONTEXT>` section

**Example Summary**:
```
"The user asked about implementing a caching system for API calls.
The assistant recommended Redis with a TTL of 30 minutes.
Key points: Use pickle for serialization, handle connection errors,
implement cache invalidation strategy. User confirmed implementation."
```

#### 3. InMemoryDocStore (LTIMStore)

**Purpose**: Semantic search over very old conversation history

**How It Works**:
```
┌─────────────────────────────────────────────────────────────┐
│           LTIMSTORE WORKFLOW                                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Old Summaries (beyond long_context_order)                  │
│    ↓                                                        │
│  Convert to Document objects                                │
│    ↓                                                        │
│  Embed using SentenceTransformer                            │
│    ↓                                                        │
│  Store in InMemoryDocStore                                  │
│    ↓                                                        │
│  On New Query:                                              │
│    ↓                                                        │
│  Embed query                                                │
│    ↓                                                        │
│  Cosine similarity search                                   │
│    ↓                                                        │
│  Return top_k most relevant summaries                       │
│    ↓                                                        │
│  Include in prompt under <EXTENDED CONTEXT>                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

**Setup**:
```python
from typing import List

import numpy as np

from masai.Memory.InMemoryStore import InMemoryDocStore

# Create vector store
memory_store = InMemoryDocStore(
    embedding_model="all-MiniLM-L6-v2"  # SentenceTransformer model
)

# Or use custom embedding function
def custom_embedder(texts: List[str]) -> np.ndarray:
    # Your embedding logic
    return embeddings

memory_store = InMemoryDocStore(embedding_model=custom_embedder)

# Or use LangChain embeddings
from langchain.embeddings import OpenAIEmbeddings
memory_store = InMemoryDocStore(embedding_model=OpenAIEmbeddings())

# Pass to agent
manager.create_agent(
    agent_name="research_agent",
    tools=tools,
    agent_details=details,
    long_context=True,
    long_context_order=20,
    in_memory_store=memory_store,
    top_k=3  # Return top 3 relevant memories
)
```

#### 4. Component Context

**Purpose**: Share information between agent components (Router → Evaluator → Reflector)

**Mechanism**:
- Each component can add messages to `component_context[]`
- These messages are passed to the next component
- Controlled by `shared_memory_order` parameter
- Allows components to communicate reasoning and intermediate results

**Example**:
```python
# Router adds context
component_context = [
    {'role': 'router', 'content': 'Selected database_tool because query mentions "users"'}
]

# Evaluator receives this context and adds its own
component_context.append(
    {'role': 'evaluator', 'content': 'Tool returned 150 users, satisfies query'}
)

# Reflector receives both contexts
```

#### 5. Chat Log Persistence

**Purpose**: Save conversation history to file for later analysis or resumption

**Setup**:
```python
manager = AgentManager(
    context={},
    logging=True,
    model_config_path="model_config.json",
    chat_log="./logs/agent_chat.json"  # File path
)
```

**Behavior**:
- When `chat_history` is truncated, removed messages are saved to file
- File format: JSON array of message objects
- Useful for debugging, analysis, or resuming conversations
- Automatically creates directory if it doesn't exist
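
Since the log is a plain JSON array, it can be reloaded directly for analysis (path taken from the example above):

```python
import json

# Reload the persisted chat log (a JSON array of message objects)
with open("./logs/agent_chat.json") as f:
    archived = json.load(f)
print(f"{len(archived)} archived messages; first role: {archived[0]['role']}")
```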

---

## Multi-Agent Workflows

### 1. Sequential Workflow

**Use Case**: Fixed pipeline processing (ETL, document processing)

**Architecture**:
```
Agent 1 → Agent 2 → Agent 3 → Final Output
```

**Implementation**:
```python
from masai.MultiAgents.MultiAgent import MultiAgentSystem

mas = MultiAgentSystem(agentManager=manager)

result = mas.initiate_sequential_mas(
    query="Process this document",
    agent_sequence=["research_agent", "analysis_agent", "summary_agent"],
    memory_order=3  # Shared memory across agents
)
```

**Parameters**:
- `query` (str): Input query
- `agent_sequence` (List[str]): Ordered list of agent names
- `memory_order` (int): Shared memory size

**Data Flow**:
```
Query → Agent 1 (output_1) → Agent 2 (output_1 + output_2) → Agent 3 (final)
         ↓                      ↓                              ↓
      Memory[0]              Memory[1]                     Memory[2]
```

### 2. Hierarchical Workflow

**Use Case**: Complex tasks requiring supervision and quality control

**Architecture**:
```
                    Supervisor LLM
                         │
        ┌────────────────┼────────────────┐
        ▼                ▼                ▼
    Agent 1          Agent 2          Agent 3
        │                │                │
        └────────────────┴────────────────┘
                         │
                    Task Manager
                         │
                  Result Callback
```

**Implementation**:
```python
from masai.MultiAgents.MultiAgent import MultiAgentSystem, SupervisorConfig

def handle_task_result(task):
    print(f"Task completed: {task['answer']}")

supervisor_config = SupervisorConfig(
    model_name="gpt-4",
    temperature=0.2,
    model_category="openai",
    memory_order=20,
    memory=True,
    extra_context={"organization": "Benosphere"}
)

mas = MultiAgentSystem(
    agentManager=manager,
    supervisor_config=supervisor_config,
    heirarchical_mas_result_callback=handle_task_result,
    agent_return_direct=True
)

result = await mas.initiate_hierarchical_mas("Complex research task")
```

**Parameters**:
- `supervisor_config` (SupervisorConfig): Supervisor LLM configuration
- `heirarchical_mas_result_callback` (Callable): Callback for task completion
- `agent_return_direct` (bool): Return agent output directly without supervisor review

**Data Flow**:
```
Query → Supervisor → Task Queue → Agent → Result → Supervisor Review
                                                         ↓
                                                  [Satisfied?]
                                                    ↓ No
                                            Revision Request → Agent
                                                    ↓ Yes
                                                Callback → Final Output
```

### 3. Decentralized Workflow

**Use Case**: Peer-to-peer collaboration, adaptive workflows

**Architecture**:
```
Entry Agent ⇄ Agent 2 ⇄ Agent 3
     ↕           ↕         ↕
  Agent 4 ⇄  Agent 5 ⇄ Agent 6
```

**Implementation**:
```python
mas = MultiAgentSystem(agentManager=manager)

result = await mas.initiate_decentralized_mas(
    query="Research AI trends and schedule meeting",
    set_entry_agent=manager.get_agent("personal_assistant")
)
```

**Parameters**:
- `query` (str): Input query
- `set_entry_agent` (Agent): Initial agent to handle query

**Data Flow**:
```
Query → Entry Agent → [Delegate?] → Agent 2 → [Delegate?] → Agent 3
                         ↓ No                    ↓ No
                      Answer                  Answer
```

---

## Parameter Reference

### 1. AgentManager.__init__()

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `logging` | `bool` | No | `True` | Enable/disable logging of agent activities |
| `context` | `dict` | No | `None` | Static context shared with all agents (e.g., `{"org": "MyCompany"}`) |
| `model_config_path` | `str` | **Yes** | N/A | Path to model configuration JSON file |
| `chat_log` | `str` | No | `None` | File path to save chat history when truncated |
| `streaming` | `bool` | No | `False` | Enable streaming responses from LLMs |
| `streaming_callback` | `Callable` | No | `None` | Async callback function for streaming chunks (required if `streaming=True`) |

**Example**:
```python
from masai.AgentManager.AgentManager import AgentManager

manager = AgentManager(
    context={"organization": "MyCompany", "environment": "production"},
    logging=True,
    model_config_path="./config/model_config.json",
    chat_log="./logs/chat_history.json",
    streaming=True,
    streaming_callback=async_streaming_handler
)
```
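
The `streaming_callback` referenced above must be an async callable; a minimal sketch (the name `async_streaming_handler` is simply the placeholder used in the example, and the single-string-chunk signature is an assumption):

```python
# Minimal async streaming callback: print each chunk as it arrives.
async def async_streaming_handler(chunk: str) -> None:
    print(chunk, end="", flush=True)
```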

### 2. AgentManager.create_agent()

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `agent_name` | `str` | **Yes** | N/A | Unique identifier for the agent (converted to lowercase) |
| `tools` | `List[Tool]` | **Yes** | N/A | List of LangChain tools the agent can use |
| `agent_details` | `AgentDetails` | **Yes** | N/A | Agent configuration (capabilities, description, style) |
| `memory_order` | `int` | No | `10` | Number of recent messages to keep in short-term memory |
| `long_context` | `bool` | No | `True` | Enable long-term memory with summarization |
| `long_context_order` | `int` | No | `20` | Number of old message summaries to keep |
| `shared_memory_order` | `int` | No | `10` | Shared memory size between components |
| `plan` | `bool` | No | `False` | Use Planner-Executor-Reflector architecture (vs Router-Evaluator-Reflector) |
| `temperature` | `float` | No | `0.2` | Default temperature for all LLMs (can be overridden per component) |
| `context_callable` | `Callable` | No | `None` | Function to fetch dynamic context on each query |
| `in_memory_store` | `InMemoryDocStore` | No | `None` | Vector store for semantic search over old conversations |
| `top_k` | `int` | No | `3` | Number of results to retrieve from vector store |
| `config_dict` | `dict` | No | `None` | Per-component configuration overrides (see below) |

**config_dict Structure**:

The `config_dict` parameter allows runtime override of component-specific settings. Use the format: `{component}_{parameter_name}`.

```python
config_dict = {
    # ═══════════════════════════════════════════════════════════
    # MEMORY CONFIGURATION
    # ═══════════════════════════════════════════════════════════
    "router_memory_order": 10,
    "router_long_context_order": 15,
    "evaluator_memory_order": 5,
    "evaluator_long_context_order": 10,
    "reflector_memory_order": 15,
    "reflector_long_context_order": 20,
    "planner_memory_order": 10,  # Only if plan=True
    "planner_long_context_order": 15,

    # ═══════════════════════════════════════════════════════════
    # CORE MODEL PARAMETERS
    # ═══════════════════════════════════════════════════════════
    "router_temperature": 0.3,
    "evaluator_temperature": 0.1,
    "reflector_temperature": 0.7,
    "planner_temperature": 0.2,

    # ═══════════════════════════════════════════════════════════
    # GEMINI-SPECIFIC PARAMETERS
    # ═══════════════════════════════════════════════════════════
    # Router (Gemini model)
    "router_max_output_tokens": 2048,
    "router_top_k": 20,
    "router_top_p": 0.95,
    "router_thinking_budget": -1,  # -1=dynamic, 0=off, 1-10000=fixed
    "router_safety_settings": [
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"}
    ],

    # Evaluator (Gemini model)
    "evaluator_max_output_tokens": 1024,
    "evaluator_safety_settings": [
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"},
        {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH"},
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"}
    ],

    # ═══════════════════════════════════════════════════════════
    # OPENAI-SPECIFIC PARAMETERS
    # ═══════════════════════════════════════════════════════════
    # Reflector (OpenAI reasoning model)
    "reflector_reasoning_effort": "medium",  # "low", "medium", "high"
    "reflector_max_output_tokens": 2048,
    "reflector_presence_penalty": 0.0,
    "reflector_frequency_penalty": 0.0,
    "reflector_seed": 42,

    # Planner (OpenAI standard model)
    "planner_max_output_tokens": 1024,
    "planner_stop_sequences": ["END", "STOP"],
    "planner_enable_logprobs": False,
    "planner_num_logprobs": 0,

    # ═══════════════════════════════════════════════════════════
    # STREAMING CONFIGURATION
    # ═══════════════════════════════════════════════════════════
    "router_streaming": True,
    "evaluator_streaming": False,
    "reflector_streaming": False,
    "planner_streaming": True,
    "router_streaming_callback": custom_router_callback,  # Optional
    "evaluator_streaming_callback": custom_evaluator_callback,  # Optional
}
```

**Available Parameters by Provider**:

| Parameter Category | Gemini Parameters | OpenAI Parameters |
|-------------------|-------------------|-------------------|
| **Core** | `temperature`, `max_output_tokens`, `top_p` | `temperature`, `max_output_tokens`, `top_p` |
| **Generation** | `top_k`, `stop_sequences`, `candidate_count` | `stop_sequences` |
| **Safety** | `safety_settings` | N/A |
| **Thinking/Reasoning** | `thinking_budget` | `reasoning_effort` |
| **Logprobs** | `response_logprobs`, `logprobs` | `enable_logprobs`, `num_logprobs` |
| **Penalties** | `presence_penalty`, `frequency_penalty`, `seed` | `presence_penalty`, `frequency_penalty`, `seed` |

**Parameter Format**: `{component}_{parameter_name}`

**Examples**:
- `router_temperature` → Sets temperature for router component
- `router_safety_settings` → Sets safety settings for router (Gemini only)
- `reflector_reasoning_effort` → Sets reasoning effort for reflector (OpenAI reasoning models only)
- `evaluator_max_output_tokens` → Sets max tokens for evaluator

**Note**: Unknown parameters are automatically filtered out to prevent API errors.

**Example 1: Basic Configuration**:
```python
from masai.AgentManager.AgentManager import AgentDetails

agent_details = AgentDetails(
    capabilities=["data analysis", "report generation"],
    description="Analyzes data and generates insights",
    style="concise and data-focused"
)

manager.create_agent(
    agent_name="data_analyst",
    tools=[database_tool, chart_tool],
    agent_details=agent_details,
    config_dict={
        "router_temperature": 0.2,
        "evaluator_temperature": 0.1,
        "reflector_temperature": 0.5
    }
)
```

**Example 2: Gemini with Safety Settings (Prevents finish_reason=1)**:
```python
manager.create_agent(
    agent_name="research_agent",
    tools=[search_tool, summarize_tool],
    agent_details=agent_details,
    config_dict={
        # Router (Gemini 2.5 Pro)
        "router_temperature": 0.2,
        "router_max_output_tokens": 2048,
        "router_top_k": 20,
        "router_thinking_budget": -1,
        "router_safety_settings": [
            {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
            {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
            {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
            {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"}
        ],

        # Evaluator (Gemini 2.5 Flash)
        "evaluator_temperature": 0.3,
        "evaluator_max_output_tokens": 1024,
        "evaluator_safety_settings": [
            {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
            {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"},
            {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH"},
            {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"}
        ]
    }
)
```

**Example 3: OpenAI Reasoning Models**:
```python
manager.create_agent(
    agent_name="reasoning_agent",
    tools=[analysis_tool, planning_tool],
    agent_details=agent_details,
    config_dict={
        # Router (o1-mini)
        "router_reasoning_effort": "low",
        "router_max_output_tokens": 2048,

        # Reflector (GPT-5)
        "reflector_reasoning_effort": "high",
        "reflector_max_output_tokens": 4096,
        "reflector_presence_penalty": 0.0,
        "reflector_frequency_penalty": 0.0,
        "reflector_seed": 42
    }
)
```

**Example 4: Mixed Providers with Streaming**:
```python
async def router_callback(chunk):
    print(f"Router: {chunk}", end="", flush=True)

manager.create_agent(
    agent_name="hybrid_agent",
    tools=[search_tool, database_tool],
    agent_details=agent_details,
    config_dict={
        # Router (Gemini 2.5 Pro) - with streaming
        "router_temperature": 0.2,
        "router_max_output_tokens": 2048,
        "router_safety_settings": [
            {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
            {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
            {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
            {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"}
        ],
        "router_streaming": True,
        "router_streaming_callback": router_callback,

        # Reflector (GPT-5) - no streaming
        "reflector_reasoning_effort": "medium",
        "reflector_max_output_tokens": 2048,
        "reflector_streaming": False
    }
)
```

### 3. AgentDetails

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `capabilities` | `List[str]` | **Yes** | N/A | List of agent capabilities (e.g., `["reasoning", "coding", "science"]`) |
| `description` | `str` | No | `""` | Detailed description of agent's purpose and behavior |
| `style` | `str` | No | `"gives very elaborate answers"` | Communication style for responses |

**Example**:
```python
from masai.AgentManager.AgentManager import AgentDetails

agent_details = AgentDetails(
    capabilities=["database queries", "data analysis", "visualization"],
    description="Specializes in analyzing database data and creating visual reports",
    style="concise and data-focused, uses bullet points"
)
```

### 4. MASGenerativeModel.__init__()

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `model_name` | `str` | **Yes** | N/A | LLM model name (e.g., `"gpt-4"`, `"gemini-2.0-flash"`) |
| `temperature` | `float` | **Yes** | N/A | Temperature for randomness (0.0-1.0) |
| `category` | `str` | **Yes** | N/A | Model category: `"openai"` or `"gemini"` |
| `prompt_template` | `ChatPromptTemplate` | No | `None` | LangChain prompt template |
| `memory_order` | `int` | No | `5` | Number of recent messages to keep |
| `extra_context` | `dict` | No | `None` | Static context (e.g., `{"user_id": "123"}`) |
| `long_context` | `bool` | No | `False` | Enable long-term memory summarization |
| `long_context_order` | `int` | No | `10` | Number of summaries to keep |
| `chat_log` | `str` | No | `None` | File path to save chat history |
| `streaming` | `bool` | No | `False` | Enable streaming responses |
| `streaming_callback` | `Callable` | No | `None` | Async callback for streaming chunks |
| `context_callable` | `Callable` | No | `None` | Function to fetch dynamic context per query |
| `memory_store` | `InMemoryDocStore` | No | `None` | Vector store for semantic search (kwarg) |
| `k` | `int` | No | `3` | Number of vector search results (kwarg) |

**Example**:
```python
from masai.GenerativeModel.generativeModels import MASGenerativeModel

llm = MASGenerativeModel(
    model_name="gemini-2.0-flash",
    temperature=0.3,
    category="gemini",
    memory_order=10,
    extra_context={"user_role": "admin"},
    long_context=True,
    long_context_order=20,
    context_callable=fetch_dynamic_context
)
```

### 5. MultiAgentSystem.__init__()

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `agentManager` | `AgentManager` | **Yes** | N/A | AgentManager instance with created agents |
| `supervisor_config` | `SupervisorConfig` | No | `None` | Supervisor configuration (required for hierarchical MAS) |
| `heirarchical_mas_result_callback` | `Callable` | No | `None` | Callback function for task completion in hierarchical MAS |
| `agent_return_direct` | `bool` | No | `False` | If `False`, supervisor evaluates agent responses before returning |

**Example**:
```python
from masai.MultiAgents.MultiAgent import MultiAgentSystem, SupervisorConfig

supervisor_config = SupervisorConfig(
    model_name="gpt-4",
    temperature=0.2,
    model_category="openai",
    memory_order=20,
    memory=True,
    extra_context={"organization": "MyCompany"},
    supervisor_system_prompt="You are an efficient task coordinator"
)

mas = MultiAgentSystem(
    agentManager=manager,
    supervisor_config=supervisor_config,
    heirarchical_mas_result_callback=handle_result,
    agent_return_direct=False
)
```

### 6. SupervisorConfig

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `model_name` | `str` | **Yes** | N/A | LLM model name for supervisor |
| `temperature` | `float` | **Yes** | N/A | Temperature (0.0-1.0) |
| `model_category` | `str` | **Yes** | N/A | Model category |
| `memory_order` | `int` | **Yes** | N/A | Supervisor memory size |
| `memory` | `bool` | **Yes** | N/A | Enable supervisor memory |
| `extra_context` | `dict` | **Yes** | N/A | Additional context for supervisor |
| `supervisor_system_prompt` | `str` | No | `None` | Custom system prompt (uses default if not provided) |
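
**Example** (the same supervisor setup used in the `MultiAgentSystem` example above):
```python
from masai.MultiAgents.MultiAgent import SupervisorConfig

supervisor_config = SupervisorConfig(
    model_name="gpt-4",
    temperature=0.2,
    model_category="openai",
    memory_order=20,
    memory=True,
    extra_context={"organization": "MyCompany"},
    supervisor_system_prompt="You are an efficient task coordinator"  # optional
)
```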

### 7. InMemoryDocStore.__init__()

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `documents` | `List[Union[str, Document]]` | No | `None` | Initial documents to store |
| `ids` | `List[str]` | No | `None` | Document IDs (auto-generated if not provided) |
| `embedding_model` | `Union[str, Callable, object]` | No | `"all-MiniLM-L6-v2"` | Embedding model: string (SentenceTransformer name), callable, or object with `embed_documents` method |

> **⚠️ IMPORTANT: Heavy Dependencies Not Included**
>
> MASAI does **NOT** include `sentence-transformers`, `torch`, or `transformers` in its core dependencies to keep the framework lightweight (~50MB vs ~2GB+).
>
> **To use SentenceTransformer embeddings**, install separately:
> ```bash
> pip install sentence-transformers
> ```
> This will download ~2GB+ of dependencies (torch, transformers, etc.).
>
> **Alternatives (no heavy dependencies needed)**:
> - Use custom embedding function (Option 2 below)
> - Use LangChain embeddings like OpenAI (Option 3 below)
> - Use `embedding_model=None` for keyword-based search only

**Example**:
```python
from typing import List

import numpy as np
from masai.Memory.InMemoryStore import InMemoryDocStore

# Option 1: SentenceTransformer model name (requires: pip install sentence-transformers)
store = InMemoryDocStore(embedding_model="all-MiniLM-L6-v2")

# Option 2: Custom embedding function (no heavy dependencies)
def my_embedder(texts: List[str]) -> np.ndarray:
    # Your embedding logic (e.g., call an embeddings API or a lightweight model)
    vectors = [[0.0] * 384 for _ in texts]  # placeholder: return real vectors here
    return np.array(vectors)

store = InMemoryDocStore(embedding_model=my_embedder)

# Option 3: LangChain embeddings (no heavy dependencies if using API-based embeddings)
from langchain.embeddings import OpenAIEmbeddings
store = InMemoryDocStore(embedding_model=OpenAIEmbeddings())

# Option 4: No embeddings (keyword-based search only)
store = InMemoryDocStore(embedding_model=None)
```

### 8. ToolCache.__init__()

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `host` | `str` | No | `"localhost"` | Redis server host |
| `port` | `int` | No | `6379` | Redis server port |
| `db` | `int` | No | `0` | Redis database number |
| `password` | `str` | No | `None` | Redis password (if required) |
| `timeout` | `int` | No | `30` | Cache timeout in minutes |

**Example**:
```python
from masai.Tools.utilities.cache import ToolCache
from langchain.tools import tool

# Initialize cache
cache = ToolCache(
    host="localhost",
    port=6379,
    db=0,
    timeout=60  # 60 minutes
)

# Use as decorator
@tool
@cache.masai_cache
def expensive_api_call(query: str) -> dict:
    """Makes an expensive API call. Results are cached."""
    api_response = {"query": query, "result": "..."}  # placeholder for the real call
    return api_response
```

### 9. OrchestratedMultiAgentNetwork.__init__()

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `mas_instances` | `List[MultiAgentSystem]` | **Yes** | N/A | List of MAS instances to orchestrate |
| `network_memory_order` | `int` | No | `3` | Network-level memory size |
| `oman_llm_config` | `dict` | No | `None` | OMAN supervisor LLM configuration |
| `extra_context` | `dict` | No | `None` | Additional context for OMAN supervisor |

**oman_llm_config Structure**:
```python
oman_llm_config = {
    "model_name": "gemini-2.0-flash-001",
    "category": "gemini",
    "temperature": 0.2,
    "memory_order": 3
}
```

**Example**:
```python
from masai.OMAN.oman import OrchestratedMultiAgentNetwork

# Create multiple MAS instances
mas1 = MultiAgentSystem(agentManager=manager1)
mas2 = MultiAgentSystem(agentManager=manager2)

# Create OMAN
oman = OrchestratedMultiAgentNetwork(
    mas_instances=[mas1, mas2],
    network_memory_order=5,
    oman_llm_config={
        "model_name": "gpt-4",
        "category": "openai",
        "temperature": 0.2,
        "memory_order": 5
    },
    extra_context={"environment": "production"}
)
```

---

## Use Cases & Best Practices

### When to Use Each Architecture

| Architecture | Use Case | Example |
|--------------|----------|---------|
| **Router-Evaluator-Reflector** | Dynamic routing, tool-heavy tasks | Database queries, API integrations, data retrieval |
| **Planner-Executor-Reflector** | Complex multi-step tasks | Research projects, data analysis, report generation |

### When to Use Each Workflow

| Workflow | Use Case | Example |
|----------|----------|---------|
| **Sequential** | Fixed pipeline, deterministic flow | ETL, document processing, data transformation |
| **Hierarchical** | Quality control, supervision needed | Research with review, complex analysis, content creation |
| **Decentralized** | Adaptive collaboration, peer tasks | Personal assistant systems, multi-domain problem solving |

### Memory Configuration Guidelines

| Scenario | memory_order | long_context | long_context_order | in_memory_store |
|----------|--------------|--------------|-------------------|-----------------|
| **Short conversations** | 5 | False | N/A | No |
| **Medium conversations** | 10 | True | 10 | No |
| **Long conversations** | 10 | True | 20 | Yes |
| **Research tasks** | 5 | True | 30 | Yes |

### Model Selection Guidelines

| Component | Recommended Model | Reasoning |
|-----------|------------------|-----------|
| **Router** | Fast model (GPT-5-nano, Gemini Flash, GPT-4o-mini) | Frequent calls, simple routing |
| **Evaluator** | Fast model (GPT-5-mini, Gemini Flash) | Binary decision (satisfied/not) |
| **Reflector** | Strong model (GPT-5, GPT-4o, Gemini Pro) | Complex reasoning and insights |
| **Planner** | Strong model (GPT-5, GPT-4o, Gemini Pro) | Complex task decomposition |
| **Supervisor** | Strong model (GPT-5, GPT-4o, Gemini Pro) | High-level coordination |

### GPT-5 Support

**MAS-AI natively supports GPT-5 reasoning models** released by OpenAI in August 2025. GPT-5 models offer superior reasoning capabilities but require different parameters than GPT-4.

**Available Models:**
- `gpt-5`: Full reasoning model (best quality)
- `gpt-5-mini`: Balanced speed/quality
- `gpt-5-nano`: Fastest, most cost-effective
- `gpt-5-thinking`: Extended reasoning with visible thought process

**Key Differences:**
- ❌ GPT-5 does NOT support `temperature`, `top_p`, `presence_penalty`, `frequency_penalty`
- ✅ GPT-5 uses `reasoning_effort` instead: `"minimal"`, `"medium"`, `"high"`
- ✅ MASAI automatically maps temperature to reasoning_effort (no code changes needed)

**Configuration Example:**
```json
{
    "all": {
        "router": {"model_name": "gpt-5-nano", "category": "openai"},
        "evaluator": {"model_name": "gpt-5-mini", "category": "openai"},
        "reflector": {"model_name": "gpt-5", "category": "openai"},
        "planner": {"model_name": "gpt-5-mini", "category": "openai"}
    }
}
```

**Temperature Mapping** (see the sketch after this list):
- Temperature 0.0-0.3 → `reasoning_effort: "minimal"` (fast, deterministic)
- Temperature 0.4-0.7 → `reasoning_effort: "medium"` (balanced, default)
- Temperature 0.8-1.0 → `reasoning_effort: "high"` (deep reasoning)
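
A minimal sketch of this mapping, mirroring the ranges above (MASAI applies an equivalent conversion internally, so you never need to write this yourself):

```python
def temperature_to_reasoning_effort(temperature: float) -> str:
    """Map a classic temperature value to a GPT-5 reasoning_effort level."""
    if temperature <= 0.3:
        return "minimal"  # fast, deterministic
    elif temperature <= 0.7:
        return "medium"   # balanced, default
    return "high"         # deep reasoning

assert temperature_to_reasoning_effort(0.2) == "minimal"
assert temperature_to_reasoning_effort(0.9) == "high"
```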

📖 **See [docs/GPT5_INTEGRATION_GUIDE.md](docs/GPT5_INTEGRATION_GUIDE.md) for detailed GPT-5 integration guide**

---

## Advanced Features

### 1. Using BaseGenerativeModel (Without Agent Architecture)

For simple conversational AI without tools or routing, use `BaseGenerativeModel`:

**Use Case**: Simple chatbots, Q&A systems, content generation

**Configuration**:
```python
from masai.GenerativeModel.baseGenerativeModel.basegenerativeModel import BaseGenerativeModel

model = BaseGenerativeModel(
    model_name="gemini-2.0-flash",
    category='gemini',
    temperature=1,
    memory=True,
    memory_order=20,
    info={'USER_NAME': 'John', 'CONTEXT': 'Customer support'},
    system_prompt="You are a helpful assistant"
)

prompt = "Explain multi-agent systems in two sentences."  # example prompt

# Streaming response
async for chunk in model.astream_response(prompt):
    print(chunk, end='', flush=True)

# Non-streaming response
response = await model.generate_response(prompt)
print(response)
```

**Key Features**:
- No tools, no routing - pure LLM interaction
- Simple memory management
- Streaming and non-streaming support
- Custom system prompts
- Lightweight and fast

**When to Use**:
- Simple Q&A without external data
- Content generation tasks
- Conversational interfaces without tool needs
- Prototyping before adding agent architecture

### 2. Dynamic `return_direct` for Tools

Control whether tool outputs are returned directly to the user or passed through the evaluation pipeline. Supports three levels of control with priority: **Return Value > Parameter > Decorator**.

#### ⚠️ Reserved Parameter in MASAI System

**IMPORTANT:** `return_direct` is a **reserved parameter** in the MASAI framework. The framework automatically:
- ✅ Extracts `return_direct` from tool inputs and return values
- ✅ Handles the routing logic (skip evaluation vs. go through pipeline)
- ✅ Removes `return_direct` from the actual data passed to the tool
- ✅ Implements the three-level priority system

**What this means for tool developers:**
- 🔹 You can add `return_direct` as a parameter to your tool function
- 🔹 You can return `{"data": ..., "return_direct": True}` from your tool
- 🔹 The framework will automatically handle the behavior
- 🔹 You don't need to implement any special logic in your tool

#### Three Levels of Control

**1. Decorator Level (Static)**
```python
from langchain.tools import tool

@tool("Database Query", return_direct=True)
async def query_database(query: str) -> str:
    """Execute database query. Results always returned directly."""
    return db.execute(query)  # `db` is an illustrative database handle
```

**2. Parameter Level (Dynamic)**
```python
@tool("Smart Query", return_direct=False)
async def smart_query(
    query: str,
    return_direct: bool = False  # LLM can control this
) -> str:
    """
    Execute query with optional direct return.

    Args:
        query: SQL query to execute
        return_direct: If True, return results directly without evaluation
    """
    return db.execute(query)
```

**3. Return Value Level (Runtime)**
```python
@tool("Adaptive Query", return_direct=False)
async def adaptive_query(query: str) -> dict:
    """Tool decides internally based on result complexity."""
    results = db.execute(query)

    # Tool decides based on result size
    if len(results) < 10:
        # Simple result - return directly
        return {
            "data": results,
            "return_direct": True  # Tool decides to skip evaluation
        }
    else:
        # Complex result - needs evaluation
        return {
            "data": results,
            "return_direct": False  # Tool decides to evaluate
        }
```

#### Priority Hierarchy

The framework implements a **strict priority hierarchy** for `return_direct`:

```
┌─────────────────────────────────────────────────────────────┐
│              return_direct PRIORITY HIERARCHY               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. RETURN VALUE (Highest Priority)                        │
│     └─ Tool returns: {"data": ..., "return_direct": True}  │
│        ✅ Overrides everything                              │
│                                                             │
│  2. PARAMETER (Medium Priority)                            │
│     └─ LLM passes: tool_input = {"return_direct": True}    │
│        ✅ Overrides decorator                               │
│                                                             │
│  3. DECORATOR (Lowest Priority)                            │
│     └─ @tool("name", return_direct=True)                   │
│        ✅ Used if no higher priority set                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

**Priority Matrix:**

| Decorator | Parameter | Return Value | Effective | Winner | Reason |
|-----------|-----------|--------------|-----------|--------|--------|
| `True` | Not set | Not set | `True` | Decorator | Only decorator specified |
| `False` | `True` | Not set | `True` | **Parameter** | Parameter overrides decorator |
| `True` | `False` | Not set | `False` | **Parameter** | Parameter overrides decorator |
| `True` | `True` | `False` | `False` | **Return Value** | Return value has highest priority |
| `False` | `False` | `True` | `True` | **Return Value** | Return value has highest priority |
| `True` | Not set | `False` | `False` | **Return Value** | Return value overrides decorator |

#### Use Cases

**Database Queries**: User wants raw data vs. analysis
```python
# User: "Get all active users, return raw data"
# LLM calls: smart_query(query="...", return_direct=True)
# → Returns JSON directly

# User: "Analyze active users"
# LLM calls: smart_query(query="...", return_direct=False)
# → Goes through evaluator for analysis
```

**File Operations**: Content vs. metadata
```python
@tool("File Reader", return_direct=False)
async def read_file(path: str, return_direct: bool = False) -> str:
    """Read file with optional direct return."""
    content = open(path).read()
    return content

# User: "Show me config.json"
# → return_direct=True, shows file directly

# User: "Analyze config.json"
# → return_direct=False, evaluator analyzes
```

#### How MASAI Handles `return_direct`

The framework automatically processes `return_direct` in the `execute_tool` node:

```python
# Framework logic (you don't need to implement this)
# 1. Extract return_direct from three sources
tool_decorator_return_direct = hasattr(tool, 'return_direct') and tool.return_direct
input_return_direct = tool_input.get('return_direct') if isinstance(tool_input, dict) else None
result_return_direct = result.get('return_direct') if isinstance(result, dict) else None

# 2. Implement priority: Return Value > Parameter > Decorator
if result_return_direct is not None:
    should_return_direct = result_return_direct
    source = "return value (runtime decision)"
elif input_return_direct is not None:
    should_return_direct = input_return_direct
    source = "input argument"
else:
    should_return_direct = tool_decorator_return_direct
    source = "decorator"

# 3. If return_direct=True, skip evaluation and go to END
if should_return_direct:
    state['answer'] = result  # the tool's output becomes the final answer
    state['satisfied'] = True
    state['current_tool'] = None
    # Workflow goes directly to END
```

**Benefits**:
- ✅ **Flexibility**: Single tool handles both direct and evaluated outputs
- ✅ **LLM Control**: LLM decides based on user intent
- ✅ **Efficiency**: Skip unnecessary evaluation for simple queries
- ✅ **Smart Tools**: Tools can decide internally based on result complexity
- ✅ **Automatic**: Framework handles all the routing logic
- ✅ **Reserved**: `return_direct` is automatically extracted and processed

For detailed documentation, see [docs/DYNAMIC_RETURN_DIRECT.md](docs/DYNAMIC_RETURN_DIRECT.md) and [docs/RETURN_DIRECT_PRIORITY_SYSTEM.md](docs/RETURN_DIRECT_PRIORITY_SYSTEM.md)

---

### 3. Redis Caching for Tools

Cache tool outputs to improve performance for frequently used queries:

**Setup**:
```python
from langchain.tools import tool
from masai.Tools.utilities.cache import ToolCache

# Initialize Redis cache
redis_cache = ToolCache(host='localhost', port=6379, db=0)

@tool
@redis_cache.masai_cache
def expensive_api_call(query: str) -> str:
    """
    Makes an expensive API call. Results are cached.

    Args:
        query: Search query

    Returns:
        API response
    """
    api_response = f"results for {query}"  # placeholder for the expensive operation
    return api_response
```

**Benefits**:
- Faster response times for repeated queries
- Reduced API costs
- Automatic cache invalidation
- Monitor cache with Redis CLI

**Requirements**:
- Redis server running (see below for a quick way to start one)
- `redis` Python package installed
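
If you don't already have Redis running, one quick way to start a local instance (assuming Docker is installed):

```bash
docker run -d --name masai-redis -p 6379:6379 redis
```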

---

### 4. In-Memory Vector Store

For long conversations with semantic search capabilities:

**Setup**:
```python
from masai.Memory.InMemoryStore import InMemoryDocStore

# Create vector store with embedding model
memory_store = InMemoryDocStore(embedding_model="all-MiniLM-L6-v2")

# Create agent with vector store
manager.create_agent(
    agent_name="research_agent",
    tools=tools,
    agent_details=details,
    long_context=True,
    long_context_order=20,
    in_memory_store=memory_store,
    top_k=3  # Retrieve top 3 relevant memories
)
```

**How It Works** (see the sketch after this list):
1. Old messages (beyond `memory_order`) are summarized
2. Summaries are embedded and stored in vector store
3. On new queries, semantic search retrieves relevant past context
4. Retrieved context is added to prompt
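
Conceptually, the retrieval step (3) boils down to a nearest-neighbor search; here is a standalone numpy sketch of the mechanism (the `embed` function below is a stand-in for this sketch, not MASAI's internal code):

```python
import numpy as np

def embed(texts):
    # Stand-in embedding function for this sketch; a real model
    # (e.g. SentenceTransformer) produces semantically meaningful vectors.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 8))

summaries = ["User asked about Q3 revenue.", "User prefers bullet points."]
summary_vecs = embed(summaries)
query_vec = embed(["What revenue figures did we discuss?"])[0]

# Cosine similarity between the query and each stored summary
sims = summary_vecs @ query_vec / (
    np.linalg.norm(summary_vecs, axis=1) * np.linalg.norm(query_vec)
)
top_k_idx = np.argsort(sims)[::-1][:1]
relevant_context = [summaries[i] for i in top_k_idx]  # injected into the prompt
```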

**Benefits**:
- Semantic search over conversation history
- Better context retention for long conversations
- Efficient memory usage

**Supported Embedding Models**:
- Sentence Transformers (default: `all-MiniLM-L6-v2`)
- LangChain embedding models (OpenAI, Cohere, etc.)

---

### 5. Streaming Callbacks

For real-time response streaming:

**Setup**:
```python
async def streaming_callback(chunk):
    """Handle streaming chunks"""
    if isinstance(chunk, dict):
        if "answer" in chunk:
            print(f"Answer: {chunk['answer']}")
    else:
        print(chunk, end="", flush=True)

manager = AgentManager(
    context={},
    logging=True,
    model_config_path="model_config.json",
    streaming=True,
    streaming_callback=streaming_callback
)
```

**Use Cases**:
- Real-time UI updates
- Progress indicators
- Debugging and monitoring

---

### 6. Per-Component Configuration

Override model settings for specific components:

**Setup**:
```python
config_dict = {
    "router": {
        "memory_order": 10,
        "temperature": 0.3
    },
    "evaluator": {
        "memory_order": 5,
        "temperature": 0.1
    },
    "reflector": {
        "memory_order": 15,
        "temperature": 0.7
    }
}

manager.create_agent(
    agent_name="custom_agent",
    tools=tools,
    agent_details=details,
    config_dict=config_dict
)
```

**Benefits**:
- Fine-tune each component independently
- Optimize for specific use cases
- Balance cost vs. performance

---

### 7. Context Callable (Dynamic Context)

Fetch dynamic context on each query:

**Setup**:
```python
async def get_user_context(query: str) -> dict:
    """Fetch user-specific context dynamically"""
    # Fetch from database, API, etc.
    user_data = await fetch_user_data()  # your own data-access helper
    return {
        "user_name": user_data["name"],
        "user_role": user_data["role"],
        "permissions": user_data["permissions"]
    }

manager.create_agent(
    agent_name="personalized_agent",
    tools=tools,
    agent_details=details,
    context_callable=get_user_context
)
```

**Use Cases**:
- User-specific personalization
- Real-time data fetching
- Dynamic permission checks

**Note**: Currently only called for user queries, not agent-to-agent delegation.

**Context Callable Flow**:
```
┌─────────────────────────────────────────────────────────────┐
│           CONTEXT CALLABLE WORKFLOW                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  User Query Received                                        │
│    ↓                                                        │
│  [context_callable set?]                                    │
│    ↓ Yes                                                    │
│  [role == "user"?]                                          │
│    ↓ Yes                                                    │
│  Call context_callable(query)                               │
│    ↓                                                        │
│  [Is coroutine?]                                            │
│    ↓ Yes              ↓ No                                  │
│  await callable()   callable()                              │
│    ↓                  ↓                                     │
│  [Result is dict?]                                          │
│    ↓ Yes              ↓ No                                  │
│  Update info dict   Add as 'USEFUL DATA' key                │
│    ↓                                                        │
│  Include in prompt under <INFO>                             │
│    ↓                                                        │
│  After LLM response:                                        │
│    ↓                                                        │
│  Remove 'USEFUL DATA' from info                             │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
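
As the flow shows, synchronous callables and non-dict results are also supported; a minimal sketch of that variant (reusing the `manager`, `tools`, and `details` objects from the example above):

```python
def get_recent_alerts(query: str) -> str:
    # Synchronous callable returning a plain string: the framework
    # adds it to the prompt under the 'USEFUL DATA' key.
    return "3 open alerts: disk usage, failed backup, expired cert"

manager.create_agent(
    agent_name="ops_agent",
    tools=tools,
    agent_details=details,
    context_callable=get_recent_alerts
)
```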

### 8. TaskManager (Hierarchical MAS Internals)

The TaskManager handles concurrent task execution in hierarchical MAS:

**Key Features**:
- **Concurrent Execution**: Uses ThreadPoolExecutor for parallel task processing
- **Task Queue**: Manages pending tasks with unique IDs
- **Result Callbacks**: Async callbacks for task completion
- **Completed Task Context**: Maintains history of completed tasks for supervisor context

**Configuration**:
```python
# TaskManager is automatically created by MultiAgentSystem
# You can configure it through SupervisorConfig

supervisor_config = SupervisorConfig(
    model_name="gpt-4",
    temperature=0.2,
    model_category="openai",
    memory_order=20,  # Supervisor memory
    memory=True,
    extra_context={"organization": "MyCompany"},
    supervisor_system_prompt="Custom supervisor prompt"
)

mas = MultiAgentSystem(
    agentManager=manager,
    supervisor_config=supervisor_config,
    heirarchical_mas_result_callback=my_callback,
    agent_return_direct=False  # Supervisor reviews agent output
)
```
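
`my_callback` above is your own function; a minimal sketch of what a completion callback might look like (the payload shape shown is an assumption, not a documented contract):

```python
async def my_callback(result):
    # Invoked by the TaskManager when a delegated task completes.
    print(f"Task finished: {result}")
```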

**TaskManager Workflow**:
```
┌─────────────────────────────────────────────────────────────┐
│              TASKMANAGER WORKFLOW                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Query → Supervisor                                         │
│    ↓                                                        │
│  Supervisor Decision:                                       │
│    ├─ Direct Answer → Return                                │
│    └─ Delegate to Agent                                     │
│         ↓                                                   │
│  Create Task (unique ID)                                    │
│    ↓                                                        │
│  Add to pending_tasks{}                                     │
│    ↓                                                        │
│  Submit to ThreadPoolExecutor                               │
│    ↓                                                        │
│  Agent Executes (async)                                     │
│    ↓                                                        │
│  Result → Result Queue                                      │
│    ↓                                                        │
│  Listener Task picks up result                              │
│    ↓                                                        │
│  [agent_return_direct=True?]                                │
│    ↓ Yes              ↓ No                                  │
│  Return result    Supervisor reviews                        │
│                       ↓                                     │
│                   [Satisfied?]                              │
│                     ↓ No                                    │
│                   Request revision                          │
│                     ↓ Yes                                   │
│  Add to completed_tasks[]                                   │
│    ↓                                                        │
│  Call result_callback (if set)                              │
│    ↓                                                        │
│  Return final result                                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

**Parameters**:
- `max_thread_workers`: Maximum concurrent tasks (default: 5)
- `check_interval`: Interval for checking completed tasks (default: 1.0s)
- `_last_n_completed_tasks`: Number of completed tasks to keep in context (default: 10)

### 9. OMAN (Orchestrated Multi-Agent Network) Detailed

OMAN coordinates multiple MAS instances, each specializing in different domains:

**Architecture**:
```
┌─────────────────────────────────────────────────────────────┐
│                    OMAN ARCHITECTURE                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────────────────────────────────────────────┐  │
│  │           OMAN SUPERVISOR                            │  │
│  │  - Routes queries to appropriate MAS                 │  │
│  │  - Maintains network-level memory                    │  │
│  │  - Coordinates cross-MAS communication               │  │
│  └──────────────────────────────────────────────────────┘  │
│                          │                                  │
│         ┌────────────────┼────────────────┐                │
│         ↓                ↓                ↓                │
│  ┌──────────┐     ┌──────────┐     ┌──────────┐          │
│  │  MAS 1   │     │  MAS 2   │     │  MAS 3   │          │
│  │ (Finance)│     │(Research)│     │(Customer)│          │
│  │          │     │          │     │ Support  │          │
│  │ Agent A  │     │ Agent D  │     │ Agent G  │          │
│  │ Agent B  │     │ Agent E  │     │ Agent H  │          │
│  │ Agent C  │     │ Agent F  │     │ Agent I  │          │
│  └──────────┘     └──────────┘     └──────────┘          │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

**Setup Example**:
```python
from masai.OMAN.oman import OrchestratedMultiAgentNetwork
from masai.MultiAgents.MultiAgent import MultiAgentSystem

# Create specialized MAS instances
finance_mas = MultiAgentSystem(agentManager=finance_manager)
research_mas = MultiAgentSystem(agentManager=research_manager)
support_mas = MultiAgentSystem(agentManager=support_manager)

# Create OMAN
oman = OrchestratedMultiAgentNetwork(
    mas_instances=[finance_mas, research_mas, support_mas],
    network_memory_order=5,
    oman_llm_config={
        "model_name": "gpt-4",
        "category": "openai",
        "temperature": 0.2,
        "memory_order": 5
    },
    extra_context={"environment": "production", "company": "MyCompany"}
)

# Use OMAN
result = oman.delegate_task("Analyze Q4 financial performance")
```

**OMAN Routing Process**:
1. OMAN supervisor receives query
2. Analyzes query against each MAS's agent capabilities
3. Selects most appropriate MAS based on specialization
4. Delegates query to selected MAS
5. MAS processes query using its agents
6. Result returned through OMAN supervisor
7. Network memory updated with task outcome

**Use Cases**:
- **Enterprise Systems**: Multiple departments with specialized agents
- **Multi-Domain Applications**: Finance + Research + Support in one system
- **Scalable Architecture**: Add new MAS instances without modifying existing ones

---

## Troubleshooting & Best Practices

### 1. Tool Loop Prevention

**Problem**: Agent repeatedly calls the same tool without making progress

**Causes**:
- Tool returns insufficient information
- LLM doesn't recognize task completion
- Tool errors not properly handled

**Solutions**:
```python
# 1. Increase max_tool_loops
from masai.Config import config
config.max_tool_loops = 5  # Default: 3

# 2. Improve tool descriptions
@tool
def search_database(query: str) -> dict:
    """
    Searches database for records.

    Args:
        query: Search query string

    Returns:
        dict with 'count' (int) and 'results' (list) keys.
        Returns empty list if no results found.
    """
    ...  # Implementation goes here

# 3. Add error handling in tools
import requests

@tool
def api_call(endpoint: str) -> dict:
    """Makes API call with error handling"""
    try:
        response = requests.get(endpoint)
        response.raise_for_status()
        return {"success": True, "data": response.json()}
    except Exception as e:
        return {"success": False, "error": str(e)}
```

### 2. Reflection Counter Management

**Problem**: Agent gets stuck in reflection loops

**Solution**:
```python
# Adjust reflection limit
config.MAX_REFLECTION_COUNT = 5  # Default: 3

# Lower the reflector temperature for more decisive answers
config_dict = {
    "reflector_temperature": 0.3  # Lower = more deterministic
}
```

### 3. Memory Optimization Strategies

**Strategy 1: Short Conversations**
```python
manager.create_agent(
    agent_name="quick_agent",
    tools=tools,
    agent_details=details,
    memory_order=5,
    long_context=False  # No summarization needed
)
```

**Strategy 2: Long Conversations with Summarization**
```python
manager.create_agent(
    agent_name="research_agent",
    tools=tools,
    agent_details=details,
    memory_order=10,
    long_context=True,
    long_context_order=30  # Keep 30 summaries
)
```

**Strategy 3: Very Long Conversations with Vector Store**
```python
from masai.Memory.InMemoryStore import InMemoryDocStore

memory_store = InMemoryDocStore(embedding_model="all-MiniLM-L6-v2")

manager.create_agent(
    agent_name="longterm_agent",
    tools=tools,
    agent_details=details,
    memory_order=10,
    long_context=True,
    long_context_order=20,
    in_memory_store=memory_store,
    top_k=5  # Retrieve 5 most relevant memories
)
```

### 4. When to Use long_context vs LTIMStore

| Feature | long_context | LTIMStore |
|---------|--------------|-----------|
| **Purpose** | Sequential conversation history | Semantic search over old conversations |
| **Storage** | List of summaries | Vector embeddings |
| **Retrieval** | All recent summaries | Top-k most relevant |
| **Best For** | Conversations with clear progression | Research tasks with topic jumps |
| **Overhead** | Low (summarization only) | Medium (embedding computation) |
| **Use When** | Conversation length < 100 messages | Conversation length > 100 messages |

### 5. Performance Tuning

**Reduce Latency**:
```python
# Use faster models for frequent operations
config_dict = {
    "router_model_name": "gemini-2.0-flash",  # Fast routing
    "evaluator_model_name": "gemini-2.0-flash",  # Fast evaluation
    "reflector_model_name": "gemini-pro"  # Quality reflection
}

# Reduce memory_order for faster context processing
memory_order=5  # Instead of 20

# Enable Redis caching for expensive tools
cache = ToolCache(timeout=60)
@tool
@cache.masai_cache
def expensive_tool(query: str) -> dict:
    ...  # Implementation goes here
```

**Reduce Costs**:
```python
# Use cheaper models where possible
config_dict = {
    "router_model_name": "gpt-3.5-turbo",
    "evaluator_model_name": "gpt-3.5-turbo",
    "reflector_model_name": "gpt-4"  # Only use expensive model where needed
}

# Reduce memory_order to minimize token usage
memory_order=3
long_context_order=10
```

### 6. Error Handling Patterns

**Pattern 1: Tool Error Handling**
```python
@tool
def robust_tool(query: str) -> dict:
    """Tool with comprehensive error handling"""
    try:
        result = perform_operation(query)
        return {"success": True, "data": result}
    except ValueError as e:
        return {"success": False, "error": f"Invalid input: {e}"}
    except ConnectionError as e:
        return {"success": False, "error": f"Connection failed: {e}"}
    except Exception as e:
        return {"success": False, "error": f"Unexpected error: {e}"}
```

**Pattern 2: Agent Error Handling**
```python
import logging

logger = logging.getLogger(__name__)

try:
    agent = manager.get_agent("my_agent")
    result = await agent.initiate_agent("Query")

    if "error" in result:
        # Handle agent-level errors
        logger.error(f"Agent error: {result['error']}")
    else:
        # Process successful result
        print(result['answer'])

except ValueError as e:
    # Handle agent not found
    logger.error(f"Agent not found: {e}")
except Exception as e:
    # Handle unexpected errors
    logger.error(f"Unexpected error: {e}")
```

---

## Summary

MAS-AI provides a comprehensive framework for building intelligent multi-agent systems with:

✅ **Modular Architecture**: Router, Evaluator, Reflector, Planner components
✅ **Flexible Memory**: 5-level hierarchy from short-term to vector store
✅ **Multiple Workflows**: Sequential, Hierarchical, Decentralized
✅ **LLM Support**: Native OpenAI and Google Gemini support via vanilla SDK wrappers
✅ **Advanced Features**: Redis caching, streaming, dynamic context, per-component config
✅ **Scalable**: From single agents to OMAN networks

**Next Steps**:
1. Read [MASAI_PROMPT_TEMPLATES_AND_DATA_FLOW.md](./MASAI_PROMPT_TEMPLATES_AND_DATA_FLOW.md) for prompt details
2. Review [README.md](./README.md) for quick start examples
3. Check [MASAI_CONTEXT_MANAGEMENT_ANALYSIS.md](./MASAI_CONTEXT_MANAGEMENT_ANALYSIS.md) for context deep dive

---

## Installation & Setup

### Prerequisites

- Python 3.9 or higher
- pip package manager

### Installation Options

#### Option 1: Install from PyPI (Recommended)

```bash
pip install masai-framework
```

> **📦 Lightweight Installation**
>
> MASAI core installation is **~50MB** and includes only essential dependencies:
> - Custom LangGraph engine (no external LangChain dependency)
> - OpenAI and Google Gemini SDKs (vanilla wrappers)
> - Basic utilities (pydantic, requests, numpy, etc.)
>
> **Heavy ML dependencies are NOT included** to keep the framework lightweight:
> - ❌ `sentence-transformers` (~500MB)
> - ❌ `torch` (~1.5GB)
> - ❌ `transformers` (~500MB)
>
> **Install these separately only if needed**:
> ```bash
> # For InMemoryDocStore with SentenceTransformer embeddings
> pip install sentence-transformers
>
> # For tool-specific dependencies (install as needed)
> pip install beautifulsoup4 PyPDF2 duckduckgo-search wikipedia arxiv
> ```

#### Option 2: Install from Source

```bash
# Clone the repository
git clone https://github.com/shaunthecomputerscientist/mas-ai.git
cd mas-ai

# Install in development mode
pip install -e .

# Or install directly
pip install .
```

### Quick Start

1. **Create a model configuration file** (`model_config.json`):

```json
{
  "router": {
    "model_name": "gpt-4",
    "temperature": 0.2,
    "model_category": "openai"
  },
  "evaluator": {
    "model_name": "gpt-3.5-turbo",
    "temperature": 0.1,
    "model_category": "openai"
  },
  "reflector": {
    "model_name": "gpt-4",
    "temperature": 0.7,
    "model_category": "openai"
  }
}
```

2. **Set up environment variables**:

```bash
# For OpenAI
export OPENAI_API_KEY="your-api-key"

# For Gemini
export GOOGLE_API_KEY="your-api-key"
```
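
Since `python-dotenv` is already a core dependency, you can instead keep keys in a `.env` file and load them at startup:

```python
# .env contents:
#   OPENAI_API_KEY=your-api-key
#   GOOGLE_API_KEY=your-api-key

from dotenv import load_dotenv

load_dotenv()  # reads .env into os.environ before any agents are created
```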

3. **Create your first agent**:

```python
from masai.AgentManager.AgentManager import AgentManager, AgentDetails
from masai.Tools.tools.baseTools import human_in_loop_input

# Initialize AgentManager
manager = AgentManager(
    context={"user_name": "Your Name"},
    logging=True,
    model_config_path="model_config.json"
)

# Define agent capabilities
agent_details = AgentDetails(
    capabilities=["reasoning", "analysis", "problem-solving"],
    description="A helpful AI assistant",
    style="friendly and informative"
)

# Create an agent
manager.create_agent(
    agent_name="assistant",
    tools=[human_in_loop_input],
    agent_details=agent_details
)

# Use the agent (await must run inside an event loop)
import asyncio

async def main():
    response = await manager.get_agent("assistant").initiate_agent("Hello, how can you help me?")
    print(response["answer"])

asyncio.run(main())
```

### Verification

Test your installation:

```python
# Test custom LangGraph implementation
from masai.langgraph.graph import StateGraph, END, START
from masai.langgraph.graph.state import CompiledStateGraph

print("✅ MASAI Framework installed successfully!")
print("✅ Custom LangGraph implementation ready!")
```

### Troubleshooting

**Common Issues:**

1. **Import Errors**: Make sure you're using Python 3.9+ and have installed all dependencies
2. **API Key Issues**: Verify your API keys are set correctly in environment variables
3. **Model Configuration**: Ensure your `model_config.json` file is properly formatted

**Getting Help:**

- 📖 Check the [documentation](https://github.com/shaunthecomputerscientist/mas-ai)
- 🐛 Report issues on [GitHub Issues](https://github.com/shaunthecomputerscientist/mas-ai/issues)
- 💬 Join our community discussions

---

**Last Updated**: 2025-01-05
**Framework Version**: v0.4.0 with Custom LangGraph + Visualization
**Documentation Version**: 2.2 (Complete Custom LangGraph with Graph Visualization)
