Metadata-Version: 2.4
Name: orca-graphlets
Version: 0.1.4
Summary: ORCA: Python wrapper for efficient graphlet counting
Author-email: Ole Petersen <peteole2707@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/yourusername/orca_v2
Project-URL: Repository, https://github.com/yourusername/orca_v2
Project-URL: Issues, https://github.com/yourusername/orca_v2/issues
Keywords: graph,graphlets,network,analysis,orca
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: C++
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.24.4
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == "test"
Requires-Dist: pytest-cov>=4.0.0; extra == "test"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: isort>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"

# ORCA: Python Wrapper for Efficient Graphlet Counting

[![PyPI version](https://badge.fury.io/py/orca-graphlets.svg)](https://badge.fury.io/py/orca-graphlets)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A Python implementation of ORCA (the ORbit Counting Algorithm) for efficiently counting graphlet orbits in networks. It is a pure Python port of the original C++ implementation, [thocevar/orca](https://github.com/thocevar/orca).

## What is ORCA?

ORCA is an efficient algorithm for counting graphlet orbits in networks. For graphlets of up to 4 or 5 nodes, it computes node orbits for each node and edge orbits for each edge. Graphlets are small connected induced subgraphs that serve as fundamental building blocks for network analysis; an orbit is a structurally distinct position that a node (or edge) can occupy within a graphlet.
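To make the notion of an orbit concrete, the following brute-force sketch counts the four smallest node orbits by hand. It assumes the standard orbit numbering from the graphlet literature (orbit 0: endpoint of an edge, orbit 1: end of a 3-node path, orbit 2: middle of a 3-node path, orbit 3: triangle). The helper `small_orbit_counts` is hypothetical and not part of this package; it only illustrates what the counts mean, not how ORCA computes them.

```python
from itertools import combinations


def small_orbit_counts(edges, num_nodes):
    """Brute-force counts of orbits 0-3 for every node (illustrative helper)."""
    adj = [set() for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    counts = [[0, 0, 0, 0] for _ in range(num_nodes)]
    for v in range(num_nodes):
        # Orbit 0: v is an endpoint of an edge, so this is simply the degree.
        counts[v][0] = len(adj[v])
        # v plus two of its neighbours forms either a triangle (orbit 3)
        # or an induced path with v in the middle (orbit 2).
        for a, b in combinations(sorted(adj[v]), 2):
            if b in adj[a]:
                counts[v][3] += 1
            else:
                counts[v][2] += 1
        # Orbit 1: v is an end of an induced path v - a - b.
        for a in adj[v]:
            for b in adj[a]:
                if b != v and b not in adj[v]:
                    counts[v][1] += 1
    return counts


# A 4-cycle (0-1-2-3-0) with one chord (1-3).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
counts = small_orbit_counts(edges, num_nodes=4)
for node, row in enumerate(counts):
    print(node, row)
```

Nodes 1 and 3 (the chord endpoints) each sit in two triangles, while nodes 0 and 2 each sit in one. Under the numbering assumed above, these values should correspond to the first four columns of a node-orbit matrix.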

## Features

- **Pure Python implementation** - No external C++ dependencies required
- **Node and edge orbit counting** - Count orbits for both nodes and edges
- **4-node and 5-node graphlets** - Support for graphlets of size 4 and 5
- **NumPy integration** - Efficient array-based operations
- **Validated accuracy** - Test suite ensures outputs match the original C++ implementation
- **Easy to use** - Simple API with sensible defaults

## Installation

```bash
pip install orca-graphlets
```

## Quick Start

```python
import numpy as np
from orca import orca_nodes, orca_edges

# Define a simple graph as an edge list
edges = np.array([
    [0, 1],
    [1, 2],
    [2, 3],
    [3, 0],
    [1, 3]
])

# Count node orbits for 4-node graphlets
node_orbits = orca_nodes(edges, graphlet_size=4)
print("Node orbits shape:", node_orbits.shape)

# Count edge orbits for 4-node graphlets  
edge_orbits = orca_edges(edges, graphlet_size=4)
print("Edge orbits shape:", edge_orbits.shape)
```

## API Reference

### `orca_nodes(edge_list, num_nodes=None, graphlet_size=4, debug=False)`

Count node orbits for each node in the graph.

**Parameters:**

- `edge_list` (np.ndarray): Array of shape (E, 2) containing edges as pairs of node indices
- `num_nodes` (int, optional): Number of nodes in the graph. If None, inferred from edge_list
- `graphlet_size` (int): Size of graphlets to count (4 or 5)
- `debug` (bool): Enable debug output

**Returns:**

- `np.ndarray`: Array of shape (N, K) where N is the number of nodes and K is the number of orbit types for the given graphlet size

### `orca_edges(edge_list, num_nodes=None, graphlet_size=4, debug=False)`

Count edge orbits for each edge in the graph.

**Parameters:**

- Same as `orca_nodes`

**Returns:**

- `np.ndarray`: Array of shape (E, K) where E is the number of edges and K is the number of orbit types for the given graphlet size

## Graphlet Orbits

### 4-node graphlets

- **Node orbits**: 15 different orbit types (0-14), covering graphlets of 2 to 4 nodes
- **Edge orbits**: 12 different orbit types (0-11)

### 5-node graphlets

- **Node orbits**: 73 different orbit types (0-72), covering graphlets of 2 to 5 nodes
- **Edge orbits**: 68 different orbit types (0-67)
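
The node-orbit totals can be checked independently of any ORCA implementation. The sketch below enumerates every connected graph on `k` nodes up to isomorphism and counts the orbits of its automorphism group; summing over `k = 2..4` and `k = 2..5` should reproduce 15 and 73. The function names are hypothetical, and the exhaustive search is only practical for these very small sizes.

```python
from itertools import combinations, permutations


def is_connected(k, edges):
    """Depth-first search from node 0 over an undirected edge set."""
    adj = {v: set() for v in range(k)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == k


def count_node_orbits(k):
    """Total number of node orbits over all connected k-node graphlets."""
    nodes = range(k)
    pairs = list(combinations(nodes, 2))
    perms = list(permutations(nodes))
    seen_graphlets = set()
    total = 0
    for mask in range(1 << len(pairs)):
        edges = frozenset(p for i, p in enumerate(pairs) if (mask >> i) & 1)
        if not is_connected(k, edges):
            continue
        # Image of the edge set under every relabelling of the nodes.
        images = [
            frozenset(tuple(sorted((p[u], p[v]))) for u, v in edges)
            for p in perms
        ]
        # The smallest image serves as a canonical form for isomorphism.
        canon = min(tuple(sorted(img)) for img in images)
        if canon in seen_graphlets:
            continue
        seen_graphlets.add(canon)
        # Automorphisms are the relabellings that leave the edge set unchanged.
        autos = [p for p, img in zip(perms, images) if img == edges]
        # Two nodes share an orbit if some automorphism maps one to the other.
        orbits = {frozenset(p[v] for p in autos) for v in nodes}
        total += len(orbits)
    return total


per_size = {k: count_node_orbits(k) for k in range(2, 6)}
print(per_size)
print("up to 4 nodes:", sum(per_size[k] for k in (2, 3, 4)))
print("up to 5 nodes:", sum(per_size.values()))
```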

## Input Format

The edge list should be a NumPy array where:

- Each row represents an undirected edge
- Columns contain the node indices (0-based)
- Node indices should be integers from 0 to N-1 where N is the number of nodes

Example:

```python
# Triangle graph: nodes 0, 1, 2 fully connected
edges = np.array([
    [0, 1],
    [1, 2], 
    [2, 0]
])
```
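
Real-world edge lists often use arbitrary labels (strings, sparse IDs) rather than contiguous 0-based indices. The following sketch, using a hypothetical `relabel_edges` helper that is not part of this package, maps labels to `0..N-1` in order of first appearance. Whether the package tolerates self-loops or duplicate edges is not stated here, so the sketch removes them defensively.

```python
def relabel_edges(raw_edges):
    """Map arbitrary hashable node labels to 0..N-1.

    Returns (edges, index) where edges is a list of (u, v) pairs with u < v
    and index maps each original label to its new integer index.
    Self-loops and duplicate undirected edges are dropped.
    """
    index = {}
    seen = set()
    edges = []
    for a, b in raw_edges:
        # Assign the next free index the first time a label is seen.
        u = index.setdefault(a, len(index))
        v = index.setdefault(b, len(index))
        if u == v:
            continue  # self-loop
        key = (min(u, v), max(u, v))
        if key in seen:
            continue  # duplicate, possibly listed in reverse direction
        seen.add(key)
        edges.append(key)
    return edges, index


raw = [("a", "b"), ("b", "c"), ("c", "a"), ("b", "a"), ("c", "c")]
edges, index = relabel_edges(raw)
print(edges)
print(index)
```

The resulting list can be wrapped with `np.array(edges)` and `len(index)` passed as `num_nodes`.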

## Examples

### Basic Usage

```python
import numpy as np
from orca import orca_nodes, orca_edges

# Create a small graph
edges = np.array([
    [0, 1],
    [1, 2],
    [2, 3],
    [0, 3]
])

# Count 4-node graphlet orbits for nodes
node_counts = orca_nodes(edges, graphlet_size=4)
print(f"Node 0 orbit counts: {node_counts[0]}")

# Count 5-node graphlet orbits for edges
edge_counts = orca_edges(edges, graphlet_size=5)
print(f"Edge (0,1) orbit counts: {edge_counts[0]}")
```

### Loading from File

```python
import numpy as np
from orca import orca_nodes

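# Load a graph from a text file. Format: the first line is
# "num_nodes num_edges"; each following line is "node1 node2".
def load_graph(filename):
    with open(filename, 'r') as f:
        num_nodes, num_edges = map(int, f.readline().split())
        # Skip blank lines so a trailing newline does not break parsing
        edges = np.array(
            [list(map(int, line.split())) for line in f if line.strip()],
            dtype=int,
        ).reshape(-1, 2)
    # Check the edge count declared in the header against the data
    if len(edges) != num_edges:
        raise ValueError(f"expected {num_edges} edges, found {len(edges)}")
    return edges, num_nodes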

edges, num_nodes = load_graph('graph.in')
orbits = orca_nodes(edges, num_nodes=num_nodes, graphlet_size=4)
```
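
For completeness, a file in the format described above could be produced with a small writer such as the one below. `save_graph` is a hypothetical helper, not part of this package, and the sketch writes to a temporary directory so that it can be run without side effects.

```python
import os
import tempfile


def save_graph(filename, edges, num_nodes):
    """Write "num_nodes num_edges" followed by one "node1 node2" line per edge."""
    with open(filename, "w") as f:
        f.write(f"{num_nodes} {len(edges)}\n")
        for u, v in edges:
            f.write(f"{u} {v}\n")


# Write a triangle graph and read the raw text back.
triangle = [(0, 1), (1, 2), (2, 0)]
path = os.path.join(tempfile.mkdtemp(), "graph.in")
save_graph(path, triangle, num_nodes=3)
with open(path) as f:
    content = f.read()
print(content)
```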

## Performance

This pure Python implementation prioritizes:

- **Correctness**: Results identical to those of the original C++ version
- **Ease of use**: No compilation or external dependencies required
- **Maintainability**: Clean, readable Python code

For maximum performance on very large graphs, consider using the original C++ implementation.

## Testing

The package includes comprehensive tests that verify the outputs match the original C++ implementation:

```bash
pytest tests/
```

## Original Work

This is a Python port of the original ORCA algorithm:

- **Original repository**: [thocevar/orca](https://github.com/thocevar/orca)
- **Algorithm paper**: Hočevar, T., & Demšar, J. (2014). A combinatorial approach to graphlet counting. Bioinformatics, 30(4), 559-565.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

The original ORCA implementation is licensed under GPL-3.0.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Citation

If you use this software in your research, please cite the original paper:

```bibtex
@article{hocevar2014combinatorial,
  title={A combinatorial approach to graphlet counting},
  author={Ho{\v{c}}evar, Toma{\v{z}} and Dem{\v{s}}ar, Janez},
  journal={Bioinformatics},
  volume={30},
  number={4},
  pages={559--565},
  year={2014},
  publisher={Oxford University Press}
}
```
