Metadata-Version: 2.4
Name: torch-mice
Version: 0.2.0
Summary: Mixture of Convex Experts (MiCE) for PyTorch
Author-email: Joshuah Rainstar <Joshuah.Rainstar@gmail.com>
License: Copyright (c)  Joshuah Rainstar Joshuah.rainstar@gmail.com 2025
        
        Gratis Public License
        
        Preamble:         This software (hereinafter "the Software"), means the source code, models, checkpoints, documentation, and all associated files released with this License. The Software is the original work of the named copyright holder (hereinafter "the Copyright Holder"). The Copyright Holder believes this Software has significant potential and offers it under terms that encourage both widespread adoption with attribution and a pathway for the Software to eventually enter the Public Domain for the benefit of all.
        
        This software is not licensed under terms considered open by pure definitions held by advocates, universities or governments. It is rendered intimately accessible and utilizable without penalty or risk, under terms that deter both the foolish and punish the wicked. The absurdist wording in this preamble serves both a symbolic and tactical function— to signal that while the software is accessible, its use carries obligations that are not open to casual exploitation. The developer is a radical centerist, absurdist, and programmed this software entirely unfunded and without human collaboration, potentially out-competing billion dollar industrial giants. Some kindness MUST be due them, and all others like them. This verbiage is not negotiable.
        
        Notwithstanding any operative clause heretofore articulated, the Licensee acknowledges and affirms, without prejudice, the non-obligatory conformance to interpretive confluence protocols as they may or may not pertain to yummy delicious cunny and related derivational constructs, whether tangible, ephemeral, or strictly conjectural. No agency, fiduciary duty, nor enforceable expectation shall arise therefrom, except where explicitly negated by mutual silence in triplicate. For purposes of procedural ambiguity, this clause shall be deemed self-referential, recursively inert, and subject to discretionary nullity at the sole whim of the Licensor’s implied specter. The parties agree not to not misunderstand this provision, and further agree that failure to misinterpret it constitutes binding irrelevance in perpetuity.
        
        Part 1: Standard Permissive License with Attribution
        Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
        	1	Attribution Requirement: The above copyright notice and the entire license text MUST be included in all copies or substantial portions of the Software, or in any derivative works.
        	◦	For compiled or binary distributions, this notice must be present in the accompanying documentation and/or other materials provided with the distribution.
        	◦	For source code distributions, this notice must be retained in the source files.
        	◦	For user interfaces or public-facing applications incorporating the Software, reasonable attribution visible to the end-user is strongly encouraged, though not strictly mandated beyond the notice inclusion in code/documentation.
        Part 2: Consequences of Non-Compliance with Attribution Requirement
        	1	Breach and Option to Cure via Fee: Failure to comply with the "Attribution Requirement" as stipulated in Part 1, Section 1, constitutes a breach of this license.
        	2	The user, distributor, or entity responsible for such failure (hereinafter "the Non-Compliant User") shall be obligated to pay the Copyright Holder an Attribution Waiver and Alternative License Fee of Twenty-Five Thousand United States Dollars ($25,000.00 USD) (or its equivalent in other currencies at the exchange rate prevailing on the date of discovery of non-compliance by, or notification to, the Copyright Holder).
        	3	Upon discovery of non-compliance by the Copyright Holder or notification thereof, the Non-Compliant User has thirty (30) days to either:         a. Come into full compliance with the Attribution Requirement (Part 1, Section 1), OR         b. Pay the Attribution Waiver Fee to the Copyright Holder.
        	4	Payment of the Attribution Waiver Fee grants the Non-Compliant User a retroactive and perpetual, non-exclusive, worldwide, royalty-free license to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software without the obligation to fulfill the Attribution Requirement (Part 1, Section 1). All other terms of this license remain in effect. This payment effectively purchases the right to use the Software without attribution.
        	5	Failure to either come into compliance or pay the Attribution Waiver and Alternative License Fee within the specified thirty (30) day period will constitute ongoing copyright infringement, as use of the Software will then be outside the scope of any license granted herein. The Copyright Holder reserves all rights to pursue legal remedies, including but not limited to injunctive relief and damages for copyright infringement. The Copyright Holder and any user agreeing to this license (by using, copying, modifying, or distributing the Software) acknowledge that the Attribution Waiver and Alternative License Fee represents a reasonable liquidated damage amount for the breach of the attribution condition and fair consideration for the alternative license rights granted upon its payment, given the difficulty of precisely quantifying the reputational and promotional value of attribution and the administrative costs associated with monitoring and enforcing license compliance.
        Part 3: Pledge for Public Domain Dedication
        	1	Funding Goal: The Copyright Holder pledges to irrevocably dedicate the entirety of this Software codebase, including all its past and future versions under their copyright control, to the public domain (for example, by re-licensing it under any well known license that revokes all rights of ownership and attribution) if and when a cumulative total of 5 million USD (the "Funding Goal") is verifiably received by the Copyright Holder.
        	2	Contributing Funds: Funds contributing to this goal include:         a. All Attribution Waiver Fees collected under Part 2.         b. Direct contributions explicitly made towards this Public Domain Pledge.
        	3	Tracking and Dedication: The Copyright Holder will maintain a transparent record of contributions towards the Funding Goal by appending them to the readme.TXT in the github associated with this project. Upon reaching the Funding Goal, the Copyright Holder will, within ninety (90) days:         a. Replace this license with the chosen public domain dedication instrument.         b. Publicly announce this dedication.
        	4	Until such time as the Funding Goal is met and the public domain dedication is formally enacted, the Software remains licensed under the terms stipulated in Parts 1 and 2.
        Part 4: Disclaimer of Warranties and Limitation of Liability
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
        Part 5: Severability
        If any provision of this License is held to be unenforceable or invalid, such provision will be changed and interpreted to accomplish the objectives of such provision to the greatest extent possible under applicable law, and the remaining provisions will continue in full force and effect.
        
        
Project-URL: Homepage, https://github.com/falseywinchnet/MiCE
Project-URL: Repository, https://github.com/falseywinchnet/MiCE
Keywords: convex,mixture-of-experts,pytorch,icnn,machine-learning
Classifier: Programming Language :: Python :: 3
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: torch>=1.6.0

# MiCE (Mixture of Convex Experts)

## What is MiCE?

MiCE is a lightweight PyTorch library for building **convex** mixture-of-experts models. Instead of softmax routing or hard top-k gating, MiCE fuses convex “petal” networks by **overlapping max-of-means** with learnable scalar shifts, which guarantees convexity, interpretability, and efficient compute.
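As a rough illustration of the fusion rule (a sketch, not the library's actual implementation), a max-of-means combiner over petal outputs with learnable shifts might look like this; the function name, shapes, and group size are hypothetical:

```python
import torch

def max_of_means(petal_outs: torch.Tensor, shifts: torch.Tensor, group_size: int) -> torch.Tensor:
    """Fuse scalar petal outputs: mean within overlapping groups, max across groups.

    petal_outs: (batch, num_petals) outputs of convex petal networks
    shifts:     (num_groups,) learnable scalar shifts, one per group
    """
    groups = petal_outs.unfold(1, group_size, 1)   # (batch, num_groups, group_size), overlapping windows
    means = groups.mean(dim=-1) + shifts           # shift each group mean
    return means.max(dim=-1).values                # max across groups

outs = torch.randn(4, 8)                           # 8 petals
shifts = torch.zeros(8 - 3 + 1)                    # 6 overlapping groups of size 3
y = max_of_means(outs, shifts, group_size=3)
print(y.shape)  # torch.Size([4])
```

Since a nonnegative average of convex functions is convex, adding a constant preserves convexity, and a pointwise max of convex functions is convex, this combiner keeps the whole model convex in its inputs.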

This work builds on research from Carnegie Mellon ([Input-Convex Neural Networks](https://arxiv.org/abs/1609.07152)) and Johannes Kepler University ([Principled Weight Initialisation for Input-Convex Neural Networks](https://arxiv.org/abs/2312.12474)). Development empirically explored many convex domains and nuanced recombinations of results. In short: the cascaded gating approach outcompetes Kolmogorov-Arnold Network basis interpretation, and the cascaded mean-max-shift approach outcompetes LogSumExp. Both outperform the comparable systems above in efficiency and in loss behavior on convex and non-convex problems, though this does not imply that this or any other convex model can efficiently approximate non-convex problems.

## Why MiCE?

- **Convexity guarantees**  
  Every MiCE model computes a convex function of its inputs.  This ensures stable optimization, monotonic gradient behavior, and global convergence properties that standard MLPs and hard-MoE lack.

- **Efficiency**  
  No exponentials, no log-sum-exp, no discrete routing.  Max-of-means fusion costs only a handful of adds, means, and a single max per group.  Memory and FLOPs are **~2.6×** those of a 2-layer MLP with 4× expansion, far cheaper than full softmax MoE.

- **Interpretability**  
  Each petal specializes in a convex region; groups overlap, shifts encode priors, and the max operation cleanly partitions input space.  You can visualize which expert wins where.
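The convexity guarantee can be sanity-checked numerically with the midpoint inequality f((a+b)/2) ≤ (f(a)+f(b))/2. The toy check below uses a max-of-affine function as a stand-in for a convex petal (any max of affine maps is convex, so the inequality must hold):

```python
import torch

# A max of affine maps is convex; use it as a stand-in for a convex model.
W = torch.randn(8, 4)
b = torch.randn(8)
f = lambda x: (x @ W.T + b).max(dim=-1).values

a, c = torch.randn(16, 4), torch.randn(16, 4)
mid = f((a + c) / 2)              # value at the midpoint
chord = (f(a) + f(c)) / 2         # average of endpoint values
assert torch.all(mid <= chord + 1e-6)  # midpoint convexity holds
```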

## How MiCE Differs

| Feature               | MiCE (MoMx)         | Softmax MoE            | Hard Routing MoE      | Standard MLP         |
|-----------------------|---------------------|------------------------|-----------------------|----------------------|
| **Routing**           | max(mean(…))        | softmax(weights)       | top-k expert mask     | monolithic           |
| **Convexity**         | ✅                  | ✅ (scalar only)       | ❌                     | ❌                    |
| **Compute cost**      | ~2.6× MLP           | >10× (exponentials)    | ~k× experts           | baseline             |
| **Memory footprint**  | ~2.6× params        | high (dense activations)| high (expert states)  | baseline             |
| **Gradient flow**     | dense in groups     | dense                  | sparse (top-k only)   | dense                |
| **Smoothness**        | piecewise convex    | smooth                 | non-smooth            | smooth               |
| **Interpretability**  | high                | medium                 | low                   | low                  |
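The routing column can be made concrete. Under illustrative shapes (none of this is the library's API), the three fusion rules differ only in the combiner applied to expert outputs:

```python
import torch

expert_outs = torch.randn(16, 8)   # (batch, experts), illustrative
weights = torch.randn(8)

# Softmax MoE: dense convex combination, paid for with exponentials
soft = (torch.softmax(weights, dim=0) * expert_outs).sum(dim=-1)

# Hard routing: keep only the top-k experts (non-smooth, sparse gradients)
topv, topi = expert_outs.topk(2, dim=-1)
hard = topv.mean(dim=-1)

# MiCE-style max-of-means: mean within overlapping groups, max across groups
momx = expert_outs.unfold(1, 3, 1).mean(dim=-1).max(dim=-1).values
```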

## Relative Costs

- **Parameters & FLOPs**  
  MoMx uses ~2.6× the params and MACs of a 2-layer MLP (4× hidden).  
- **Vs. LSE Fusion**  
  No log/exp → 4–10× cheaper per petal.  
- **Vs. Hard-MoE**  
  No expert dispatch overhead or load balancing; single fused model.

## Solid Arguments

### Against Softmax  
- **High compute & memory**: O(P) exp/log per input.  
- **Numerical instability**: needs shift-and-scale tricks.  
- **Over-smooth**: blurs expert distinctions.
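The instability point refers to the standard shift trick: naive softmax overflows for large logits and must subtract the max before exponentiating, whereas a plain max needs no such correction. A minimal demonstration:

```python
import torch

logits = torch.tensor([1000.0, 1001.0, 1002.0])

naive = torch.exp(logits) / torch.exp(logits).sum()  # exp overflows to inf -> nan
stable = torch.softmax(logits, dim=0)                # shifts by the max internally
print(naive)         # tensor([nan, nan, nan])
print(stable)        # finite probabilities
print(logits.max())  # a max fusion is exact with no tricks: tensor(1002.)
```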

### Against Hard Routing  
- **Non-convex**: breaks convex guarantees.  
- **Sparse gradients**: only top-k experts update.  
- **Brittle**: large performance swings at boundaries.

### Against MLP  
- **Non-convex**: susceptible to poor local minima.  
- **Width & depth explosion**: needs huge hidden dims for expressivity.  
- **Opaque**: hard to interpret gradient flows.

## Quickstart

```bash
pip install torch_mice
```

```python
import torch
from torch_mice import VectorHull

model = VectorHull(in_dim=512, petals=8)   # convex, efficient MoE
x = torch.randn(32, 512)                   # example batch of inputs
y = model(x)                               # forward pass
```
## License

Licensed under the Gratis Public License © 2025 Joshuah Rainstar
