Merged
4 changes: 2 additions & 2 deletions README.md
@@ -35,7 +35,7 @@ engine.simplify(('/', '<constant>', '*', '/', '*', 'x3', '<constant>', 'x3', 'lo

# Simplify infix expressions
engine.simplify('x3 * sin(<constant> + 1) / (x3 * x3)')
# > '(<constant> / x3)'
# > '<constant> / x3'
```

More examples can be found in the [documentation](https://simplipy.readthedocs.io/).
@@ -88,7 +88,7 @@ pytest tests --cov src --cov-report html -m "not integration"
title = {Efficient Simplification of Mathematical Expressions},
year = 2025,
publisher = {GitHub},
version = {0.2.8},
version = {0.2.9},
url = {https://github.com/psaegert/simplipy}
}
```
105 changes: 102 additions & 3 deletions docs/index.md
@@ -1,4 +1,103 @@
# Home
# SimpliPy Documentation

This page is under construction.
Check out the [API Reference](api.md) in the meantime.
SimpliPy is a high-throughput symbolic simplifier built for workloads where
classic tools like SymPy struggle—think millions of expressions in the pre-training of
Flash-ANSR's prefix-based transformer models. Instead of converting tokens into
heavyweight objects and back again, SimpliPy keeps expressions as lightweight
prefix lists, enabling rapid rewriting and direct integration with machine
learning pipelines.


## Why SimpliPy Exists

SymPy excels at exact algebra, but its object graph and string parsing introduce
costs that dominate at scale. SimpliPy was created to remove those bottlenecks:

- **Prefix-first representation** – Expressions stay as token lists the entire
time, so there's no repeated parsing or AST allocation.
- **Deterministic pipelines** – Rule application, operand sorting, and literal
masking always produce the same layout, which keeps downstream caches warm.
- **GPU-friendly integration** – Outputs map directly into Flash-ANSR's input
space without any conversion step, making it practical to simplify millions of
candidates per minute.
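
Working directly on prefix token lists is cheap because the tree structure can be recovered on the fly. The toy reader below (not part of the SimpliPy API; the `UNARY`/`BINARY` arity tables are assumptions made for this example) shows how a flat prefix list maps onto a nested tree:

```python
# Illustrative only: a minimal prefix reader with hard-coded arities.
# SimpliPy derives arities from its operator configuration instead.
UNARY = {"sin", "cos", "log", "neg"}
BINARY = {"+", "-", "*", "/"}

def prefix_to_tree(tokens):
    """Consume tokens front-to-back; return (tree, remaining_tokens)."""
    head, rest = tokens[0], tokens[1:]
    if head in BINARY:
        left, rest = prefix_to_tree(rest)
        right, rest = prefix_to_tree(rest)
        return (head, left, right), rest
    if head in UNARY:
        arg, rest = prefix_to_tree(rest)
        return (head, arg), rest
    return head, rest  # leaf: a variable or <constant>

tree, _ = prefix_to_tree(['/', '<constant>', 'log', 'x3'])
# tree == ('/', '<constant>', ('log', 'x3'))
```

Because the flat list and the tree carry the same information, SimpliPy can defer (or skip) tree construction entirely and rewrite the token list in place.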


## Simplification Pipeline (Pseudo-Algorithm)

```text
function simplify(expr, max_iter=5):
tokens = parse(expr) # infix→prefix or validate existing prefix
tokens = normalize(tokens) # power folding, unary handling

for _ in range(max_iter):
tokens = cancel_terms(tokens) # additive/multiplicative multiplicities
tokens = apply_rules(tokens) # compiled rewrite patterns
tokens = sort_operands(tokens) # canonical order for commutative ops
tokens = mask_literals(tokens) # collapse trivial numerics to <constant>

if converged(tokens):
break

return finalize(tokens) # prefix list or infix string, caller’s choice
```

This loop is intentionally lightweight: each pass performs a handful of pure
list transformations, giving you predictable performance even on nested or noisy
expressions.
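
The driver loop above can be sketched in a few lines of Python. Everything here is illustrative: `run_until_converged` and the two toy passes stand in for SimpliPy's real `cancel_terms`/`apply_rules`/`sort_operands`/`mask_literals` stages, which are not reproduced:

```python
# Hedged sketch of the fixed-point loop: apply every pass, stop once a
# full sweep leaves the token list unchanged.
def run_until_converged(tokens, passes, max_iter=5):
    for _ in range(max_iter):
        before = list(tokens)
        for p in passes:
            tokens = p(tokens)
        if tokens == before:  # converged: nothing changed this sweep
            break
    return tokens

# Toy passes for demonstration (hypothetical token names):
def drop_pow1(ts):
    """Remove redundant identity-power markers."""
    return [t for t in ts if t != 'pow1']

def fold_neg(ts):
    """Collapse doubled unary negation."""
    out = []
    for t in ts:
        if out and out[-1] == 'neg' and t == 'neg':
            out.pop()
        else:
            out.append(t)
    return out

run_until_converged(['neg', 'neg', 'pow1', 'x1'], [drop_pow1, fold_neg])
# -> ['x1']
```

The convergence check is what keeps `max_iter` a safety bound rather than a cost you always pay: well-behaved inputs usually settle after one or two sweeps.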


## Key Components

- **Parsing & normalization** – `SimpliPyEngine.parse` and
`convert_expression` convert infix input, harmonize power operators, and
propagate unary negation without losing prefix fidelity.
- **Term cancellation** – `collect_multiplicities` and `cancel_terms` identify
subtrees that appear with opposite parity or redundant factors, pruning them
before any rules run.
- **Rule execution** – `compile_rules` turns machine-discovered or human-authored
simplifications into tree patterns. `apply_simplifcation_rules` then performs
fast top-down matching in each iteration.
- **Canonical ordering** – `sort_operands` imposes a stable ordering for
commutative operators, ensuring identical expressions share identical token
layouts.
- **Rule discovery workflow** – `find_rules` explores expression space in
parallel worker processes, confirms identities with numeric sampling, and
writes back deduplicated rulesets that future engines can load instantly.
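
The effect of canonical ordering can be demonstrated with a small tree-based sketch. SimpliPy's own `sort_operands` works on prefix token lists; the `canonical` helper below is a hypothetical illustration of the same idea, namely that a stable sort key makes equal expressions byte-identical:

```python
# Illustrative only: recursively sort operands of commutative operators
# by a deterministic key so equivalent trees compare equal.
def canonical(node):
    if isinstance(node, tuple) and node[0] in {'+', '*'}:
        op, *args = node
        args = sorted((canonical(a) for a in args), key=repr)
        return (op, *args)
    if isinstance(node, tuple):  # non-commutative: recurse, keep order
        return (node[0], *[canonical(a) for a in node[1:]])
    return node  # leaf

canonical(('*', 'x3', 'x1')) == canonical(('*', 'x1', 'x3'))  # True
```

Identical layouts are what keep downstream caches warm: two candidate expressions that differ only in operand order hash to the same key after canonicalization.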


## Quickstart

```bash
pip install simplipy
```

```python
import simplipy as sp

engine = sp.SimpliPyEngine.load("dev_7-3", install=True)

# Simplify prefix expressions
engine.simplify(['/', '<constant>', '*', '/', '*', 'x3', '<constant>', 'x3', 'log', 'x3'])
# -> ['/', '<constant>', 'log', 'x3']

# Simplify infix expressions
engine.simplify('x3 * sin(<constant> + 1) / (x3 * x3)')
# -> '<constant> / x3'
```

Available engines can be browsed and downloaded from Hugging Face.
The SimpliPy Asset Manager handles listing, installing, and uninstalling assets:

```python
sp.list_assets("engine")
# --- Available Assets ---
# - dev_7-3 [installed] Development engine 7-3 for mathematical expression simplification.
# - dev_7-2 Development engine 7-2 for mathematical expression simplification.
```

## Where to go next

- Explore the [API reference](api.md) for function-level details.
- Read the [rule authoring guide](rules.md) to build simplification rule sets.

Happy simplifying!
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -7,7 +7,7 @@ authors = [
readme = "README.md"
requires-python = ">=3.11"
dynamic = ["dependencies"]
version = "0.2.8"
version = "0.2.9"
license = "MIT"
license-files = ["LICEN[CS]E*"]

75 changes: 56 additions & 19 deletions src/simplipy/engine.py
@@ -69,7 +69,7 @@ class SimpliPyEngine:
A compiled version of explicit rules without pattern variables.
"""
def __init__(self, operators: dict[str, dict[str, Any]], rules: list[tuple] | None = None) -> None:
# This part, which sets up all the operator properties, is unchanged.
# Cache operator metadata for quick access during parsing and evaluation.
self.operator_tokens = list(operators.keys())
self.operator_aliases = {alias: operator for operator, properties in operators.items() for alias in properties['alias']}
self.operator_inverses = {k: v["inverse"] for k, v in operators.items() if v.get("inverse") is not None}
@@ -105,16 +105,14 @@ def __init__(self, operators: dict[str, dict[str, Any]], rules: list[tuple] | No
self.connection_classes_hyper = {'add': "mult", 'mult': "pow"}
self.binary_connectable_operators = {'+', '-', '*', '/'}

# This is the simplified rules handling logic.
# It no longer checks if `rules` is a string or performs any file I/O.
# It only accepts a list of rules or None.
# Normalize the incoming rule list and eliminate duplicate patterns.
dummy_variables = [f'x{i}' for i in range(100)]
if rules is None:
self.simplification_rules = []
else:
self.simplification_rules = deduplicate_rules(rules, dummy_variables=dummy_variables)

# This part is also unchanged.
# Build the compiled lookup tables that power rule application.
self.compile_rules()
self.rule_application_statistics: defaultdict[tuple, int] = defaultdict(int)

@@ -268,7 +266,31 @@ def is_valid(self, prefix_expression: list[str], verbose: bool = False) -> bool:
return True

def prefix_to_infix(self, tokens: list[str], power: Literal['func', '**'] = 'func', realization: bool = False) -> str:
"""Converts a prefix expression to a human-readable infix string with minimal parentheses."""
"""Converts a prefix expression to an infix string with minimal parentheses.

Parameters
----------
tokens : list[str]
The prefix expression to render.
power : {'func', '**'}, optional
Controls how power operators are emitted. ``'func'`` keeps canonical
engine names such as ``pow3(x)``, while ``'**'`` renders Python-style
exponentiation.
realization : bool, optional
If True, operator tokens are replaced with their runtime
realizations (for example, ``'sin'`` becomes ``'np.sin'``), so the
output can be compiled directly.

Returns
-------
str
The formatted infix expression.

Raises
------
ValueError
If the provided tokens do not form a well-formed prefix expression.
"""

if not tokens:
return ''
@@ -688,7 +710,9 @@ def parse(
"""Parses an infix string into a standardized prefix expression.

This is a high-level parsing utility that combines `infix_to_prefix`
with optional conversion and number masking steps.
with optional canonicalization and number masking. The resulting token
list is additionally cleaned up via `remove_pow1` to drop redundant
``pow1_1`` occurrences.

Parameters
----------
@@ -704,7 +728,8 @@ def parse(
Returns
-------
list[str]
The final processed prefix expression.
The processed prefix expression after conversion, masking (if
enabled), and `remove_pow1` cleanup.
"""

parsed_expression = self.infix_to_prefix(infix_expression)
@@ -1023,11 +1048,15 @@ def collect_multiplicities(self, expression: list[str] | tuple[str, ...], verbos
Returns
-------
expression_tree : list
The expression represented as a tree.
A stack-based representation of the expression tree. Each entry is a
nested list of the form ``[operator, operands]`` mirroring the
structure consumed by `cancel_terms`.
annotations_tree : list
A parallel tree containing the multiplicity counts for each subtree.
A parallel stack holding multiplicity annotations for each subtree,
organized by connection class.
labels_tree : list
A parallel tree containing unique identifiers for each subtree.
A parallel stack containing stable identifiers for every subtree,
used to detect duplicates during cancellation.
"""
stack: list = []
stack_annotations: list = []
@@ -1133,18 +1162,22 @@ def cancel_terms(self, expression_tree: list, expression_annotations_tree: list,
Parameters
----------
expression_tree : list
The nested list representation of the expression.
The stack produced by `collect_multiplicities`, containing the
nested expression structure.
expression_annotations_tree : list
The corresponding tree of multiplicity annotations.
The parallel stack of multiplicity annotations returned by
`collect_multiplicities`.
stack_labels : list
The corresponding tree of subtree labels.
The parallel stack of subtree labels returned by
`collect_multiplicities`.
verbose : bool, optional
If True, prints detailed debugging information. Defaults to False.

Returns
-------
list[str]
A new prefix expression with terms cancelled.
A simplified prefix expression with the detected duplicates merged
or removed.
"""
stack = expression_tree
stack_annotations = expression_annotations_tree
@@ -1637,11 +1670,14 @@ def find_rule_worker(
constants_fit_retries: int) -> None:
"""A worker process for discovering simplification rules in parallel.

This function runs in a separate process. It fetches an expression from
the `work_queue`, evaluates it on a set of random numerical data, and
This function runs in a separate process. It fetches work items of the
form ``(expression, simplified_length, allowed_candidate_lengths)`` from
`work_queue`, evaluates the expression on shared random data, and
compares the result against a library of simpler candidate expressions.
If a numerical equivalence is found, it is considered a potential new
simplification rule and is placed on the `result_queue`.
simplification rule and is placed on the `result_queue`; otherwise ``None``
is queued to signal that no rule was discovered. A sentinel ``None`` work
item triggers a graceful shutdown.

Notes
-----
@@ -1803,7 +1839,8 @@ def find_rules(
Equivalences are found by evaluating both expressions on random
numerical data.

Discovered rules are added to the engine and can be saved to a file.
Discovered rules are deduplicated, compiled into the running engine, and
can optionally be saved to disk.

Parameters
----------