Updating Precondition Language / Matching System for PyHTN (New Syntax + Speed)


# Updating Pattern Matching System for PyHTN (New Syntax + Speed)

PyHTN's pattern-matching capabilities need a tune-up. Currently, we're using Chris's py_plan repository to do unification-based matching. This has been relatively stable but is quite slow (in part because it is written in Python), and because the underlying unification-based matching approach has terrible asymptotic guarantees (it is essentially brute-force, trying every combination of things and seeing what sticks). In a few projects that rely on PyHTN, including Apprentice Tutors and VAL, I've heard from others or have personally encountered minor performance issues that will likely grow into problems as we scale up to more complex domains. I think it's time for an upgrade.

Currently, I am in the process of reimplementing a much faster algorithm called CORGI as a C++ Python extension. I developed CORGI originally as part of my dissertation work at CMU to tractably solve challenging worst-case matching problems with tight asymptotic guarantees. At the time, I needed something that wouldn't break when my AI agents induced complicated matching patterns as they learned. My previous implementation of CORGI (using [numba](https://numba.pydata.org/)) is faster than the shortlist of compiled systems I've benchmarked it against, including SOAR (which has put a lot of money into optimizing its pattern-matching capabilities). 

[CORGI paper:  [https://openreview.net/pdf?id=MQr2n8bxqU](https://openreview.net/pdf?id=MQr2n8bxqU)

<img width="354" height="234" alt="Image" src="https://github.com/user-attachments/assets/b2c96ef8-43dd-4f57-bcf0-71f8b400a59c" />

I'm rebuilding CORGI, along with a lot of other helpful tools in a project called Cognitive Rule Engine (CRE), and the syntax I'm proposing here (with any suggestions from you all) will be integrated into CRE and inherited in PyHTN by using CRE as a dependency. CRE is already on PyPI you can `pip install cre`, to get the version I wrote for my dissertation (which will soon be replaced). 
[Disseration repo in (numba): [https://github.com/DannyWeitekamp/Cognitive-Rule-Engine](https://github.com/DannyWeitekamp/Cognitive-Rule-Engine)] 
[Reimplementation in (C++) to replace the above: [https://github.com/DannyWeitekamp/CCRE](https://github.com/DannyWeitekamp/CCRE)] 

CRE is intended to have broader applicability in the Cognitive Systems community and perhaps also in the broader planning and robotics communities beyond the scope of PyHTN. But insofar as this project overlaps with PyHTN my goals are as follows: 
1. **Future-proof PyHTN** so that pattern matching performance is never an issue (even in worst-case scenarios). As a lab, we haven't hand-written any worst-case scenario patterns into HTNs that I know of yet, but they will inevitably crop up as we start learning HTNs inductively in a bottom-up manner and synthesize them from natural language. Additionally, faster pattern matching will allow us to scale seamlessly into domains with large numbers of objects, like bigger games and web automation tasks, where there can be hundreds of objects involved and lots of relationships between them (consider the DOM of this GitHub webpage, for instance).  
2. **Update the PyHTN precondition syntax**: I want to be mostly backwards compatible with the syntax that already exists, but introduce a new standard that is more concise, expressive, and generally Pythonic, in addition to being more flexible than the current PyHTN precondition syntax.

## PyHTN's Current Precondition Syntax

Below is an example taken from an apprentice tutor domain.
```python3
{...
'solve': Method(head=('solve', V('equation')),
        preconditions=[Fact(field=V('equation'), value=V('eq'), answer=False)],
        subtasks=[
                Task(head=('simplify_exp', V('equation'), ('multiply_values', 'simplify_exp'))),
                Task(head=('done', ('done',)))
            ],
    ),
...}
```
This example illustrates the general format of how variables (e.g., `V('equation')`) are propagated through a task head into a method body, and are incorporated into its preconditions and subtasks. It also shows a new variable `V('eq')` being bound to the `value` attribute of the same Fact object. So far, every literal (i.e., a predicate or its negation) in the precondition expression is a test of equality (or effectively an assignment with an implicit test of existence). 

Currently, if we want to use more complicated preconditions, we need to use `Filter()`, like for example, in this Operator for dice adventure:
```python3
for pin_type in ['pinga', 'pingb', 'pingc', 'pingd']:
...
      Operator(
          name=f"{pin_type}",
          args=(),
          preconditions=[
              Fact("actionPoints", V("ap")),
              Filter(lambda ap: ap > 0)
          ],
          effects=[]
      )
```
Here we bind `actionPoints` to a variable `V("ap")` and refer to it again in `Filter(lambda ap: ap > 0)`. This syntax is simultaneously a bit verbose---we need two expressions for a single condition literal `ap > 0`---and a bit ambiguous---it isn't entirely clear what thing, or kind of thing `actionPoints` belongs to.

## A New Standard Syntax

The syntax that I proposed for CRE/pyHTN is influenced in part by the underlying data structures used in my CORGI algorithm. This structure introduces a few changes relative to the PyHTN syntax:

1. CORGI Facts can have types. A FactType defines the member slots within a fact instance and their types.  This structure is essentially a [frame](https://en.wikipedia.org/wiki/Frame_(artificial_intelligence))). Strong typing enables type specialization optimizations that can speed up the matching process. 

```python3
Department = FactType("Department", {"city": str, "num" : int})
Employee = FactType("Employee", {"num": int, "home_city" : str, "dept_num" : int})
Project = FactType("Project", {"proj_num" : int, "emp_num" : int})
``` 
2. CRE can perform matching over a `FactSet`, which is just a collection of Fact instances like `Employee(num=1, home_city="Houston", dept_num=2)`. `FactSet`s and `Fact`s can be manipulated directly within Python, and CRE includes utilities for efficiently translating between Python `dict` instances and `Fact`s. There isn't much point in translating back to something like an object or dict, because CRE Facts are usually more efficient to manipulate than native Python objects. 

3. Logical expressions in CRE are expressed with respect to typed `Var` objects. For instance, a logical expression for the preconditions of a Method might look like:
```python3
....
preconditions=AND(
    D:=Var(Department), D.city == "Houston",
    E:=Var(Employee), E.home_city != D.city, 
    P:=Var(Project), E.num == P.emp_num,
    V1:=Var(Employee), V1.home_city != E.home_city, V1.num > E.num
)
...
```
In general, a `Var` has a type and an alias (i.e. a name) expressed as `Var(Type, "alias")` or `'alias:=Var(Type)`. In the latter case, `alias` is also a Python variable that contains the `Var` and can be used in subsequent expressions. Unlike PyHTN's current `py_plan` backend, a CRE `Var` typically targets particular Fact objects instead of primitive values like floats or strings. Only in certain edge cases, which can often be avoided, do primitive values need to be bound to named variables.  

4. Like in the example above, `Var` objects can be extended with the dot operator `.` to express dereferencing fact members. For instance, the value of `V1:=Var(Vehicle)` could be dereferenced and compared to a constant `V1.value > 20000`, or to another bound variable, let's say `H1:=Var(House)`, and compared like `V1.value > H1.value`. Note here that, unlike the PyHTN example, we don't need to first bind the member value to another named variable, and the built-in predicate `<` can be used freely in normal Python syntax without an explicit `Filter` or `lambda x,y: ...` function. 
 
5. Most of the arithmetic operations in Python are also built into CRE, and we can write arbitrary literal expressions. We can express relations between objects directly in Python syntax  `V1.value > H1.value / 100` or even express complex mathematical relationships `V1.value**2 <= ((H2.value / 100)-5000)**(1/2)`. This extended syntax does not handle all edge cases---we may still need to fall back on lambda functions in some cases---but cases that align with built-in Python arithmetic can be executed in this form with fast C++ compiled subroutines instead of Python ones.

#### Object-Oriented and Relatively Foolproof

More importantly, this convention is more object-oriented and consistent with regular Python syntax than the current PyHTN syntax. By avoiding intermediate variables for primitive values, we reduce some intermediate naming cruft. More importantly, this convention reduces some ambiguity by clarifying which `Fact` each primitive originally belonged to. This is useful for readability---we avoid ambiguity surrounding whether or not there is a distinction between variable assignment and equality checking (which can be a source of potential confusion for newcomers)---and useful for performance, since we are encouraged to bind to `Fact` instances with respect to their `FactType`. For example, this new syntax discourages the author from writing unconstrained patterns like `Fact(x=V("gx"), y=V("gy"))`, which might bind to anything with an x and y value. If we were matching within a game, such a pattern might bind to everything with a position. Instead, we might write `P:=Var(Player)`, and get `P.x` and `P.y` to just get the positions of players.   

#### Flexible First-Class Objects

Just like the original PyHTN syntax, the above syntax is valid Python. Thus, we maintain a lot of flexibility to write declarative logical expressions without the need to parse a special Domain-Specific Language (e.g., PDDL, SOAR, OPS5). Consequently, we can rearrange expressions freely and enjoy the benefits of treating CRE objects as [first-class citizens](https://en.wikipedia.org/wiki/First-class_citizen) . We can, for instance, lift  variable definitions into higher parts of a method definition, like into the head or args:

```python3
my_method = Method(head=('send_valentine', E:=Var(Employee)),
        args=(E, D:=Var(Department), P:=Var(Project), V1=Var(Employee)),
        preconditions=AND(
           D.city == "Houston", E.home_city != D.city, E.num == P.emp_num, 
           V1.home_city != E.home_city, V1.num > E.num
        )
        ...
    )
```
And freely compose and repurpose logical statements
```python3
conds = my_method.preconditions
V2 = Var(Employee)
conds &= AND(V2.home_city != E.home_city, V2.num != E.num, V2.num != V1.num)
```
To some extent, the current PyHTN precondition language has this quality as well, but CRE introduces a more Pythonic syntax.

#### Why Object-Oriented as Standard?

Note that this object-oriented notation is helpful for communicating logical expressions in CRE, because it aligns well with the data structures used in CORGI. In particular, this syntax is very explicit about the number and types of Fact-bound variables in each logical expression, and the particular literal expressions contained within them. It is easy to go from this format to a CORGI graph, like the one below, that solves matching problems efficiently and keeps itself up to date as working memory changes to speed up subsequent match cycles.

<img width="228" height="310" alt="Image" src="https://github.com/user-attachments/assets/5d3fbe6f-84ea-4a03-89ad-6212d2eeb183" />

## Backwards Compatability & Asthetic 

For purely aesthetic reasons, we might disagree with some or all of this new object-oriented syntax with its strong typing, dot `.` operators,  lack of named intermediate variables `V("equation")`, or just find `Var` overly verbose compared to a simple `V`. For the last concern is easy enough to `from cre import Var as V`. For the others, some relaxations on the above syntax can be accommodated along with the standard object-oriented syntax. 

### Retaining Fact Constructor Semantics
The following three expressions are functionally equivalent:
```python3
c1 = AND(P0:=Var(Person), P0.id == "bob", P0.money == 100.0) 
c2 = AND(Person(id='bob', money=100.0))
c3 = AND(P0:=Var(Person(id='bob', money=100.0)))
```
The second and third options start as instances of the `Person` Fact, and when they are passed to the `AND()` logical connective, they are turned into individual literals `P0.id == "bob", P0.money == 100.0`. However, for the purposes of printing any of these logical statements, they retain the semantics of how they were constructed, for instance:

```python3
 print(c1) # >> AND(P0:=Var(Person), P0.id == "bob", P0.money == 100.0) 
 print(c3) # >> AND(P0:=Var(Person(id='bob', money=100.0)))
```
We might prefer `c3` to the syntax of the more concise `c2` option if we needed to use `P0` later in another literal, for instance:
```python3
AND(
   P0:=Var(Person(id='bob', money=100.0)),
   N:=Var(Person), N.money >P0.money, N in P0.neighbors 
)
```

### Intermediate Variables 

I don't intend to eliminate intermediate variables entirely (I simply find that doing so aids readability and cuts out many kinds of potential user error).  Nonetheless, there are likely situations where one would still benefit from being able to write something like `Var('value')`, in multiple places in a logical expression, and have it freely bind to primitive values and contribute to arbitrary expressions. I hope to support this in CRE, but this functionality may take longer to make fully stable, since it has some extra edge with possibilities for undefined behavior that would be ideally caught gracefully before match time.  For instance, the following can be well defined:

```python3
AND(
     P0:=Var(Person(id='bob', money=Var("bob_money")),
     N:=Var(Person), N.money >Var("bob_money"), N in P0.neighbors
)
```
Because we can easily have the first line bind `Var("bob_money")` to `P0.money`.  Whereas the following is not well defined:

```python3
AND(
     P0:=Var(Person), P0.id='bob', money >= Var("min_money") ),
     N:=Var(Person), N.money >= Var("min_money"), N in P0.neighbors
)
```
Because `Var("min_money")` cannot be resolved to have a fixed value through an equality check. It may be sufficient to simply require that any use of  `Var("min_money")` in a non-equality-based literal is preceded by something like `Bind(Var("min_money"), 100.0)`.

### Untyped Facts
#### Untyped Members

Strict typing adds a lot of extra speed to CORGI. But, in the interest of being Pythonic, we may consider loosening this constraint. This, too, may be a feature that comes later or in pieces because it has a lot of edge cases. `FactType` definitions create templates for allocating Fact objects with a certain number and type of named attribute slots. In the Numba version of CRE, a `Fact` is a specialized struct-like object for each type, so a slot for a `str` member might have a larger byte-width than a slot for a `bool` member. In the C++ reimplementation of CRE, I've loosened this constraint so that each member of a fact simply fits into a fixed-size `16-byte` region and includes a small type flag. This allows CRE to be more loosely typed without any degradation in performance. It may allow, for instance, FactType definitions that don't have strict member types, and can hold any arbitrary value:

```python3
Department = FactType("Department", {"city": Undef, "num" : Undef})
```

#### Unconstrained Attribute Assignment

In other cases, we might want to instantiate Facts dynamically with a variety of members without worrying about predefined slots, or we might also want to add special members at runtime (similar to Python objects).  If we know the maximum name and number of members ahead of time, then we can just define the FactType to have the superset of attributes used, and only instantiate each fact with the subsets needed in each instance, in which case, the uninitialized values will take a special `Undef` singleton type. I've avoided using `None` for this purpose because there are cases where someone might want to explicitly have `None` of something as a distinct case from having not yet set any value at all. It would also be easy to define default values along with FactTypes (this would make them fully consistent with Minksy's notion of [frames](https://en.wikipedia.org/wiki/Frame_(artificial_intelligence))).

I intend to allow CRE's `Fact` objects to have an underlying data structure similar to `__dict__`  in a Python object, where we can add anything as a new member to the object as needed on top of predefined `__slots__`. This would allow CRE to be backward compatible with the current PyHTN syntax, where writing Fact(id='bob', money=100.0) creates an untyped object. Assigning arbitrary attributes dynamically would make typed facts act more like Python objects and permit additional assignment of attributes that are not part of their type definition. There would be some non-negligible additional performance overhead associated with using these dynamic attributes, but less overhead than regular Python objects add to the Python interpreter. Performance issues aside, I would still discourage the regular use of this feature by default in most cases. As with other features, this opens up many edge cases, so it may not become stable until later phases of development.  

## Other Features
### Tuples as Fluents

Many cognitive systems, planners, and robotics applications express domain states exclusively using grounded tuple-like predicates, for example, `(color, my_car, red)` instead of as frames with typed attributes like in the examples above. The C++ rewrite of CRE will support tuple-like predicates both as data and as literals in logical statements. They are implemented as immutable/frozen untyped Facts, and can be added to FactSets by adding a regular Python `tuple`. As literals in logical statements:

```python3
fs = FactSet([('on', 'box', 'table')])
any_on_table = AND(('on', Var('thing'), 'table'))
print(list(any_on_table.get_matches(fs))) # ['table']
```
In principle, tuples can be used in combination with regular frame-like facts as well:
```python3
 AND(B:=Var(Box), B.shape == "square",  ('on', B, 'table'))
```
Unlike the current precondition evaluation system in PyHTN, under the hood, CRE does not need to convert all facts into a common, tuple-based representation. Tuple-based literals like `('on', B, 'table')` are handled independently of function-based literals over frames like `A.value < B.value`, and the matching of each variety of literal uses an underlying implementation that is efficient for each. 

### Negation, Universal Quantification, and other Quirks

When pattern matching is applied to a logical statement, for instance: 
```python
fs = ...
conds = AND(P0 = Var(Person), P1 = Var(Person(father=P0)))
match_iter = fs.get_matches(conds)
print(next(match_iter)) # Match(P0 : Person(id="dad"), P1: Person(id='son', father=dad))
```
Each `Var` defines one thing that will appear in a valid match. In general, we can get the things bound to each variable by treating the `Match` either like a `tuple` or a `dict`. 
```python
(P0, P1) = match
P0 = match['P0'], P1 = match['P1']
```
In terms of logical quantifiers, a `Var` implies the existence $\exists$ of a Fact in a FactSet. It also implies that when `get_matches` is called, every consistent combination of matching objects is iterated over. A looser sense of $\exists$ can be specified using `Exists` in place of `Var`, indicating that some satisfying object for the variable must exist, but we don't need to iterate over all of them. We can also express non-existence $\nexists$ indicated by using `Not` in place of `Var`, and forall $\forall$ by using `All` in place of `Var`. There is also `Opt` for things that are bound to matches when they exist, but are not required to exist for there to be a valid match. 

 | Var Kind | Logical Quantifier | Iteration Type |
| :------- | :------ | :------- |
| Var     | $\exists$      | All  independant matches  |
| Exists    | $\exists$      | An iterator with each match  |
| Not    | $\nexists$    | None  |
| All      | $\forall$       | An iterator with each match  |
| Opt    | $\exists \vee \emptyset $ | All independent matches or `Undef` if they do not exist |
| *Bound   | N/A         | Any bound vlaue |

As a side note, `Not` and `All` conditions are very simple to check for in CORGI graphs, so there shouldn't be any performance surprises with the inclusion of these qualifiers. This is not true of older matching approaches like RETE. 

It may make sense to additionally have explicit `Arg` type `Var`s, to signal that a logical statement is only matchable if it is explicitly passed certain constant values as inputs. In principle, any Var should be fixable ahead of time, something which is useful for running HTNs that pass variable bindings from Tasks to sub-Tasks. In the example above, we might need to pass an argument to `get_matches`, for instance:

```python3
conds = AND(P0 = Var(Person), P1 = Var(Person(id=Var('name'), father=P0, )))
match_iter = fs.get_matches(conds, name="bob")
```
or 
```python3
conds = AND(P0 = Var(Person), P1 = Var(Person(father=P0)))
match_iter = fs.get_matches(conds, P1=bob)
```
In general, we could fix the value of any variable like this in advance through `get_matches`.

### Disjunction (OR)

Finally, in CRE (just the new C++ version), logical statements can consist of literals connected with AND() or OR() connectives or any nested combination of the two. 

```python
conds1 = AND(
  OR(
    A:=Var(Car), A.value >= 10000, 
    B:=Var(Car), B.value >= 20000, 
  ), 
  B.color == "blue", 
  C:=Var(Car), C.value < 5000, 
)
conds2 = OR(
  AND(
    A:=Var(Car), A.value >= 10000, 
    B:=Var(Car), B.value >= 20000, 
  ), 
  B.color == "blue",  
  C:=Var(Car), C.value < 5000, 
)
```
In conds1, the `A` in the `OR` expression is automatically promoted to `Opt` since `A.value >= 10000,` or `B.value >= 20000`, but not necessarily both, and since `A` is introduced within the `OR(...)` it doesn't necessarily need to exist if a `B` exists. However, `B` would remain a `Var` because there is a literal in the outer `AND` that requires `B`. 

In the Numba version of CRE, logical statements were reduced into disjunctive normal form by default. In the C++ rewrite, I don't intend to have this transformation occur automatically since it doesn't necessarily speed things up (it may even slow things down) and removes the original semantics of the logic as it was written. I will retain the ability to reduce `Logic` objects to their DNF as a callable method like `to_dnf()`.  

OR statements also have a helpful shortcut notation for cases where we want some value to take one of several possible values, we can write:

```python3
AND(P0:=Var(Person), P0.job==OR("nurse", "doctor", "surgeon"))
```

where under the hood `P0.job==OR("nurse", "doctor", "surgeon")` effectively translates to:
```python3
OR(P0.job == "nurse", P0.job == "doctor",  P0.job == "surgeon") 
```

## What I want from you!

I'm hoping to get some feedback on all of this before I implement it completely. Do these changes seem like ideal improvements? Do you like the syntax? Is there anything you would change? Is there something missing that you would like to see? 



     

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Updating Precondition Language / Matching System for PyHTN (New Syntax + Speed) #4

Updating Pattern Matching System for PyHTN (New Syntax + Speed)

PyHTN's Current Precondition Syntax

A New Standard Syntax

Object-Oriented and Relatively Foolproof

Flexible First-Class Objects

Why Object-Oriented as Standard?

Backwards Compatability & Asthetic

Retaining Fact Constructor Semantics

Intermediate Variables

Untyped Facts

Untyped Members

Unconstrained Attribute Assignment

Other Features

Tuples as Fluents

Negation, Universal Quantification, and other Quirks

Disjunction (OR)

What I want from you!

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Var Kind	Logical Quantifier	Iteration Type
Var	$\exists$	All independant matches
Exists	$\exists$	An iterator with each match
Not	$\nexists$	None
All	$\forall$	An iterator with each match
Opt	$\exists \vee \emptyset $	All independent matches or `Undef` if they do not exist
*Bound	N/A	Any bound vlaue

Updating Precondition Language / Matching System for PyHTN (New Syntax + Speed) #4

Description

Updating Pattern Matching System for PyHTN (New Syntax + Speed)

PyHTN's Current Precondition Syntax

A New Standard Syntax

Object-Oriented and Relatively Foolproof

Flexible First-Class Objects

Why Object-Oriented as Standard?

Backwards Compatability & Asthetic

Retaining Fact Constructor Semantics

Intermediate Variables

Untyped Facts

Untyped Members

Unconstrained Attribute Assignment

Other Features

Tuples as Fluents

Negation, Universal Quantification, and other Quirks

Disjunction (OR)

What I want from you!

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions