Add neural network evaluation framework by luccabb · Pull Request #52 · luccabb/moonfish

luccabb · 2026-02-16T08:37:35Z

Summary

Introduce a pluggable evaluation architecture that makes it easy to use neural networks, LLMs, or any custom model for position assessment:

Evaluator protocol (evaluation/base.py): Defines the interface any evaluator must implement
ClassicalEvaluator (evaluation/classical.py): Wraps existing PeSTO evaluation
NNEvaluator (evaluation/nn.py): Framework supporting ONNX Runtime, PyTorch, and custom callables
NNEngine (engines/nn_engine.py): Alpha-beta search with pluggable evaluation
Board encoding: 773-feature vector (12 bitboard planes of 64 squares + 5 metadata features)

Usage examples

# ONNX model
moonfish --algorithm nn --nn-model-path model.onnx

# Custom LLM evaluator
evaluator = NNEvaluator(eval_fn=my_llm_function)
engine = NNEngine(config, evaluator=evaluator)

# Custom board encoder
evaluator = NNEvaluator(eval_fn=fn, board_encoder=my_encoder)

# Subclass for full control
class MyEval(NNEvaluator):
    def _raw_evaluate(self, board):
        return my_model(board)
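
How the three extension points shown above might fit together can be sketched with a stand-in class. This is not the PR's NNEvaluator: `SketchEvaluator`, `ConstantEval`, and the assumption that `eval_fn` receives the encoder's output when a `board_encoder` is supplied (and the raw board otherwise) are illustrative guesses.

```python
from typing import Any, Callable, Optional


class SketchEvaluator:
    """Hypothetical stand-in mirroring the eval_fn / board_encoder / _raw_evaluate hooks."""

    def __init__(
        self,
        eval_fn: Optional[Callable[[Any], float]] = None,
        board_encoder: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self._eval_fn = eval_fn
        self._board_encoder = board_encoder

    def _raw_evaluate(self, board: Any) -> float:
        # Override point: subclasses can bypass eval_fn entirely.
        if self._eval_fn is None:
            raise ValueError("no eval_fn configured and _raw_evaluate not overridden")
        # Assumption: the encoder (if any) runs before the callable.
        model_input = self._board_encoder(board) if self._board_encoder else board
        return float(self._eval_fn(model_input))

    def evaluate(self, board: Any) -> float:
        return self._raw_evaluate(board)

    def reset(self) -> None:
        # Nothing is cached in this sketch, so there is nothing to clear.
        pass


class ConstantEval(SketchEvaluator):
    """Subclass route: full control over scoring."""

    def _raw_evaluate(self, board: Any) -> float:
        return 0.0


# Callable route: score a string by its stripped length (a meaningless toy metric).
by_callable = SketchEvaluator(eval_fn=len, board_encoder=str.strip)
```

Keeping `evaluate` as a thin wrapper around `_raw_evaluate` means the callable route and the subclass route share one public entry point.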

Architecture

Evaluator (Protocol)
├── ClassicalEvaluator  ← PeSTO tables (existing behavior)
└── NNEvaluator         ← Neural network / LLM
    ├── from_file()     ← Load ONNX or PyTorch model
    ├── eval_fn         ← Custom callable (e.g., LLM API)
    └── _raw_evaluate() ← Override point for subclasses
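
As a rough illustration of the protocol at the root of this tree, the sketch below uses `typing.Protocol` with the `evaluate` and `reset` methods named in the PR. The `Any` board type, the `@runtime_checkable` decorator, `FixedScoreEvaluator`, and `score_position` are assumptions made for the example and may not match evaluation/base.py.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable  # assumption: added here only so isinstance() can be demonstrated
class Evaluator(Protocol):
    def evaluate(self, board: Any) -> float: ...
    def reset(self) -> None: ...


class FixedScoreEvaluator:
    """Hypothetical evaluator: no inheritance needed, only matching methods."""

    def __init__(self, score: float) -> None:
        self.score = score
        self.calls = 0

    def evaluate(self, board: Any) -> float:
        self.calls += 1
        return self.score

    def reset(self) -> None:
        # Clear per-game state (here, just the call counter).
        self.calls = 0


def score_position(evaluator: Evaluator, board: Any) -> float:
    # Callers depend only on the protocol, not on a concrete class.
    return evaluator.evaluate(board)


fixed = FixedScoreEvaluator(25.0)
```

Because the protocol is structural, any object with these two methods can be handed to code typed against `Evaluator`.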

Test plan

All alpha_beta unit tests pass (16/16)
NN engine works with custom eval function
NN engine works via factory (get_engine)
Board encoding produces correct 773-feature vectors
Classical evaluator produces identical results to direct PeSTO call

Introduce a pluggable evaluation architecture that supports multiple model backends for position assessment:

- Evaluator protocol (evaluation/base.py): Defines the interface any evaluator must implement (evaluate + reset methods)
- ClassicalEvaluator (evaluation/classical.py): Wraps existing PeSTO evaluation as an Evaluator implementation
- NNEvaluator (evaluation/nn.py): Framework for neural network models supporting ONNX Runtime, PyTorch, and custom callables (e.g., LLMs)
- NNEngine (engines/nn_engine.py): Alpha-beta search engine with a pluggable evaluator instead of hardcoded PeSTO
- Board encoding: 773-feature vector (12 bitboard planes + metadata)

Usage examples:

# ONNX model
moonfish --algorithm nn --nn-model-path model.onnx

# Custom LLM evaluator (Python API)
evaluator = NNEvaluator(eval_fn=my_llm_function)
engine = NNEngine(config, evaluator=evaluator)

# Subclass for custom logic
class MyEval(NNEvaluator):
    def _raw_evaluate(self, board): ...

github-actions · 2026-02-16T08:37:43Z

Benchmarks

The following benchmarks are available for this PR:

Command	Description
`/run-nps-benchmark`	NPS speed benchmark (depth 5, 48 positions)
`/run-stockfish-benchmark`	Stockfish strength benchmark (300 games)

Post a comment with the command to trigger a benchmark run.

greptile-apps · 2026-02-16T08:40:11Z

Greptile Summary

This PR introduces a pluggable evaluation architecture allowing neural networks, LLMs, or custom models to be used for position assessment alongside the existing PeSTO evaluation.

Adds an Evaluator protocol in evaluation/base.py defining the interface for all evaluators
Wraps the existing PeSTO evaluation in a ClassicalEvaluator class for use as the default fallback
Implements NNEvaluator with ONNX Runtime, PyTorch, and custom callable support, including a 773-feature board encoding
Adds NNEngine that subclasses AlphaBeta and overrides eval_board to delegate to the pluggable evaluator
Wires the new nn algorithm through the CLI (--algorithm nn --nn-model-path) and factory function
The Syzygy exception handling in NNEngine.eval_board uses a broader except Exception compared to the parent class's specific (MissingTableError, KeyError), which could mask real errors

Confidence Score: 4/5

This PR is safe to merge with one minor issue in the Syzygy exception handling that should be addressed.
The architecture is clean and follows existing patterns well. The one concrete issue is the overly broad exception handler in NNEngine.eval_board which diverges from the parent class behavior. All other files are straightforward and well-structured. No breaking changes to existing functionality.
Pay close attention to moonfish/engines/nn_engine.py — the Syzygy exception handling differs from the parent class.

Important Files Changed

Filename	Overview
moonfish/engines/nn_engine.py	New engine subclass overriding eval_board with pluggable evaluator. The Syzygy exception handling is broader than the parent class, which could mask real errors.
moonfish/evaluation/__init__.py	Clean package init exporting the three evaluation components. No issues.
moonfish/evaluation/base.py	Well-defined Evaluator protocol with evaluate() and reset() methods. Clean and follows existing project patterns.
moonfish/evaluation/classical.py	Thin wrapper around existing PeSTO evaluation. Correctly delegates to board_evaluation and clears the global cache on reset.
moonfish/evaluation/nn.py	Core NN evaluator with ONNX/PyTorch/callable backends. Module-level docstring mentions en passant and halfmove clock metadata but the implementation only encodes side-to-move and castling rights.
moonfish/helper.py	Factory function extended with nn algorithm support. Clean integration with lazy import of NNEvaluator.
moonfish/main.py	Added --nn-model-path CLI option and wired it through to Config. Straightforward addition.
moonfish/config.py	Added nn_model_path optional field with None default. No issues.

Class Diagram

classDiagram
    class Evaluator {
        <<Protocol>>
        +evaluate(board: Board) float
        +reset() None
    }
    class ClassicalEvaluator {
        +evaluate(board: Board) float
        +reset() None
    }
    class NNEvaluator {
        -_eval_fn: Callable
        -_board_encoder: Callable
        -_model
        -_backend: str
        +from_file(model_path: str) NNEvaluator
        +evaluate(board: Board) float
        +reset() None
        -_raw_evaluate(board: Board) float
        -_load_model(model_path: str) None
        -_load_onnx(model_path: str) None
        -_load_pytorch(model_path: str) None
    }
    class ChessEngine {
        <<Protocol>>
        +search_move(board: Board) Move
    }
    class AlphaBeta {
        +eval_board(board: Board) float
        +search_move(board: Board) Move
        +negamax() tuple
        +quiescence_search() float
    }
    class NNEngine {
        +evaluator: Evaluator
        +eval_board(board: Board) float
        +search_move(board: Board) Move
    }
    Evaluator <|.. ClassicalEvaluator
    Evaluator <|.. NNEvaluator
    ChessEngine <|.. AlphaBeta
    AlphaBeta <|-- NNEngine
    NNEngine --> Evaluator : uses
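
The `AlphaBeta <|-- NNEngine` and `NNEngine --> Evaluator` relationships in the diagram can be sketched as a subclass that overrides only `eval_board`. Everything here is a hypothetical stand-in: `AlphaBetaSketch` uses a one-ply toy search instead of negamax, and `PreferLongMoves` is a nonsense evaluator chosen to make the delegation visible.

```python
from typing import Any, List, Tuple


class AlphaBetaSketch:
    """Hypothetical stand-in for the parent engine with a hardcoded evaluation."""

    def eval_board(self, board: Any) -> float:
        return 0.0  # placeholder for the built-in (PeSTO) score

    def search_move(self, board: Any, moves: List[str]) -> str:
        # One-ply toy search: pick the move whose resulting "position" scores best.
        return max(moves, key=lambda move: self.eval_board((board, move)))


class NNEngineSketch(AlphaBetaSketch):
    """Inherits the search unchanged and swaps only the leaf evaluation."""

    def __init__(self, evaluator: Any) -> None:
        self.evaluator = evaluator

    def eval_board(self, board: Any) -> float:
        return self.evaluator.evaluate(board)


class PreferLongMoves:
    """Toy evaluator: scores a (board, move) pair by the move string's length."""

    def evaluate(self, board: Tuple[Any, str]) -> float:
        _, move = board
        return float(len(move))

    def reset(self) -> None:
        pass


engine = NNEngineSketch(PreferLongMoves())
best = engine.search_move("start", ["e4", "Nf3", "d4"])
```

Since the search code calls `self.eval_board`, the override is picked up without touching the parent's search logic.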

Last reviewed commit: fb66718

greptile-apps

8 files reviewed, 2 comments

greptile-apps · 2026-02-16T08:40:14Z

moonfish/engines/nn_engine.py

+                except Exception:
+                    pass


Overly broad exception handler
The base AlphaBeta.eval_board catches (chess.syzygy.MissingTableError, KeyError) specifically, but here a broad except Exception is used. This could silently swallow real errors (I/O failures, corrupted tablebase data, etc.) and fall through to the NN evaluator without any indication that something went wrong. Consider matching the parent class's specific exception types.

Suggested change

-                except Exception:
-                    pass
+                except (chess.syzygy.MissingTableError, KeyError):
+                    pass
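
The difference between the two handlers can be shown with a small self-contained sketch. `MissingTableError` below is a local stand-in for `chess.syzygy.MissingTableError`, and `probe_tablebase`, `eval_narrow`, and `eval_broad` are hypothetical helpers, not code from the PR.

```python
from typing import Optional


class MissingTableError(Exception):
    """Local stand-in for chess.syzygy.MissingTableError (assumption for this sketch)."""


def probe_tablebase(fail_with: Optional[Exception]) -> float:
    # Simulates a tablebase probe that either returns a score or raises.
    if fail_with is not None:
        raise fail_with
    return 1.0


def eval_narrow(fail_with: Optional[Exception], fallback: float) -> float:
    try:
        return probe_tablebase(fail_with)
    except (MissingTableError, KeyError):
        pass  # expected case: position not covered by the tablebase
    return fallback


def eval_broad(fail_with: Optional[Exception], fallback: float) -> float:
    try:
        return probe_tablebase(fail_with)
    except Exception:
        pass  # also hides I/O failures and other unexpected errors
    return fallback
```

With the narrow handler, an `OSError` from the probe propagates to the caller; with the broad handler, the same error is silently replaced by the fallback score.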

greptile-apps · 2026-02-16T08:40:16Z

moonfish/evaluation/nn.py

+- Metadata: side to move, castling rights, en passant, halfmove clock
+- Total input size: 12*64 + 5 = 773 floats


Module docstring doesn't match implementation
The module-level docstring states "Metadata: side to move, castling rights, en passant, halfmove clock" and "Total input size: 12*64 + 5 = 773 floats", but the actual board_to_tensor implementation only encodes side-to-move (1 feature) and castling rights (4 features) — no en passant square and no halfmove clock. The function-level docstring is correct, but this module header could mislead users who rely on it to understand the encoding format.
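
To make the 12*64 + 5 = 773 arithmetic concrete, here is a minimal encoding sketch that works from a FEN string using only the standard library. It is not the PR's board_to_tensor: the function name `fen_to_features`, the plane order, the square indexing (a1 = 0), and the metadata order are assumptions.

```python
from typing import List

PIECE_ORDER = "PNBRQKpnbrqk"  # assumed plane order: white P..K, then black p..k


def fen_to_features(fen: str) -> List[float]:
    """Encode piece placement, side to move, and castling rights as 773 floats."""
    placement, side, castling = fen.split()[:3]
    features = [0.0] * (12 * 64 + 5)

    # FEN lists ranks 8 down to 1; squares are indexed rank * 8 + file with a1 = 0.
    for row, rank_text in enumerate(placement.split("/")):
        rank = 7 - row
        file = 0
        for ch in rank_text:
            if ch.isdigit():
                file += int(ch)  # run of empty squares
            else:
                plane = PIECE_ORDER.index(ch)
                features[plane * 64 + rank * 8 + file] = 1.0
                file += 1

    meta = 12 * 64
    features[meta] = 1.0 if side == "w" else 0.0  # 1 feature: side to move
    for offset, flag in enumerate("KQkq", start=1):  # 4 features: castling rights
        features[meta + offset] = 1.0 if flag in castling else 0.0
    return features


START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
start = fen_to_features(START_FEN)
```

En passant and the halfmove clock are deliberately left out, matching the review's description of what the implementation encodes.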

luccabb wants to merge 1 commit into master from improve/nn-evaluation-framework
