Add neural network evaluation framework #52

Open
luccabb wants to merge 1 commit into master from improve/nn-evaluation-framework
@luccabb (Owner) commented Feb 16, 2026

Summary

Introduce a pluggable evaluation architecture that makes it easy to use neural networks, LLMs, or any custom model for position assessment:

  • Evaluator protocol (evaluation/base.py): Defines the interface any evaluator must implement
  • ClassicalEvaluator (evaluation/classical.py): Wraps existing PeSTO evaluation
  • NNEvaluator (evaluation/nn.py): Framework supporting ONNX Runtime, PyTorch, and custom callables
  • NNEngine (engines/nn_engine.py): Alpha-beta search with pluggable evaluation
  • Board encoding: 773-feature vector (12 bitboard planes + metadata)
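
The Evaluator protocol listed above can be sketched roughly as follows. This is a minimal illustration, assuming only the two methods the commit message names (evaluate and reset). `MaterialCountEvaluator` and the use of a FEN piece-placement string in place of a `chess.Board` are hypothetical stand-ins so the snippet runs without python-chess; the sign convention is also an assumption.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Evaluator(Protocol):
    """Structural interface: any object with evaluate() and reset() qualifies."""

    def evaluate(self, board: Any) -> float:
        """Return a score for the position (sign convention is an assumption)."""
        ...

    def reset(self) -> None:
        """Clear any cached state between games or searches."""
        ...


class MaterialCountEvaluator:
    """Hypothetical evaluator; `board` here is a FEN piece-placement string."""

    VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

    def __init__(self) -> None:
        self.calls = 0

    def evaluate(self, board: Any) -> float:
        self.calls += 1
        score = 0.0
        for ch in str(board):
            value = self.VALUES.get(ch.lower())
            if value is None:
                continue  # digits and '/' separators carry no material
            # Uppercase letters are White pieces, lowercase are Black.
            score += value if ch.isupper() else -value
        return score

    def reset(self) -> None:
        self.calls = 0


evaluator = MaterialCountEvaluator()
start_score = evaluator.evaluate("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR")
```

Because the protocol is structural, `MaterialCountEvaluator` never inherits from `Evaluator`; matching the two method names is enough for it to satisfy the interface.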

Usage examples

# ONNX model
moonfish --algorithm nn --nn-model-path model.onnx

# Custom LLM evaluator
evaluator = NNEvaluator(eval_fn=my_llm_function)
engine = NNEngine(config, evaluator=evaluator)

# Custom board encoder
evaluator = NNEvaluator(eval_fn=fn, board_encoder=my_encoder)

# Subclass for full control
class MyEval(NNEvaluator):
    def _raw_evaluate(self, board):
        return my_model(board)

Architecture

Evaluator (Protocol)
├── ClassicalEvaluator  ← PeSTO tables (existing behavior)
└── NNEvaluator         ← Neural network / LLM
    ├── from_file()     ← Load ONNX or PyTorch model
    ├── eval_fn         ← Custom callable (e.g., LLM API)
    └── _raw_evaluate() ← Override point for subclasses
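
The three extension points in the tree above might fit together along these lines. This is a sketch under stated assumptions, not the PR's code: `SketchNNEvaluator` and `FixedEval` are hypothetical names, and whether `eval_fn` receives the raw board or the encoder's output is an assumption the PR description does not settle.

```python
from typing import Any, Callable, Optional


class SketchNNEvaluator:
    """Illustrative stand-in for NNEvaluator; not the PR's actual code."""

    def __init__(
        self,
        eval_fn: Optional[Callable[[Any], float]] = None,
        board_encoder: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self._eval_fn = eval_fn
        self._board_encoder = board_encoder

    def evaluate(self, board: Any) -> float:
        # Public entry point: always funnels through the override hook.
        return float(self._raw_evaluate(board))

    def _raw_evaluate(self, board: Any) -> float:
        if self._eval_fn is None:
            raise RuntimeError("no eval_fn given and _raw_evaluate not overridden")
        # Assumption: when an encoder is supplied, eval_fn sees the encoded board.
        features = self._board_encoder(board) if self._board_encoder else board
        return self._eval_fn(features)

    def reset(self) -> None:
        pass  # nothing is cached in this sketch


class FixedEval(SketchNNEvaluator):
    """Subclass route: replace the hook and ignore eval_fn entirely."""

    def _raw_evaluate(self, board: Any) -> float:
        return 0.25


via_callable = SketchNNEvaluator(eval_fn=len).evaluate("abc")
via_encoder = SketchNNEvaluator(
    eval_fn=sum, board_encoder=lambda b: [1.0, 2.0]
).evaluate("x")
via_subclass = FixedEval().evaluate("anything")
```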

Test plan

  • All alpha_beta unit tests pass (16/16)
  • NN engine works with custom eval function
  • NN engine works via factory (get_engine)
  • Board encoding produces correct 773-feature vectors
  • Classical evaluator produces identical results to direct PeSTO call

Introduce a pluggable evaluation architecture that supports multiple
model backends for position assessment:

- Evaluator protocol (evaluation/base.py): Defines the interface any
  evaluator must implement (evaluate + reset methods)
- ClassicalEvaluator (evaluation/classical.py): Wraps existing PeSTO
  evaluation as an Evaluator implementation
- NNEvaluator (evaluation/nn.py): Framework for neural network models
  supporting ONNX Runtime, PyTorch, and custom callables (e.g., LLMs)
- NNEngine (engines/nn_engine.py): Alpha-beta search engine with a
  pluggable evaluator instead of hardcoded PeSTO
- Board encoding: 773-feature vector (12 bitboard planes + metadata)

Usage examples:
  # ONNX model
  moonfish --algorithm nn --nn-model-path model.onnx

  # Custom LLM evaluator (Python API)
  evaluator = NNEvaluator(eval_fn=my_llm_function)
  engine = NNEngine(config, evaluator=evaluator)

  # Subclass for custom logic
  class MyEval(NNEvaluator):
      def _raw_evaluate(self, board): ...
@github-actions
Benchmarks

The following benchmarks are available for this PR:

| Command | Description |
| --- | --- |
| /run-nps-benchmark | NPS speed benchmark (depth 5, 48 positions) |
| /run-stockfish-benchmark | Stockfish strength benchmark (300 games) |

Post a comment with the command to trigger a benchmark run.

greptile-apps bot commented Feb 16, 2026

Greptile Summary

This PR introduces a pluggable evaluation architecture allowing neural networks, LLMs, or custom models to be used for position assessment alongside the existing PeSTO evaluation.

  • Adds an Evaluator protocol in evaluation/base.py defining the interface for all evaluators
  • Wraps the existing PeSTO evaluation in a ClassicalEvaluator class for use as the default fallback
  • Implements NNEvaluator with ONNX Runtime, PyTorch, and custom callable support, including a 773-feature board encoding
  • Adds NNEngine that subclasses AlphaBeta and overrides eval_board to delegate to the pluggable evaluator
  • Wires the new nn algorithm through the CLI (--algorithm nn --nn-model-path) and factory function
  • The Syzygy exception handling in NNEngine.eval_board uses a broader except Exception compared to the parent class's specific (MissingTableError, KeyError), which could mask real errors
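
The idea of subclassing the search and overriding only `eval_board` can be illustrated on a toy game tree. This is a sketch, assuming a simplified negamax with alpha-beta pruning; `ToyAlphaBeta`, `ToyNNEngine`, and the nested-list tree are hypothetical and stand in for the real AlphaBeta, NNEngine, and `chess.Board`.

```python
from typing import Callable, List, Union

Node = Union[float, List["Node"]]  # leaf payload, or a list of child positions


class ToyAlphaBeta:
    def eval_board(self, leaf: float) -> float:
        # Base behaviour: the leaf payload is the score for the side to move.
        return leaf

    def negamax(self, node: Node, alpha: float, beta: float) -> float:
        if not isinstance(node, list):
            return self.eval_board(node)
        best = float("-inf")
        for child in node:
            # The child's score is from the opponent's view, so negate it.
            score = -self.negamax(child, -beta, -alpha)
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:
                break  # beta cutoff: the opponent will avoid this line
        return best


class ToyNNEngine(ToyAlphaBeta):
    """Same search, but leaf evaluation is delegated to an injected callable."""

    def __init__(self, evaluator: Callable[[float], float]) -> None:
        self.evaluator = evaluator

    def eval_board(self, leaf: float) -> float:
        return self.evaluator(leaf)


tree: Node = [[3.0, 5.0], [2.0, 9.0]]
base_score = ToyAlphaBeta().negamax(tree, float("-inf"), float("inf"))
flipped_score = ToyNNEngine(lambda x: -x).negamax(
    tree, float("-inf"), float("inf")
)
```

The search code is untouched in the subclass; swapping the evaluator alone changes which line the engine prefers.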

Confidence Score: 4/5

  • This PR is safe to merge with one minor issue in the Syzygy exception handling that should be addressed.
  • The architecture is clean and follows existing patterns well. The one concrete issue is the overly broad exception handler in NNEngine.eval_board which diverges from the parent class behavior. All other files are straightforward and well-structured. No breaking changes to existing functionality.
  • Pay close attention to moonfish/engines/nn_engine.py — the Syzygy exception handling differs from the parent class.

Important Files Changed

| Filename | Overview |
| --- | --- |
| `moonfish/engines/nn_engine.py` | New engine subclass overriding eval_board with pluggable evaluator. The Syzygy exception handling is broader than the parent class, which could mask real errors. |
| `moonfish/evaluation/__init__.py` | Clean package init exporting the three evaluation components. No issues. |
| `moonfish/evaluation/base.py` | Well-defined Evaluator protocol with evaluate() and reset() methods. Clean and follows existing project patterns. |
| `moonfish/evaluation/classical.py` | Thin wrapper around existing PeSTO evaluation. Correctly delegates to board_evaluation and clears the global cache on reset. |
| `moonfish/evaluation/nn.py` | Core NN evaluator with ONNX/PyTorch/callable backends. Module-level docstring mentions en passant and halfmove clock metadata but the implementation only encodes side-to-move and castling rights. |
| `moonfish/helper.py` | Factory function extended with nn algorithm support. Clean integration with lazy import of NNEvaluator. |
| `moonfish/main.py` | Added --nn-model-path CLI option and wired it through to Config. Straightforward addition. |
| `moonfish/config.py` | Added nn_model_path optional field with None default. No issues. |

Class Diagram

classDiagram
    class Evaluator {
        <<Protocol>>
        +evaluate(board: Board) float
        +reset() None
    }
    class ClassicalEvaluator {
        +evaluate(board: Board) float
        +reset() None
    }
    class NNEvaluator {
        -_eval_fn: Callable
        -_board_encoder: Callable
        -_model
        -_backend: str
        +from_file(model_path: str) NNEvaluator
        +evaluate(board: Board) float
        +reset() None
        -_raw_evaluate(board: Board) float
        -_load_model(model_path: str) None
        -_load_onnx(model_path: str) None
        -_load_pytorch(model_path: str) None
    }
    class ChessEngine {
        <<Protocol>>
        +search_move(board: Board) Move
    }
    class AlphaBeta {
        +eval_board(board: Board) float
        +search_move(board: Board) Move
        +negamax() tuple
        +quiescence_search() float
    }
    class NNEngine {
        +evaluator: Evaluator
        +eval_board(board: Board) float
        +search_move(board: Board) Move
    }
    Evaluator <|.. ClassicalEvaluator
    Evaluator <|.. NNEvaluator
    ChessEngine <|.. AlphaBeta
    AlphaBeta <|-- NNEngine
    NNEngine --> Evaluator : uses

Last reviewed commit: fb66718

@greptile-apps greptile-apps bot left a comment

8 files reviewed, 2 comments

Comment on lines +78 to +79
    except Exception:
        pass

Overly broad exception handler
The base AlphaBeta.eval_board catches (chess.syzygy.MissingTableError, KeyError) specifically, but here a broad except Exception is used. This could silently swallow real errors (I/O failures, corrupted tablebase data, etc.) and fall through to the NN evaluator without any indication that something went wrong. Consider matching the parent class's specific exception types.

Suggested change
-    except Exception:
-        pass
+    except (chess.syzygy.MissingTableError, KeyError):
+        pass
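
The effect of narrowing the handler can be shown with a small sketch. Everything here is hypothetical: `MissingTableError` is a local stand-in for `chess.syzygy.MissingTableError`, and `eval_with_tablebase`, `missing_table`, and `disk_failure` are illustrative names, not functions from the PR.

```python
from typing import Callable


class MissingTableError(Exception):
    """Local stand-in for chess.syzygy.MissingTableError in this sketch."""


def eval_with_tablebase(
    probe: Callable[[], float], fallback: Callable[[], float]
) -> float:
    try:
        return probe()
    except (MissingTableError, KeyError):
        # Expected "no tablebase entry" cases: fall through to the evaluator.
        pass
    return fallback()


def missing_table() -> float:
    raise MissingTableError("no table for this material")


def disk_failure() -> float:
    raise OSError("tablebase file unreadable")


# A missing table quietly falls back to the evaluator's score.
fallback_score = eval_with_tablebase(missing_table, lambda: 0.5)

# An unexpected I/O failure is no longer swallowed; it reaches the caller.
try:
    eval_with_tablebase(disk_failure, lambda: 0.5)
    io_error_surfaced = False
except OSError:
    io_error_surfaced = True
```

With `except Exception` in place of the tuple, the `OSError` would also be caught and the fallback score returned with no indication that the probe had failed.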

Comment on lines +26 to +27
- Metadata: side to move, castling rights, en passant, halfmove clock
- Total input size: 12*64 + 5 = 773 floats

Module docstring doesn't match implementation
The module-level docstring states "Metadata: side to move, castling rights, en passant, halfmove clock" and "Total input size: 12*64 + 5 = 773 floats", but the actual board_to_tensor implementation only encodes side-to-move (1 feature) and castling rights (4 features) — no en passant square and no halfmove clock. The function-level docstring is correct, but this module header could mislead users who rely on it to understand the encoding format.
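
To make the 773-float layout concrete (12 planes of 64 squares, 1 side-to-move flag, 4 castling flags), here is a rough sketch that encodes a FEN string in pure Python. It is not the PR's board_to_tensor: `fen_to_features`, the plane order in `PIECE_ORDER`, the a1 = 0 square indexing, and the order of the metadata slots are all assumptions made for illustration.

```python
from typing import List

PIECE_ORDER = "PNBRQKpnbrqk"  # assumed plane order; the PR does not state it


def fen_to_features(fen: str) -> List[float]:
    """Encode a FEN as 12*64 piece planes + side to move + 4 castling flags."""
    placement, side, castling = fen.split()[:3]
    features = [0.0] * (12 * 64 + 5)

    # FEN lists ranks 8 down to 1; here square 0 is a1 and square 63 is h8.
    for rank_from_top, row in enumerate(placement.split("/")):
        file_index = 0
        for ch in row:
            if ch.isdigit():
                file_index += int(ch)  # run of empty squares
                continue
            square = (7 - rank_from_top) * 8 + file_index
            features[PIECE_ORDER.index(ch) * 64 + square] = 1.0
            file_index += 1

    # Metadata: 1 side-to-move flag followed by 4 castling-right flags.
    features[768] = 1.0 if side == "w" else 0.0
    for offset, flag in enumerate("KQkq"):
        features[769 + offset] = 1.0 if flag in castling else 0.0
    return features


START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
start_features = fen_to_features(START_FEN)
```

The en passant and halfmove-clock fields of the FEN are parsed past but never encoded, which mirrors the mismatch this comment points out: five metadata floats cover only side to move and castling rights.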
