@cemde
Collaborator

@cemde cemde commented Dec 22, 2025

Description

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code quality improvement (refactoring, formatting, etc.)

Checklist

Contribution

Documentation

  • Added/updated docstrings for new/modified functions as instructed in CONTRIBUTING.md
  • Updated relevant documentation in docs/ (if applicable)
  • Tagged the GitHub issue with this PR (if applicable)

Changelog

  • Added entry to CHANGELOG.md under [Unreleased] section
    • Use Added section for new features
    • Use Changed section for modifications to existing functionality
    • Use Fixed section for bug fixes
    • Use Removed section for deprecated/removed features
  • OR this is a documentation-only change (no changelog needed)

Example:
- Support for multi-agent tracing (PR:#123)

Architecture (if applicable)

  • Core/Interface separation: Changes in maseval/core/ do NOT import from maseval/interface/
  • Dependencies: New core dependencies added sparingly; framework integrations go to optional dependencies
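
The core/interface separation rule above can be checked mechanically. The following is a minimal sketch, not part of the repository's tooling: the function names `forbidden_imports` and `check_tree` are hypothetical, and the only facts taken from the checklist are the two package paths. It parses each file with the standard-library `ast` module and reports absolute imports that fall under `maseval.interface`.

```python
import ast
from pathlib import Path

# Package that core code must not import (taken from the checklist item).
FORBIDDEN_PREFIX = "maseval.interface"


def forbidden_imports(source: str, forbidden: str = FORBIDDEN_PREFIX) -> list[str]:
    """Return the sorted dotted names imported by `source` that fall under `forbidden`.

    Hypothetical helper. Relative imports (`from . import x`) are skipped,
    so this sketch only catches absolute imports.
    """
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Cover both `from maseval.interface import x`
            # and `from maseval import interface`.
            names = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
        else:
            continue
        for name in names:
            if name == forbidden or name.startswith(forbidden + "."):
                hits.add(name)
    return sorted(hits)


def check_tree(root: str = "maseval/core") -> dict[str, list[str]]:
    """Map each offending file under `root` to its forbidden imports."""
    offenders = {}
    for path in Path(root).rglob("*.py"):
        found = forbidden_imports(path.read_text(encoding="utf-8"))
        if found:
            offenders[str(path)] = found
    return offenders
```

Calling `check_tree()` from the repository root returns an empty dict when the rule holds, which makes it straightforward to wrap in a test or a pre-commit hook.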

Additional Notes

@github-actions

github-actions bot commented Dec 22, 2025

Coverage report

File | Statements | Missing | Coverage | Coverage (new stmts) | Lines missing
  maseval/benchmark/macs
  data_loader.py 255, 297
  macs.py
  maseval/benchmark/tau2
  __init__.py
  data_loader.py 84-85, 100-104, 141, 150, 163, 168, 252, 296, 338, 343, 415, 470
  environment.py 84, 137, 173-182, 251-253, 292, 310-315
  evaluator.py 225, 239-241, 254-255, 278, 289-290, 316, 326-328, 344, 403, 412, 524
  tau2.py 326, 555, 711, 731, 733, 735, 741, 815-818
  utils.py 169-170, 174-179, 204-209
  maseval/benchmark/tau2/domains
  base.py 74, 82, 162-170, 271-275, 287, 328, 333-342
  maseval/benchmark/tau2/domains/airline
  db.py
  models.py
  tools.py 61, 69, 77, 91-94, 108, 112, 138, 150-162, 179, 182, 184, 330-339, 399, 440, 444, 475, 480, 495, 560, 662, 666
  maseval/benchmark/tau2/domains/retail
  db.py
  models.py
  tools.py 65, 86, 104, 124, 217, 243, 311, 367-368, 416, 426, 430, 442, 542, 551, 555, 566, 577-578, 584, 626, 632-665, 746, 752
  maseval/benchmark/tau2/domains/telecom
  db.py
  models.py
  tools.py 74, 83, 87, 92, 96, 101, 110, 126, 147, 160-180, 229, 255, 277, 501, 563-616, 634-640
  user_models.py 255-257
  user_tools.py 45, 51, 107, 142, 145, 147, 151, 208, 302-303, 386, 461, 478, 506, 538-539, 555-563, 577-578, 583-588
  maseval/core
  simulator.py
  task.py
  user.py 426, 429-434, 444, 518-523
  maseval/interface/inference
  anthropic.py
  google_genai.py 178-188, 193
Project Total  

The report is truncated to 25 files out of 29. To see the full report, please visit the workflow summary page.

This report was generated by python-coverage-comment-action

2 participants