
0.0.355 #182

Merged
yochze merged 83 commits into main from 0.0.355
Apr 10, 2026


@yochze yochze commented Apr 5, 2026

No description provided.

yochze and others added 30 commits March 23, 2026 23:08
…dal detail

Adds visibility into which instructions are loaded during agent completions:
- Backend emits `instructions.context` SSE event after context build and after tools load related instructions
- Chat shows lightweight "N instructions loaded" indicator with link to trace modal
- Trace modal gets Instructions left-pane item with full table (title, category, load mode, reason, type)
- Tool detail in trace modal shows instructions loaded by that specific tool
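
As a rough illustration of the `instructions.context` event described above, the sketch below serializes one payload into Server-Sent Events wire format. The helper name `format_sse` and the payload keys (chosen to mirror the trace-modal columns) are assumptions for illustration, not the project's actual schema.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one SSE frame: event name, JSON data line, terminating blank line."""
    body = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {body}\n\n"


# Hypothetical payload; the keys mirror the trace-modal table columns
# (title, category, load mode, reason, type).
payload = {
    "instructions": [
        {
            "title": "Revenue definition",
            "category": "general",
            "load_mode": "always",
            "reason": "always loaded",
            "type": "instruction",
        }
    ]
}

frame = format_sse("instructions.context", payload)
```

A client can split the frame on the `data: ` prefix and parse the remainder as JSON to build the "N instructions loaded" indicator.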

https://claude.ai/code/session_0181SQjfnCYLV799NeHWtsLn
…le everywhere

- Persist loaded_instructions in completion JSON field so indicator survives page refresh
- Add loaded_instructions to CompletionV2Schema and hydrate from completion_service
- Remove 'view' link from chat indicator, switch to cube icon
- TraceModal: replace full table with compact collapsible list with always/intelligent counts
- TraceModal: tool-loaded instructions now collapsible with chevron toggle
- DescribeTablesTool: show "N instructions loaded" with expandable list inline

https://claude.ai/code/session_0181SQjfnCYLV799NeHWtsLn
- Move "N instructions" indicator from top of completion to footer row
  (after thumbs up/down, before debug button)
- DescribeTablesTool: move instructions inside the table list as last <li>,
  same expand/collapse pattern as tables, collapsed by default

https://claude.ai/code/session_0181SQjfnCYLV799NeHWtsLn
- Replace static "N instructions" text with UPopover that shows list on click
- Each instruction row shows title, load mode badge (always/intelligent)
- Clicking an instruction row fetches full instruction and opens InstructionModalComponent
- Reuses existing showTrainingInstructionModal pattern

https://claude.ai/code/session_0181SQjfnCYLV799NeHWtsLn
SQLAlchemy doesn't detect in-place mutations on JSON columns.
Added flag_modified() call so the loaded_instructions data actually
gets committed to DB when update_completion_status runs later.
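
A minimal sketch of the behaviour this commit works around, assuming SQLAlchemy 1.4+ and an in-memory SQLite database. The `Completion` model and the `store_loaded_instructions` helper are hypothetical stand-ins, not the project's real model or service code.

```python
from sqlalchemy import JSON, Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.orm.attributes import flag_modified

Base = declarative_base()


class Completion(Base):
    """Hypothetical stand-in for the real completion model."""

    __tablename__ = "completions"
    id = Column(Integer, primary_key=True)
    completion = Column(JSON, nullable=False)


def store_loaded_instructions(mark_dirty: bool) -> dict:
    """Mutate the JSON column in place, commit, and return what the DB holds."""
    engine = create_engine("sqlite://")  # fresh in-memory database
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        row = Completion(id=1, completion={})
        session.add(row)
        session.commit()

        # In-place mutation: a plain JSON column records no change event.
        row.completion["loaded_instructions"] = [{"title": "Revenue definition"}]
        if mark_dirty:
            # Explicitly mark the column so it is included in the next UPDATE.
            flag_modified(row, "completion")
        session.commit()

        session.expire_all()  # discard cached state so the read hits the DB
        return row.completion
```

With `mark_dirty=False` the second commit has nothing to flush and the stored value stays empty; with `mark_dirty=True` the mutated dictionary is written.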

https://claude.ai/code/session_0181SQjfnCYLV799NeHWtsLn
…e-I2wUE

Add instructions loading visibility to trace modal and reports

Replaces the post-loop SuggestInstructions LLM-call with an agentic
sub-loop that runs after the main agent loop completes when trigger
conditions fire. The harness can search existing instructions, verify
facts via inspect_data/describe_tables, and decide whether to create
or edit instructions.

- New search_instructions tool (knowledge + training modes) for
  discovering existing instructions before creating duplicates.
- New "knowledge" mode + _build_knowledge_prompt with reflection
  framing, trigger reasons injection, and tight iteration budget.
- _run_knowledge_harness sub-loop in agent_v2 (max 5 steps) that
  spins up a knowledge-mode planner, runs tool calls via the existing
  ToolRunner, and streams instructions.suggest.partial events.
- Instructions land in a draft AI build that is submitted for review
  (matches the previous suggestion semantics — admin still approves).
- Trigger conditions are formatted into a <trigger_conditions> block
  injected into the harness prompt as hints.
- create_instruction / edit_instruction allowed_modes expanded to
  include "knowledge".
- Removed _stream_suggestions_inline from agent_v2 (replaced).
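
The shape of the sub-loop described above can be sketched as follows. This is an illustration under assumptions, not the code in agent_v2: the planner and tool runner are passed in as plain callables, the action and event dictionaries are invented shapes, and only the step budget of 5, the `<trigger_conditions>` tag, and the `instructions.suggest.partial` event name come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MAX_KNOWLEDGE_STEPS = 5  # "max 5 steps" per the commit message


def format_trigger_conditions(reasons: list[str]) -> str:
    """Render trigger reasons as a tagged block for injection into the prompt."""
    lines = "\n".join(f"- {reason}" for reason in reasons)
    return f"<trigger_conditions>\n{lines}\n</trigger_conditions>"


@dataclass
class HarnessResult:
    steps: int = 0
    events: list[dict] = field(default_factory=list)


def run_knowledge_harness(
    reasons: list[str],
    plan_step: Callable[[str, list[dict]], Optional[dict]],  # hypothetical planner
    run_tool: Callable[[dict], dict],  # hypothetical tool runner
) -> HarnessResult:
    """Run a bounded plan/act loop; skip entirely when no trigger fired."""
    result = HarnessResult()
    if not reasons:
        return result
    prompt = format_trigger_conditions(reasons)
    observations: list[dict] = []
    for _ in range(MAX_KNOWLEDGE_STEPS):
        action = plan_step(prompt, observations)
        if action is None:  # planner decided there is nothing more to do
            break
        result.steps += 1
        observation = run_tool(action)
        observations.append(observation)
        # Only instruction writes are surfaced as suggestion events here.
        if action["tool"] in ("create_instruction", "edit_instruction"):
            result.events.append(
                {"event": "instructions.suggest.partial", "data": observation}
            )
    return result
```

The hard iteration cap keeps a reflective pass from running unbounded even if the planner never signals completion.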

https://claude.ai/code/session_01KnqwhvsU9FANr3tjMfRi2Q
yochze and others added 28 commits April 6, 2026 23:53
The new RBAC permission registry uses 'manage_llm', not the legacy
'manage_llm_settings' string. The settings layout also gated the LLM
tab on a non-existent 'modify_settings' permission, hiding the tab
from members entirely.

- Rename manage_llm_settings -> manage_llm in LLMsComponent so write
  controls (Integrate Models button, Actions column, toggle) are
  properly hidden from members.
- Allow the LLM settings tab to render for any authenticated user;
  intra-page controls remain gated by manage_llm.
- Drop the stale page-level permission gate on /settings/models.

Fixes the playwright visibility test 'member can see LLM tab
(read-only)'.

Members lack modify_settings, so the LLM tab is intentionally hidden
from the settings layout. Update the visibility spec to reflect that
the tab should not be rendered, instead of expecting a read-only view.

Mirrors the Slack/Teams external-platform pattern: a WhatsAppAdapter
subclass of PlatformAdapter, registered in the factory; WhatsAppConfig
schema; create_whatsapp_platform service with Meta Graph validation;
POST /settings/integrations/whatsapp config route; GET/POST
/api/settings/integrations/whatsapp/webhook with X-Hub-Signature-256
verification, phone_number_id-based tenant routing, and id-based dedupe.
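
For readers unfamiliar with the webhook checks named above, here is a stdlib-only sketch of X-Hub-Signature-256 verification and id-based dedupe. It assumes the header carries `sha256=` followed by the hex HMAC-SHA256 of the raw request body keyed with the app secret. The function and class names are hypothetical, and the in-memory set stands in for whatever store the real route uses.

```python
import hashlib
import hmac
from typing import Optional


def verify_signature(app_secret: str, raw_body: bytes, header: Optional[str]) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    prefix = "sha256="
    if not header or not header.startswith(prefix):
        return False
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, header[len(prefix):])


class MessageDeduper:
    """Remember message ids so a redelivered webhook is processed only once."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, message_id: str) -> bool:
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True
```

The signature must be computed over the exact bytes received, before any JSON parsing, because re-serializing the payload can change whitespace and key order.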

Adapter parses Cloud API webhooks, sends text replies with context
threading, uploads media in two steps, and maps Slack-style emoji
reactions to unicode for the manager's processing indicator.

Frontend adds a WhatsApp card and modal to settings/integrations.

Tests: 21 unit tests (adapter parsing, signature verify, send/reaction/
media mocked via httpx MockTransport, and full webhook route behaviour
incl. handshake, bad-signature, dedupe, status-only, non-text, dispatch).

Sandbox: backend/scripts/whatsapp_sandbox_debug.py boots the real adapter
+ route against a mock Meta Graph server and replays a full scenario
(handshake, unverified->verified, threaded reply, status-only, dedupe).

https://claude.ai/code/session_01XtJkarHadpmRsGU5quSYrY
Adds backend/tests/e2e/rbac/ — seven test modules totalling 35 tests
covering the registry, every CRUD/visibility path on data sources,
instructions, builds, entities, evals, and the resolver paths in
permission_resolver._resolve_permissions_inner.

Why
---
PR 182 introduced the new RBAC primitives (roles, role assignments,
groups, resource grants, registry, resolver) but the existing e2e
coverage only spot-checked individual endpoints. We needed a tight
matrix that exercises real users + per-DS grants against the live
FastAPI app to lock in behaviour and surface any drift quickly.

Test files
----------
- test_rbac_registry.py: pure static parity. AST-walks routes/*.py for
  every @requires_permission and check_resource_permissions call and
  asserts the literal permission strings exist in the registry. Catches
  the manage_tests / modify_settings class of bug instantly.
- test_rbac_data_sources.py: detail-level access matrix +
  list/detail-invariant in one shot.
- test_rbac_instructions.py: per-DS create_instructions matrix, owner
  vs admin vs other edit branches, list visibility filtering.
- test_rbac_builds.py: list endpoint is org-only, publish/submit/etc.
  enforce per-DS create_instructions on every touched DS.
- test_rbac_entities.py: forward list/detail invariant for entities,
  the recently fixed bug class.
- test_rbac_evals.py: suite endpoints are strictly org-level,
  case create endpoint is resource_scoped + per-DS gated.
- test_rbac_role_principals.py: walks every resolver path —
  user direct role, group → role, user resource grant, group
  resource grant, full_admin_access bypass, and assignment-removal
  freshness.
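
The static parity check in test_rbac_registry.py can be approximated with the standard `ast` module. The sketch below is not the test itself: `REGISTRY`, `SAMPLE_ROUTE`, and the call signatures inside the sample are made up for illustration, and it collects every string literal passed to the two helpers, which is a simplification.

```python
import ast

# Stand-in for the real permission registry.
REGISTRY = {"manage_evals", "run_evals", "manage_llm"}

CHECKED_CALLS = {"requires_permission", "check_resource_permissions"}

# Hypothetical route source; the argument lists are illustrative only.
SAMPLE_ROUTE = '''
@requires_permission("manage_evals")
async def list_suites():
    ...

async def create_case(db, user, ds_ids):
    await check_resource_permissions(db, user, ds_ids, "create_evals")
'''


def collect_permission_literals(source: str) -> set[str]:
    """Return every string literal passed to a permission decorator or check."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both bare names and attribute access (module.func).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name not in CHECKED_CALLS:
            continue
        values = list(node.args) + [kw.value for kw in node.keywords]
        for value in values:
            if isinstance(value, ast.Constant) and isinstance(value.value, str):
                found.add(value.value)
    return found


unknown = collect_permission_literals(SAMPLE_ROUTE) - REGISTRY
```

Here `unknown` contains `create_evals`, the kind of drift the registry test is meant to fail on.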

Findings (backend code changes)
-------------------------------
1. routes/test.py — both create_case and update_case called
   check_resource_permissions(..., 'create_evals'), but 'create_evals'
   is not in permissions_registry.RESOURCE_PERMISSIONS. The check
   therefore never matched anything for non-admins, locking per-DS
   evals authors out of any DS-scoped case creation. Changed both call
   sites to 'manage_evals' (the canonical resource permission, already
   in the registry and ORG_PERM_IMPLIES_RESOURCE).

   The new test_rbac_registry.py::test_check_resource_permissions_uses_known_resource_perms
   walks check_resource_permissions calls in routes/ and would have
   failed CI on the original code; the test is now the regression
   guard for this drift class.

   The existing test_rbac_complex_roles.py::test_eval_case_mixed_ds_list_denied
   test still passes — the role under test only holds 'run_evals' so
   denial still applies, just for the right reason now.

2. tests/e2e/test_rbac.py::test_permissions_registry_endpoint — was
   asserting "Reports" appears in the categories response, but Reports
   moved to HIDDEN_PERMISSION_CATEGORIES on purpose so the role-editor
   UI doesn't render meaningless checkboxes for it. Updated the
   assertion to match current behaviour. (Pre-existing failure on
   PR 182, unrelated to our fixture/test additions.)

Known bugs surfaced as xfail
----------------------------
- test_rbac_data_sources.py::test_data_source_grant_appears_in_list
  data_source_service.get_data_sources filters the LIST by the legacy
  DataSourceMembership table only — it ignores ResourceGrant rows. A
  user with a per-DS RBAC grant but no DataSourceMembership opens the
  DS in detail (resolver path) but never sees it in the list. Marked
  strict xfail with full repro context; lift the marker once the
  service-layer filter is unioned with ResourceGrant.

Out of scope
------------
- Frontend Playwright tests (this branch is backend only)
- Performance/load testing the resolver
- Migration changes
- New RBAC features

https://claude.ai/code/session_01LVmq3ikLQFiS6Zs1fqb8Ce
The sandbox feedback loop doc tells contributors to set up a Python
3.12 venv at backend/.venv; add it to .gitignore so it never gets
accidentally committed alongside test fixture work.

https://claude.ai/code/session_01LVmq3ikLQFiS6Zs1fqb8Ce
…nstructions, role grants

Reflects the new RBAC semantics introduced in the latest base-branch
``rbac improvements`` commit:

- ``view`` / ``view_schema`` are now implicit on any DS grant; removed
  from RESOURCE_PERMISSIONS and from every test fixture's grant payload.
- ``create_instructions`` (resource-level) is renamed to
  ``manage_instructions``; updated all grant payloads, route assertions,
  and route enforcement tests.
- /api/builds GET endpoints are no longer admin-only — they apply a
  per-DS filter via ``get_accessible_data_source_ids``. Updated
  test_rbac_builds.py to expect 200 + filtered list for non-admins.
- ``data_source_service.get_data_sources`` now unions ResourceGrant with
  the legacy DataSourceMembership table, fixing the list/detail
  invariant bug captured by the previous xfail. Removed the xfail
  marker from ``test_data_source_grant_appears_in_list`` — it now
  asserts forward correctness as a regression guard.
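
The list/detail invariant behind that fix can be stated in a few lines. This is a simplified sketch with in-memory tuples instead of database rows; the function name and the `(user_id, data_source_id)` tuple shape are assumptions, and only direct user grants are modelled.

```python
def list_visible_data_source_ids(
    user_id: str,
    memberships: set[tuple[str, str]],      # legacy (user_id, data_source_id) rows
    resource_grants: set[tuple[str, str]],  # RBAC (user_id, data_source_id) rows
) -> set[str]:
    """Union both sources so anything openable in detail also appears in the list."""
    via_membership = {ds for uid, ds in memberships if uid == user_id}
    via_grant = {ds for uid, ds in resource_grants if uid == user_id}
    return via_membership | via_grant
```

Filtering on memberships alone would drop any data source the user holds only through a grant, which is the bug the earlier xfail captured.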

Added tests for the new resolver paths:

- ``test_role_as_principal_resource_grant``: a custom role created with
  inline ``resource_grants`` propagates to a directly-assigned member
  via ResourceGrant.principal_type='role'.
- ``test_role_as_principal_grant_via_group_assignment``: same path
  but transitively (user → group → role → role-attached resource grant).
- ``test_view_and_view_schema_are_implicit_on_any_grant``: holding only
  ``manage_instructions`` on a DS lets the user GET /data_sources/{id}
  and /full_schema, which require ``view`` / ``view_schema``.
- ``test_view_and_view_schema_not_explicit_resource_perms``: registry
  guard asserting both strings stay out of RESOURCE_PERMISSIONS and
  RESOURCE_SCOPED_GROUPS.
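
The resolver paths these tests cover (direct user grant, group grant, and role-as-principal reached directly or through a group) can be sketched as below. The data shapes and function names are assumptions and do not reflect the real `permission_resolver` module; the sketch only models the semantics stated in this commit, including implicit `view` / `view_schema` on any grant.

```python
from dataclasses import dataclass

IMPLICIT_ON_ANY_GRANT = frozenset({"view", "view_schema"})


@dataclass(frozen=True)
class ResourceGrant:
    principal_type: str  # "user", "group", or "role"
    principal_id: str
    data_source_id: str
    permissions: frozenset


def principals_for(user_id, user_groups, user_roles, group_roles):
    """Expand a user into every principal that can hold a grant on their behalf."""
    groups = set(user_groups.get(user_id, ()))
    roles = set(user_roles.get(user_id, ()))
    for group in groups:  # user -> group -> role
        roles |= set(group_roles.get(group, ()))
    return (
        {("user", user_id)}
        | {("group", g) for g in groups}
        | {("role", r) for r in roles}
    )


def resolve_ds_permissions(data_source_id, grants, principals):
    """Collect permissions on one data source; any matching grant implies view access."""
    perms = set()
    matched = False
    for grant in grants:
        if grant.data_source_id != data_source_id:
            continue
        if (grant.principal_type, grant.principal_id) in principals:
            matched = True
            perms |= grant.permissions
    if matched:
        perms |= IMPLICIT_ON_ANY_GRANT
    return perms
```

A user who reaches a role only through a group still picks up the role-attached grant, and holding `manage_instructions` alone is enough to pass `view` and `view_schema` checks.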

Existing test fixup
-------------------
``test_rbac_complex_roles.py`` arrived from the upstream merge with
three dangling ``assert  in perms["permissions"]`` lines (syntax error)
left over from a stale find/replace that removed ``view_instructions``
without cleaning up the assertion sites. Removed the dangling lines so
the file parses again. The corresponding tests now skip cleanly when
no enterprise license is present (matching the pre-existing pattern).

Suite size: 39 RBAC tests (was 35 + 1 xfail), all passing under
``TESTING=true pytest -m e2e --db=sqlite tests/e2e/rbac/``. The
existing tests/e2e/test_rbac*.py + test_eval.py also pass with no
regressions (36 passed, 20 skipped for enterprise license).

https://claude.ai/code/session_01LVmq3ikLQFiS6Zs1fqb8Ce
…skipping

Adds backend/tests/e2e/conftest.py with a session-scoped autouse
fixture that flips the license cache to "enterprise active" for the
entire e2e session. Without this, every existing RBAC test that
probed for a real license (test_rbac.py, test_rbac_complex_roles.py,
test_rbac_policies.py) was skipping the enterprise codepaths in CI.

Approach: directly mutate ee_license._cached_license and
_cache_initialized once per session — get_license_info(),
has_feature() and is_enterprise_licensed() all read from those globals,
so a single session-level swap covers every entry point without
per-test monkeypatch overhead.
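
A stdlib-only sketch of the swap described above. The `ee_license` namespace and `is_enterprise_licensed` below are stand-ins built for the example, the fake license dictionary shape is an assumption, and restoring the old values on exit is a choice of this sketch that the commit message does not confirm.

```python
import contextlib
import types

# Stand-in for the real ee_license module and its two module-level globals.
ee_license = types.SimpleNamespace(_cached_license=None, _cache_initialized=False)


def is_enterprise_licensed() -> bool:
    """Stand-in reader: like the real entry points, it only consults the globals."""
    cached = ee_license._cached_license
    return bool(ee_license._cache_initialized and cached and cached.get("licensed"))


@contextlib.contextmanager
def force_enterprise_license(module, fake_license: dict):
    """Swap the cache globals for the duration of the block, then restore them.

    In a conftest.py the same body would sit inside a generator decorated with
    @pytest.fixture(scope="session", autouse=True).
    """
    saved = (module._cached_license, module._cache_initialized)
    module._cached_license = fake_license
    module._cache_initialized = True
    try:
        yield
    finally:
        module._cached_license, module._cache_initialized = saved
```

Because every reader goes through the same two globals, one swap at session start covers all entry points without patching each function separately.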

Effect on the existing suite (under sqlite, no real license key):
  test_rbac.py                 was 14 → now 15 passing (Reports test
                               was already fixed in c349884)
  test_rbac_complex_roles.py   was 1 + 16 skipped → now 15 passing
                               + 2 author-flagged "post-MVP" skips
  test_rbac_policies.py        was 0 + 15 skipped → now 9 passing
                               + 6 author-flagged "post-MVP" skips

The remaining 8 skips across complex_roles + policies are explicit
@pytest.mark.skip(reason="post-MVP: ...") markers that the original
test author placed for future view_evals/run_evals split work — they
are not enterprise-license skips and should not be touched.

Also removed the defensive ``if status_code == 402: pytest.skip(...)``
guards from tests/e2e/rbac/test_rbac_role_principals.py — with the
session-level enterprise stub in place, a 402 from the role/group
routes is now a real failure that should fail loudly, not silently
skip.

https://claude.ai/code/session_01LVmq3ikLQFiS6Zs1fqb8Ce
All 12 Copilot comments were legitimate; applied every one.

1. Unasserted PUTs to flip is_public (9 sites)
   Moved the "make DS private" dance into the sqlite_data_source
   fixture itself and asserted the PUT returns 200 and the flipped
   body reflects the requested value. Callers now pass
   ``is_public=False`` (the default) or ``is_public=True`` at creation
   time instead of duplicating an unasserted ``test_client.put(...)``
   block. Removes the "silent public DS" failure mode Copilot flagged
   — if the flip ever regresses, every RBAC fixture fails loudly with
   a clear message instead of quietly passing for the wrong reason.

2. Unused ``org_service`` import in rbac/conftest.py
   Leftover from an earlier iteration that was going to monkey-patch
   the organization service; the fake-license fixture never ended up
   needing it. Removed.

3. Missing status_code assertion in test_registry_hides_reports_category
   The test called ``resp.json()`` without first asserting 200. Added
   ``assert resp.status_code == 200, resp.text`` so an auth or route
   regression surfaces as a clear status-code mismatch instead of an
   opaque JSON decode error.

4. Tautological pagination assertion in test_list_builds_status_filter
   Replaced
     ``body["total"] == sum(1 for _ in items) or body["total"] >= len(items)``
   with the honest paginated invariant
     ``len(items) <= body["total"] and len(items) >= 1``
   The previous form was redundant: its first disjunct implies the second, so the
   check reduced to ``body["total"] >= len(items)`` and passed even on an empty page.
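
To make the difference between the two assertions concrete, the sketch below wraps each in a small function. The names `old_check` and `new_check` are introduced here for illustration; the expressions themselves are the ones quoted above, with `body["total"]` passed in as `total`.

```python
def old_check(total: int, items: list) -> bool:
    """Original assertion: the first disjunct adds nothing to the second."""
    return total == sum(1 for _ in items) or total >= len(items)


def new_check(total: int, items: list) -> bool:
    """Replacement: the page must be non-empty and no larger than the total."""
    return len(items) <= total and len(items) >= 1


# An empty page slips through the old form but is rejected by the new one.
empty_page_old = old_check(0, [])
empty_page_new = new_check(0, [])
```

Both forms accept a normal page such as two items out of a total of three. Only the new form fails when a status filter unexpectedly returns nothing.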

Suite still passes clean: 39 passed, 0 skipped, 0 xfail under
``TESTING=true pytest -m e2e --db=sqlite tests/e2e/rbac/``.

https://claude.ai/code/session_01LVmq3ikLQFiS6Zs1fqb8Ce
…tion-hxKBp

Add WhatsApp Cloud API integration
@yochze yochze merged commit 719010a into main Apr 10, 2026
3 of 5 checks passed
@yochze yochze deleted the 0.0.355 branch April 11, 2026 18:36
3 participants