
feat: add audit log for assessment mode#4872

Open
rschlaefli wants to merge 69 commits into v3 from audit-log

Conversation

@rschlaefli
Member

@rschlaefli rschlaefli commented Sep 4, 2025

Summary by CodeRabbit

  • New Features
    • Added a standalone Audit Service with HTTP endpoints, readiness & metrics, and integrated audit logging across backend, workers, auth, and frontend assessment flows.
  • Chores/Infrastructure
    • Added Azurite (Azure Storage emulator) to local/CI, startup readiness checks, Docker/Helm manifests, env examples, and release workflows; standardized Postgres test images.
  • Tests
    • Added extensive API, integration, performance, and scenario test suites and CI jobs to build, test, and publish the Audit Service.

@coderabbitai

coderabbitai bot commented Sep 4, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds a new Audit service (apps/audit) with storage, schema, metrics, auth, client, tests and CI/Helm wiring; integrates Azurite into local/CI tooling and readiness checks; plumbs AuditClient through backend, workers, GraphQL, frontend, tests and environment/configs; pins Postgres images to 14.x.

Changes

Cohort / File(s) Summary
Audit service
apps/audit/*, apps/audit/Dockerfile, apps/audit/package.json, apps/audit/rollup.config.js, apps/audit/tsconfig.json, apps/audit/vitest.config.ts
New Hono-based audit app: config (Zod), auth middleware, public/private routes, AuditEvent schema, entity factory, AuditTableClient (Azurite/Azure), logger, metrics, rate-limit, cookie util, build/test configs and Dockerfile.
Audit tests & fixtures
apps/audit/test/*, apps/audit/test/fixtures/*
Extensive unit/integration/performance/public E2E tests, AzureTableTestHelper, fixtures and scenario suites validating persistence, idempotency, partitioning and performance.
CI / GitHub workflows
.github/workflows/cypress-testing.yml, .github/workflows/test-graphql.yml, .github/workflows/test-audit.yml, .github/workflows/v3_audit-qa.yml, .github/workflows/v3_audit-prod.yml, .github/workflows/claude-*.yml
Adds Azurite to CI jobs, new audit test workflow and QA/prod image build flows, updates wait-on readiness endpoints, and pins Postgres images to 14.x; Claude model arg bumped.
Startup scripts & compose
.github/scripts/wait-for-services.sh, _run_app_dependencies.sh, _down.sh, docker-compose.yml
Added check_azurite waiter and Azurite dependency checks; azurite added to docker-compose and dev-run scripts; docker compose down now removes volumes for listed services.
Helm / Kubernetes
deploy/charts/klicker-uzh-v2/templates/*audit*.yaml, deploy/charts/klicker-uzh-v2/values.yaml, deploy/env-qa-v3/values.yaml, deploy/env-prod-v3/values.yaml
New Helm templates and values for audit ConfigMap, Secret, Deployment, Service and optional HPA; environment overrides for QA/Prod.
Types & util changes
packages/types/src/index.ts, packages/util/src/jwt.ts
Public audit typings (AuditScope, AuditAction, CorrelationClaims, AuditEvent types, AuditClientConfig/Responses) and JWTPayload extended with correlation fields.
GraphQL & backend wiring
packages/graphql/src/lib/context.ts, packages/graphql/src/services/*, packages/graphql/src/scripts/*
Adds auditClient to GraphQL Context; instruments services (accounts, liveQuizzes, stacks) and scripts with audit events; passes AuditClient into task/script preparation.
Tests wiring (GraphQL)
packages/graphql/test/**/*, packages/graphql/test/helpers.ts, packages/graphql/.env.example, packages/graphql/test/docker/*
Wires AuditClient through initializePrisma/testInitialization across tests; updates test docker-compose to include Azurite and adds audit env vars.
Backend, workers & response API
apps/backend-docker/*, apps/hatchet-worker-general/*, apps/hatchet-worker-response-processor/*, apps/response-api/*
Injects AuditClient into application/task initialization, emits startup/worker/internal audit events, and replaces many ad-hoc logs with structured InternalAuditEvent emissions.
Auth & frontend changes
apps/auth/src/pages/api/auth/[...nextauth].ts, apps/auth/src/lib/helpers.ts, apps/frontend-pwa/src/components/liveQuiz/QuestionArea.tsx, various .env.*
Non-blocking audit log on participant sign-in; console.log -> console.debug changes; frontend emits PARTICIPANT_SUBMIT_RESPONSE on submit; adds NEXT_PUBLIC_AUDIT_* env vars.
Repo/tooling & scripts
.versionrc.js, .gitignore, _run_app_dependencies.sh
Dynamic package.json discovery for versioning, added .gitignore rules (logs/AzuriteConfig), Azurite added to dev-run scripts and wait scripts.
Misc dependency & env additions
Multiple package.json and .env.* files across apps (axios bumps, AUDIT_* envs)
Minor dependency bumps (axios) and added AUDIT_* environment variables and examples across services, worker examples and CI/dev configs.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant PWA as Frontend PWA
  participant GQL as GraphQL/API
  participant WORKER as Worker
  participant AUD as Audit Service
  participant AZ as Azurite/AzureTable

  User->>PWA: submit action
  PWA->>AUD: POST /audit/public (cookie JWT)
  AUD->>AUD: auth -> rate-limit -> validate payload
  AUD->>AZ: ensureTable / upsertEntity (retry/backoff)
  AZ-->>AUD: ack / error
  AUD-->>PWA: {status: stored, eventId}

  GQL->>AUD: POST /audit (X-Internal-Token)
  WORKER->>AUD: POST /audit (worker events)
  AUD->>AZ: upsertEntity
  AZ-->>AUD: ack
sequenceDiagram
  autonumber
  participant CI as CI Job
  participant WFS as wait-for-services.sh
  participant AZ as Azurite
  participant AUDsrv as Audit test server

  CI->>AZ: start azurite container
  CI->>WFS: run wait-for-services.sh (includes 7080/ready)
  WFS->>AZ: check_azurite(host, port)
  AZ-->>WFS: ready
  CI->>AUDsrv: start audit service
  AUDsrv->>AZ: ensureTable / checkConnection
  CI-->>CI: run tests

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

Possibly related PRs

Suggested reviewers

  • sjschlapbach

Pre-merge checks

❌ Failed checks (1 warning)
Check name: Title Check
Status: ⚠️ Warning
Explanation: The title “feat: add audit log for assessment mode” is too narrow and does not accurately capture the breadth of this pull request, which introduces a full audit logging service, integrates audit events across multiple backend and frontend components, and updates CI workflows, configuration files, deployment charts, and test suites. Because the title only refers to audit logging in assessment mode, it fails to convey the primary change of adding centralized audit logging infrastructure and system-wide integration.
Resolution: Please update the title to concisely reflect the main change, for example: “feat: introduce centralized audit logging service and integrate audit events across services,” so that reviewers can immediately understand the scope of this feature.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed (check skipped: CodeRabbit’s high-level summary is enabled)
  • Docstring Coverage: ✅ Passed (no functions found in the changes; docstring coverage check skipped)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

🧪 Early access (Sonnet 4.5): enabled

We are currently testing the Sonnet 4.5 model, which is expected to improve code review quality. However, this model may lead to increased noise levels in the review comments. Please disable the early access features if the noise level causes any inconvenience.

Note:

  • Public repositories are always opted into early access features.
  • You can enable or disable early access features from the CodeRabbit UI or by updating the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@cypress

cypress bot commented Sep 4, 2025

klicker-uzh    Run #6086

Run Properties:
  • Run status: Failed
  • Commit: 2949f883d8 (Merge 67296ead17b307c14068617a1e9435c34147a460 into 041fcfc97ca10de3bf5c41ac5bad...)
  • Project: klicker-uzh
  • Branch Review: audit-log
  • Run duration: 11m 39s
  • Committer: Roland Schläfli
View all properties for this run ↗︎

Test results
  • Failures: 1
  • Flaky: 0
  • Pending: 0 (tests annotated with .skip)
  • Skipped: 0 (tests not run due to a failure in a mocha hook)
  • Passing: 724
View all changes introduced in this branch ↗︎

Tests for review

Failed  cypress/e2e/N-course-workflow.cy.ts • 1 failed test


Test Artifacts
Test course creation and editing functionalities > Revoke all user group permissions and verify that the users cannot see the course and its content anymore

…into audit-log

# Conflicts:
#	packages/graphql/src/services/accounts.ts
#	pnpm-lock.yaml
…into audit-log

# Conflicts:
#	_run_app_dependencies.sh
#	_run_app_dependencies_macos.sh
#	_run_app_dependencies_wsl.sh
#	apps/backend-docker/.env.example
#	pnpm-lock.yaml
#	project/plans_archive/PLAN-audit-log-clients.md
#	project/plans_archive/PLAN-audit-log.md
#	project/plans_archive/PROJECT-audit-log.md
@gitguardian

gitguardian bot commented Sep 13, 2025

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id: 1509424
Status: Triggered
Secret: Generic Password
Commit: 582dee7
Filename: .github/workflows/test-graphql.yml
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn the best practices here.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@rschlaefli rschlaefli changed the title WIP: add audit log for assessment mode feat: add audit log for assessment mode Sep 13, 2025
…into audit-log

# Conflicts:
#	pnpm-lock.yaml
…into audit-log

# Conflicts:
#	.github/workflows/cypress-testing.yml
#	apps/backend-docker/.env.cypress
#	apps/backend-docker/.env.example
#	packages/util/src/index.ts
#	pnpm-lock.yaml
Base automatically changed from v3-assessment to v3 September 15, 2025 20:33
@rschlaefli rschlaefli self-assigned this Sep 20, 2025
…-log

# Conflicts:
#	.versionrc.js
#	_run_app_dependencies.sh
#	pnpm-lock.yaml
SERVER_LOGGER_LEVEL: debug
SERVER_LOGGER_FORMAT: console

azurite:
Member Author

Azurite is a local emulator for blob/table/queue storage. We could probably start it only in assessment mode, though (maybe another flag for the run app dependencies script?)
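For reference, a minimal docker-compose entry for Azurite could look like the following sketch; the image tag, volume name, and command flags are assumptions rather than what this PR ships, but the three ports are Azurite's defaults (the table endpoint on 10002 is the one the audit service would use locally):

```yaml
services:
  azurite:
    image: mcr.microsoft.com/azure-storage/azurite:latest
    command: azurite --blobHost 0.0.0.0 --queueHost 0.0.0.0 --tableHost 0.0.0.0
    ports:
      - '10000:10000' # blob
      - '10001:10001' # queue
      - '10002:10002' # table (used by AuditTableClient)
    volumes:
      - azurite_data:/data
volumes:
  azurite_data:
```

Gating it behind an assessment-mode flag would then only require a conditional in the run scripts or a compose profile.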

azureStorage:
tableName: 'auditlogs-prod'

corsOrigins: 'https://assessment.klicker.uzh.ch'
Member Author

Do we ever need any frontend audit logging from manage?

})
} catch (error) {
console.warn(
`[AUTH ${requestId}] Failed to log PARTICIPANT_LOGIN_SUCCESS:`,

Check failure

Code scanning / CodeQL

Use of externally-controlled format string (High)

Format string depends on a user-provided value.

Copilot Autofix

AI 7 months ago

To fix the problem, we should avoid directly embedding untrusted data into the format string passed as the first argument to logging methods such as console.warn. Instead, use a static format string and pass untrusted values as arguments for interpolation using %s. For line 173, this means changing the template literal so that requestId is not embedded in the format string; instead, use %s to indicate its place, e.g., "[AUTH %s] Failed to log PARTICIPANT_LOGIN_SUCCESS:", and then pass requestId as a subsequent argument. This change should be applied only to line 173; no imports or other regions need changing.


Suggested changeset 1
apps/auth/src/pages/api/auth/[...nextauth].ts

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/apps/auth/src/pages/api/auth/[...nextauth].ts b/apps/auth/src/pages/api/auth/[...nextauth].ts
--- a/apps/auth/src/pages/api/auth/[...nextauth].ts
+++ b/apps/auth/src/pages/api/auth/[...nextauth].ts
@@ -170,7 +170,8 @@
             })
           } catch (error) {
             console.warn(
-              `[AUTH ${requestId}] Failed to log PARTICIPANT_LOGIN_SUCCESS:`,
+              '[AUTH %s] Failed to log PARTICIPANT_LOGIN_SUCCESS:',
+              requestId,
               error instanceof Error ? error.message : error
             )
           }
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
services:
  postgres:
-    image: postgres:15
+    image: postgres:14.18
Member Author

Made this actually aligned with what we have on Azure.

@@ -0,0 +1,47 @@
name: Build Docker image for audit (Production)
Member Author

@jabbadizzleCode will require new pipelines on your branch as well

role: string
}

export async function authMiddleware(
Member Author

The backend needs to send a fixed token in the header; the frontend sends the student login JWT. We could probably merge these so that both are JWTs with different issuers and corresponding validation, but for now this will be enough.
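The split described here can be sketched as a small strategy selector that runs before any signature validation; selectAuthStrategy, the x-internal-token header, and the cookie name are hypothetical illustrations, not the actual middleware:

```typescript
// Decide how an incoming audit request should be authenticated.
// Backend/workers send a fixed shared token in a header; browsers
// send the participant login JWT (here assumed to sit in a cookie).
type AuthStrategy = 'internal' | 'participant-jwt' | 'unauthenticated'

function selectAuthStrategy(
  headers: Record<string, string | undefined>,
  internalToken: string
): AuthStrategy {
  const provided = headers['x-internal-token']
  if (provided !== undefined) {
    // Internal callers: compare against the configured fixed token.
    return provided === internalToken ? 'internal' : 'unauthenticated'
  }
  // Browser callers: the JWT itself must still be verified downstream
  // (issuer, signature, expiry) before the event is accepted.
  const cookie = headers['cookie'] ?? ''
  return cookie.includes('next-auth.session-token=')
    ? 'participant-jwt'
    : 'unauthenticated'
}
```

Merging both paths into JWTs with different issuers would collapse this into a single verify step keyed on the iss claim.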

const startTime = Date.now()
const event = c.req.valid('json')

metrics.requestsTotal.inc()
Member Author

Idea: expose certain metrics for Prometheus, which will run on the new infra and can monitor apps. It could scale up the replicas if e.g. we have a lot of total requests, or many more requests than writes, etc.

If this works well, we should expand on this in our own backend too, so we don't have only CPU/GPU usage metrics.
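The requests-versus-writes scaling signal can be illustrated without any Prometheus client; the counter and function names below (requestsTotal, shouldScaleUp) are made up for the sketch, and a real setup would export prom-client Counters scraped by Prometheus instead:

```typescript
// Stand-ins for prom-client Counters, kept in-process for the sketch.
const metrics = { requestsTotal: 0, writesTotal: 0 }

function recordRequest(): void {
  metrics.requestsTotal += 1
}

function recordWrite(): void {
  metrics.writesTotal += 1
}

// A scaler (e.g. an HPA acting on a custom metric) could trigger when
// many requests arrive but comparatively few result in storage writes.
function shouldScaleUp(minRequests = 100, maxWriteRatio = 0.5): boolean {
  if (metrics.requestsTotal < minRequests) return false
  return metrics.writesTotal / metrics.requestsTotal < maxWriteRatio
}
```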

tableClient: AuditTableClient
}

export function registerAuditPrivateRoutes(
Member Author

The actual endpoints are very simple on purpose. We don't reject events as long as they are properly authenticated (there is not a lot of validation on e.g. allowed types besides the public endpoint), to make sure that we log something invalid rather than dropping it with an error message.
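A rough sketch of that log-rather-than-drop policy: normalize whatever an authenticated caller sends and flag it instead of rejecting it (normalizeAuditEvent and its field names are hypothetical; only the public endpoint would add stricter schema checks):

```typescript
interface StoredAuditEvent {
  action: string
  scope: string
  payload: string // always the stringified input, even if malformed
  invalid: boolean
}

// Never throw for an authenticated caller: coerce whatever arrived
// into a storable shape and flag it rather than dropping it.
function normalizeAuditEvent(input: unknown): StoredAuditEvent {
  if (input !== null && typeof input === 'object') {
    const e = input as Record<string, unknown>
    const action = typeof e.action === 'string' ? e.action : 'UNKNOWN'
    const scope = typeof e.scope === 'string' ? e.scope : 'UNKNOWN'
    return {
      action,
      scope,
      payload: JSON.stringify(input),
      invalid: action === 'UNKNOWN' || scope === 'UNKNOWN',
    }
  }
  return {
    action: 'UNKNOWN',
    scope: 'UNKNOWN',
    payload: JSON.stringify(input ?? null),
    invalid: true,
  }
}
```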

}

// All retries failed
setError(lastError)
Member Author

We would need to decide what to do if it still returns an error even after retrying. Probably let the user know with a toast? Maybe it's also just their internet that is down and the main requests fail anyway.
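The retry-then-surface behavior discussed here could look like the following sketch, which retries with exponential backoff and hands the last error to a UI callback (a toast, say) instead of throwing; all names are illustrative:

```typescript
// Retry `attempt` with exponential backoff; after exhausting retries,
// report the last error non-fatally via `onGiveUp` instead of throwing.
async function sendWithRetry<T>(
  attempt: () => Promise<T>,
  onGiveUp: (err: unknown) => void,
  retries = 3,
  baseDelayMs = 100
): Promise<T | undefined> {
  let lastError: unknown
  for (let i = 0; i <= retries; i++) {
    try {
      return await attempt()
    } catch (err) {
      lastError = err
      // Exponential backoff: 100ms, 200ms, 400ms, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i))
    }
  }
  // All retries failed: surface to the user (toast) without crashing.
  onGiveUp(lastError)
  return undefined
}
```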

httpsAgent:
process.env.NODE_ENV === 'production'
? undefined
: new https.Agent({ rejectUnauthorized: false }),
Member Author

This needs to be done for dev mode, as otherwise the HTTPS validation with our self-made certificate would fail in certain cases.

Comment thread util/wait-for-services.sh
@@ -0,0 +1,394 @@
#!/bin/bash
Member Author

Copied and extended this script from the Cypress workflows -> it now waits as part of run app dependencies until the DB etc. are up. We could also migrate the Cypress workflows etc. to use this script from util/.
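At its core, a waiter like check_azurite is a TCP connect probe in a retry loop; a minimal Node sketch using only node:net (function name and defaults are assumed, not taken from the script) might look like:

```typescript
import * as net from 'node:net'

// Resolve true once a TCP connection to host:port succeeds,
// retrying every intervalMs until timeoutMs has elapsed.
function waitForPort(
  host: string,
  port: number,
  timeoutMs = 5000,
  intervalMs = 200
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs
  return new Promise((resolve) => {
    const tryOnce = (): void => {
      const socket = net.connect({ host, port })
      socket.once('connect', () => {
        socket.end()
        resolve(true)
      })
      socket.once('error', () => {
        socket.destroy()
        if (Date.now() >= deadline) resolve(false)
        else setTimeout(tryOnce, intervalMs)
      })
    }
    tryOnce()
  })
}
```

The shell version presumably does the same with nc or curl in a loop; the 7080/ready endpoint check would probe HTTP rather than a raw TCP port.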

Comment thread .versionrc.js
// filename: `deploy/charts/klicker-uzh-v2/Chart.yaml`,
// updater: 'util/yaml-updater.js',
// },
...packageJsonFiles.map((filename) => ({ filename, type: 'json' })),
Member Author

Much easier to parse the folders rather than having to hardcode all services -> so many were forgotten all the time and then not updated in package.json :D

Comment thread _run_app_dependencies.sh
echo "Skipping pnpm run build"
fi

SERVICE_ENDPOINTS="" ./util/wait-for-services.sh || {
Member Author

An empty SERVICE_ENDPOINTS means dependencies only.

@@ -0,0 +1,284 @@
import { describe, expect, it } from 'vitest'
Member Author

The tests all work, but we should have a real use case (right now we test with random data that it arrives, is processed, etc.). It might be interesting to run the assessment mode in Cypress and validate that the audit log output is correct after the Cypress run (maybe dump it to a file there). Otherwise, the tests here are not so useful beyond validating that data can be written and that the API responds correctly (which is also why it is irrelevant what kind of data we store).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 71

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (10)
apps/response-api/src/index.ts (1)

345-350: Use SHA‑256 instead of MD5 for tracking id derivation

Even for non-crypto “tracking”, prefer SHA‑256. No behavior change beyond id format; downstream uses a hex string already.

-  const combinedCorrelationKey = `${correlationKey}:${user.sub}`
-  const MD5 = createHash('md5')
-  MD5.update(combinedCorrelationKey)
-  const correlationId = MD5.digest('hex')
+  const combinedCorrelationKey = `${correlationKey}:${user.sub}`
+  const sha = createHash('sha256')
+  sha.update(combinedCorrelationKey)
+  const correlationId = sha.digest('hex')
packages/graphql/src/services/stacks.ts (3)

771-781: Missing await inside Prisma transaction (detail creation).

createFlashcardResponseDetail(...) is not awaited. Unawaited Prisma calls inside $transaction can run after the transaction closes, causing “Transaction already closed” errors or lost writes.

-    createFlashcardResponseDetail({
+    await createFlashcardResponseDetail({
       prisma,
       id,
       response,
       courseId,
       answerTime,
       existingInstance,
       participantId: ctx.user.sub,
     })

2456-2461: Bug: Updating totalPointsAwarded with null resets the tally.

In upsertQuestionResponse update branch:

totalPointsAwarded:
  typeof pointsAwarded === 'number' ? { increment: pointsAwarded } : null

Passing null here sets the DB field to null instead of leaving it unchanged. Use undefined to skip the update.

-      totalPointsAwarded:
-        typeof pointsAwarded === 'number' ? { increment: pointsAwarded } : null,
+      totalPointsAwarded:
+        typeof pointsAwarded === 'number'
+          ? { increment: pointsAwarded }
+          : undefined,

Also applies to: 2456-2459


3314-3316: Guard division by zero for stackAnswerTime averaging.

If responses.length === 0, round(stackAnswerTime / responses.length) yields Infinity/NaN. Add a safe default.

-  const elementAnswerTime = round(stackAnswerTime / responses.length)
+  const elementAnswerTime = responses.length
+    ? round(stackAnswerTime / responses.length)
+    : 0
packages/graphql/src/services/liveQuizzes.ts (1)

1118-1129: Use a composite key for scheduledJobs to avoid collisions across quizzes.

Indexing scheduled jobs by blockId alone can collide when two quizzes have the same blockId, leading to incorrect cancellations.

-  if (updatedQuiz.activeBlock?.expiresAt) {
-    scheduledJobs[blockId] = schedule.scheduleJob(
+  if (updatedQuiz.activeBlock?.expiresAt) {
+    const jobKey = `${quizId}:${blockId}`
+    scheduledJobs[jobKey] = schedule.scheduleJob(
       dayjs(updatedQuiz.activeBlock.expiresAt).add(10, 'second').toDate(),
       async () => {
         await deactivateLiveQuizBlock({ quizId, blockId }, ctx, true)
         ctx.emitter.emit('invalidate', {
           typename: 'LiveQuiz',
           id: updatedQuiz.id,
         })
       }
     )
   }
@@
-  if (!isScheduled && scheduledJobs[blockId]) {
-    await scheduledJobs[blockId].cancel()
-    delete scheduledJobs[blockId]
+  const jobKey = `${quizId}:${blockId}`
+  if (!isScheduled && scheduledJobs[jobKey]) {
+    await scheduledJobs[jobKey].cancel()
+    delete scheduledJobs[jobKey]
   }

Also applies to: 1375-1378

apps/auth/src/middleware.ts (1)

222-228: Sensitive data may be logged (referer, full callback URL incl. OAuth params)

Multiple debug logs include referer and full callback URLs. In OAuth callback handlers this can leak code/state into logs. console.debug prints in Node like console.log, so this affects production unless gated.

Apply targeted redactions:

-    console.debug('Stateless context detection result:', {
-      isParticipantContext,
-      participantParam,
-      referer,
-      pathname,
-    })
+    console.debug('Stateless context detection result:', {
+      isParticipantContext,
+      participantParam,
+      refererHost: getHostFromHeaderUrl(referer),
+      pathname,
+    })
@@
-          console.debug(
-            'Callback: injected params from cookie and cleared it',
-            {
-              url: url.toString(),
-              cookieSaysParticipant,
-              cookieSaysLecturer,
-            }
-          )
+          console.debug('Callback: injected params from cookie and cleared it', {
+            pathname: url.pathname,
+            cookieSaysParticipant,
+            cookieSaysLecturer,
+          })
@@
-    console.debug('PWA origin detected. Redirecting to PWA login.', {
-      referer,
-      refererHost,
-      redirectToParam,
-      redirectToHost,
-      pwaLoginUrl,
-    })
+    console.debug('PWA origin detected. Redirecting to PWA login.', {
+      refererHost,
+      redirectToHost,
+      pwaLoginUrl,
+    })
@@
-        console.debug('Set redirect cookie on signin:', {
-          cb,
-          isParticipantContext,
-        })
+        console.debug('Set redirect cookie on signin:', { isParticipantContext })

Additionally, consider gating all debug logs:

+const DEBUG = process.env.AUTH_DEBUG === '1' || process.env.NODE_ENV !== 'production'
+const dbg = (...args: any[]) => { if (DEBUG) console.debug(...args) }

Then replace console.debug calls with dbg. This prevents sensitive data from being printed in production by default.

Also applies to: 269-279, 128-135, 325-328

packages/graphql/test/template.test.ts (1)

12-17: Disable AuditClient in test initialization
initializePrisma (packages/graphql/test/helpers.ts) calls new AuditClient() with no config, so it throws if AUDIT_SERVICE_URL or AUDIT_TOKEN aren’t set. Pass a test config—e.g. new AuditClient({ enabled: false })—or supply dummy serviceUrl/auditToken to avoid constructor errors. Apply this change in all your test helpers.

.github/workflows/test-graphql.yml (1)

137-159: CI will fail without AUDIT_* env vars or an opt-out; disable audit for GraphQL tests.

AuditClient defaults to enabled and throws if AUDIT_SERVICE_URL/TOKEN are unset. Add AUDIT_ENABLED: false for this job to decouple tests.

Apply within the “Run tests” env:

           NODE_ENV: test
           HATCHET_API_URL: http://localhost:8888
           HATCHET_TENANT_ID: 707d0855-80ab-4e1f-a156-f1c4546cbf52
           HATCHET_HOST_PORT: 7077
+          AUDIT_ENABLED: false
           # Provide all app origins for consistency
packages/graphql/test/elementPermissions.test.ts (1)

1919-1925: Bug: assertion references the wrong variable after state change.

You re-query into missingPermissionUserOne but assert on the old derivedPermissionUserOne variable.

Apply:

-    expect(missingPermissionUserOne).toBeTruthy()
-    expect(derivedPermissionUserOne!.permissionLevel).toBe(
+    expect(missingPermissionUserOne).toBeTruthy()
+    expect(missingPermissionUserOne!.permissionLevel).toBe(
       PermissionLevel.OWNER
     )
packages/graphql/src/scripts/2025-09-14_hatchet_tasks_activity_publication.ts (1)

41-45: Bug: Redis port fallback yields NaN when REDIS_PORT is unset.

Number(process.env.REDIS_PORT) ?? 6379 will not fall back for NaN. Use nullish-coalescing before Number().

Apply this fix:

-    port: Number(process.env.REDIS_PORT) ?? 6379,
+    port: Number(process.env.REDIS_PORT ?? 6379),
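The pitfall fixed above is easy to demonstrate: ?? only falls back on null/undefined, and Number(undefined) is NaN, which is neither:

```typescript
const env: { REDIS_PORT?: string } = {} // REDIS_PORT unset

// Buggy order: Number() runs first, produces NaN, and ?? keeps NaN
// because NaN is neither null nor undefined.
const wrong = Number(env.REDIS_PORT) ?? 6379

// Correct order: fall back before coercing.
const right = Number(env.REDIS_PORT ?? 6379)

console.assert(Number.isNaN(wrong))
console.assert(right === 6379)
```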

Comment thread .versionrc.js
Comment thread apps/audit/package.json
Comment on lines +26 to +27
"@vitest/ui": "~2.1.8",
"cross-env": "~7.0.3",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Potential vitest/@vitest/ui major mismatch

You have vitest ~3.2.4 with @vitest/ui ~2.1.8; these should typically share the same major. Please verify compatibility and align versions if needed (e.g., both ^3.x).

Also applies to: 34-34


🌐 Web query:

Is @vitest/ui v2 compatible with vitest v3, or should their major versions match?

💡 Result:

Short answer: They should match — use @vitest/ui v3 with Vitest v3. Mismatched major versions (e.g., @vitest/ui v2 with vitest v3) have caused incompatibilities and the project explicitly publishes @vitest/* packages for the new major; the CLI also attempts to install matching add‑ons but has had issues installing the latest instead of the CLI's version. [1][2][3]

Sources:

  • Vitest 3 announcement / docs. [1]
  • @vitest/ui npm showing v3 releases. [2]
  • GitHub issue describing problems when add‑on major versions don’t match the Vitest CLI. [3]

Align @vitest/ui major version with Vitest
Upgrade @vitest/ui to a 3.x release to match Vitest v3 and avoid known incompatibilities (apps/audit/package.json lines 26–27, 34).

🤖 Prompt for AI Agents
In apps/audit/package.json around lines 26 to 27 (and also check line 34),
@vitest/ui is pinned to a 2.x release which is incompatible with Vitest v3;
update the dependency to a 3.x release (e.g., ^3.0.0 or ~3.x) in package.json,
update any other references on line 34 that mention @vitest/ui, run the package
manager install to update lockfiles, and run the test suite to verify
compatibility with Vitest v3.

Comment thread apps/audit/package.json
Comment on lines +45 to +47
"start": "node -r dotenv/config dist/index.js",
"start:test": "node -r dotenv/config dist/index.js dotenv_config_path=.env.cypress",
"test": "vitest run",

⚠️ Potential issue | 🔴 Critical

dotenv is used at runtime but listed as a devDependency; start:test also misconfigures dotenv path

  • start uses node -r dotenv/config, which requires dotenv at runtime. With production installs, devDependencies aren’t present, causing startup failures.
  • start:test passes dotenv_config_path as an argument instead of setting DOTENV_CONFIG_PATH, so dotenv won’t load .env.cypress.
   "dependencies": {
     "@azure/data-tables": "13.2.2",
     "@hono/node-server": "1.19.0",
     "@hono/zod-validator": "0.4.1",
     "@klicker-uzh/util": "workspace:*",
     "hono": "4.9.6",
     "jose": "5.9.4",
+    "dotenv": "~16.4.7",
     "pino": "9.9.5",
     "prom-client": "15.1.3",
     "ulidx": "2.3.0",
     "zod": "3.25.76"
   },
   "devDependencies": {
     "@rollup/plugin-node-resolve": "~15.3.1",
     "@rollup/plugin-typescript": "~12.1.4",
     "@types/node": "^20.19.11",
-    "@vitest/ui": "~2.1.8",
-    "cross-env": "~7.0.3",
-    "dotenv": "~16.4.7",
+    "@vitest/ui": "~2.1.8",
+    "cross-env": "~7.0.3",
     "nodemon": "~3.1.10",
     "npm-run-all": "~4.1.5",
     "rollup": "~4.34.9",
     "tsx": "~4.19.4",
     "typescript": "~5.6.3",
     "vitest": "~3.2.4"
   },
   "scripts": {
     "build": "cross-env NODE_ENV=production rollup -c",
     "build:test": "pnpm run build",
     "check": "tsc --noEmit",
     "dev:assessment": "npm-run-all --parallel dev:build dev:run",
     "dev:build": "rollup -c --watch",
     "dev:doppler": "CONFIG=dev ../../../../util/_run_with_doppler.sh pnpm run dev",
     "dev:run": "nodemon -w dist/ --exec 'node --env-file .env ./dist/index.js'",
     "inspect:table": "tsx scripts/inspect-table.ts",
     "start": "node -r dotenv/config dist/index.js",
-    "start:test": "node -r dotenv/config dist/index.js dotenv_config_path=.env.cypress",
+    "start:test": "cross-env DOTENV_CONFIG_PATH=.env.cypress node -r dotenv/config dist/index.js",

Also applies to: 22-35

🤖 Prompt for AI Agents
In apps/audit/package.json around lines 45-47 (and similarly lines 22-35), the
scripts use node -r dotenv/config at runtime but dotenv is declared only as a
devDependency and start:test passes dotenv_config_path as a CLI argument rather
than setting the DOTENV_CONFIG_PATH environment variable; move dotenv from
devDependencies to dependencies so it’s available in production installs, and
update the start:test script to set DOTENV_CONFIG_PATH=.env.cypress (e.g.
DOTENV_CONFIG_PATH=.env.cypress node -r dotenv/config dist/index.js) so dotenv
loads the correct file at startup.

Comment thread apps/audit/package.json
Comment on lines +54 to +56
"engines": {
"node": "=20"
},

⚠️ Potential issue | 🟠 Major

Engine pin is too strict — allow Node 20 minors; require 20.6+ if using --env-file

"=20" effectively means 20.0.0 and will flag most installations. Recommend a range.

-  "engines": {
-    "node": "=20"
-  },
+  "engines": {
+    "node": ">=20.6 <21"
+  },
🤖 Prompt for AI Agents
In apps/audit/package.json around lines 54 to 56, the engines.node value "=20"
pins to 20.0.0 and is too strict; update it to allow Node 20 minor versions —
use ">=20 <21" to accept all 20.x releases, or if your code relies on the
--env-file behavior require Node 20.6+ and use ">=20.6.0 <21" instead; replace
the current "=20" with the appropriate range.

rootDir: 'src',
}),
],
external: [/@klicker-uzh*/, /node_modules/],

⚠️ Potential issue | 🟠 Major

Fix external regex; current pattern doesn’t match scoped packages

/@klicker-uzh*/ applies * only to h. It won’t exclude @klicker-uzh/* as intended, causing accidental bundling.

-    external: [/@klicker-uzh*/, /node_modules/],
+    // Externalize workspace pkgs and node builtins
+    external: [/^@klicker-uzh\//, /node_modules/, /^node:/],

Optionally also externalize builtins explicitly:

+import { builtinModules } from 'node:module'
...
-    external: [/^@klicker-uzh\//, /node_modules/, /^node:/],
+    external: [...builtinModules, ...builtinModules.map((m) => `node:${m}`), /^@klicker-uzh\//, /node_modules/, /^node:/],
🤖 Prompt for AI Agents
In apps/audit/rollup.config.js around line 22, the regex /@klicker-uzh*/ only
repeats the final "h" and doesn't match scoped packages like @klicker-uzh/*;
replace it with a regex that matches the scope plus any subpath (for example use
a pattern anchored for the scope such as ^@klicker-uzh\/.*$ or
/^@klicker-uzh\/.*/), and optionally add explicit Node builtins to the external
array (or import/derive require('module').builtinModules) so core modules aren't
accidentally bundled.
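To illustrate the difference, a quick check in plain Node (package names are illustrative):

```javascript
// In the original pattern, `*` repeats only the trailing 'h', so it
// matches the literal prefix '@klicker-uz' anywhere in the module id.
const broken = /@klicker-uzh*/
// Anchored scope pattern: matches only ids under the @klicker-uzh scope.
const fixed = /^@klicker-uzh\//

console.log(broken.test('@klicker-uz'))        // true (matches with zero 'h')
console.log(fixed.test('@klicker-uz'))         // false
console.log(fixed.test('@klicker-uzh/prisma')) // true
console.log(fixed.test('some-lib'))            // false
```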

Comment on lines +554 to +556
azureStorage:
tableName: 'auditlogs-prod'


⚠️ Potential issue | 🔴 Critical

Azure Table name is invalid (hyphens not allowed).

Azure Table Storage table names must be 3–63 chars, start with a letter, and contain only alphanumeric characters. auditlogs-prod will fail. Use an alphanumeric name, e.g. AuditLogsProd or auditlogsprod.

Apply one of these diffs:

-  azureStorage:
-    tableName: 'auditlogs-prod'
+  azureStorage:
+    tableName: 'AuditLogsProd'

or

-  azureStorage:
-    tableName: 'auditlogs-prod'
+  azureStorage:
+    tableName: 'auditlogsprod'
🤖 Prompt for AI Agents
In deploy/env-prod-v3/values.yaml around lines 554 to 556 the Azure Table name
is invalid because it contains a hyphen; Azure Table names must be 3–63
characters, start with a letter, and contain only alphanumeric characters.
Replace 'auditlogs-prod' with a valid alphanumeric name (e.g., AuditLogsProd or
auditlogsprod) and update any references that consume this value to use the new
tableName so deployments and templates remain consistent.

Comment on lines +315 to +317
azureStorage:
tableName: 'auditlogs-qa'


⚠️ Potential issue | 🔴 Critical

Azure Table name is invalid (hyphens not allowed).

auditlogs-qa violates Azure Table naming rules (alphanumeric only). Use AuditLogsQa or auditlogsqa.

-  azureStorage:
-    tableName: 'auditlogs-qa'
+  azureStorage:
+    tableName: 'AuditLogsQa'
🤖 Prompt for AI Agents
In deploy/env-qa-v3/values.yaml around lines 315 to 317 the
azureStorage.tableName value 'auditlogs-qa' contains a hyphen which violates
Azure Table naming rules; replace it with a valid name using only alphanumeric
characters and start with a letter (e.g., AuditLogsQa or auditlogsqa), update
any dependent references (templates, scripts, or IAM/policy entries) to use the
new table name, and run a quick lint/helm template to confirm no other
occurrences remain.
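The naming rule quoted in both comments above can be captured in a one-line validator (a sketch for local checks, not an Azure SDK API):

```javascript
// Azure Table names: 3-63 chars, must start with a letter, alphanumeric only.
const isValidTableName = (name) => /^[A-Za-z][A-Za-z0-9]{2,62}$/.test(name)

console.log(isValidTableName('auditlogs-prod')) // false (hyphen not allowed)
console.log(isValidTableName('AuditLogsProd'))  // true
console.log(isValidTableName('ab'))             // false (too short)
```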

Comment thread packages/graphql/.gitignore
@@ -1302,6 +1314,17 @@ export async function activateLiveQuizBlock(
})

redisMulti.exec()

⚠️ Potential issue | 🔴 Critical

Await the Redis pipeline to avoid lost writes/races.

redisMulti.exec() is not awaited; writes may be reordered or dropped under load, especially before emit/publish and audit log steps.

-  redisMulti.exec()
+  await redisMulti.exec()
🤖 Prompt for AI Agents
In packages/graphql/src/services/liveQuizzes.ts around line 1316, the call to
redisMulti.exec() is not awaited which can cause lost or reordered writes under
load; change to await the exec() call (or promisify/callback to Promise if exec
uses a callback) and handle rejection/result (log/error/throw) before proceeding
to subsequent emit/publish and audit-log steps so all Redis writes complete
successfully and failures are surfaced.

Comment thread packages/graphql/src/services/liveQuizzes.ts
Comment thread apps/audit/src/storage/entities.ts Outdated
export interface AuditTableEntity {
partitionKey: string
rowKey: string
scope: string

@rschlaefli If we know that the value will be one of three strings, is there a reason why we do not type it accordingly?

action: event.action,
timestamp: event.timestamp,
resource: event.resource || null,
// Include subset of attributes for uniqueness (avoid full object for size)

@rschlaefli Do I understand this correctly: rather than taking a subset of the attributes (contrary to what the comment states), we just cut off part of the hash generated from the full attributes object? Does this make sense? It seems we go through the potentially more expensive hash computation over the large object, throw away half of an already compact representation, yet ignore the other fields above, which can be up to 200-400 characters according to the Zod validation.

action: AuditAction.PARTICIPANT_SUBMIT_RESPONSE,
resource: `instance:${instanceId}`,
scope: AuditScope.PUBLIC,
correlationId: correlationKey ?? undefined,

Maybe this is important: this correlationKey is not the same as the correlationId we use later on, from within the response API through the assessment processor, to track a response. The latter also contains the participant id, while this correlation key only contains the quiz id, instance id, and execution ix in a format that can be validated.

'create-audit-log-entry',
{
subject: `participant:unknown`,
correlationId: correlationKey,

Same note as for the frontend use - the correlationKey is not the same as the correlationId we combine from the correlation key and the participant id further down (the same holds for all the following audit log entries?)

{
scope: AuditScope.INTERNAL,
subject: `participant:${user.sub}`,
correlationId: correlationKey,

(same as above) - from here onwards, the correlationId is available and should be used


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/auth/src/lib/helpers.ts (1)

206-212: PII exposure: email arrays logged at debug level.

The debug log includes emailCollection.profileEmails and emailCollection.affiliationEmails, both of which likely contain user email addresses (PII). This poses the same privacy risk as the previous comment.

Apply this diff to log only counts:

-      console.debug(
-        `Invitation mode ${invitationEmailMode} provided no eligible emails.`,
-        {
-          profileEmails: emailCollection.profileEmails,
-          affiliationEmails: emailCollection.affiliationEmails,
-        }
-      )
+      console.debug(
+        `Invitation mode ${invitationEmailMode} provided no eligible emails. Profile: ${emailCollection.profileEmails.length}, Affiliation: ${emailCollection.affiliationEmails.length}`
+      )
🧹 Nitpick comments (47)
apps/response-api/.env.example (1)

9-11: LGTM! Clean integration of audit configuration.

The three audit environment variables are well-structured and appropriately placed. The AUDIT_TOKEN example value clearly signals this is a test token, which is correct for an .env.example file.

Optional enhancement: Consider adding inline comments to help developers understand each variable's purpose, especially around token security requirements for production:

+# Enable audit logging for assessment activities
 AUDIT_ENABLED=true
+# URL of the audit service endpoint
 AUDIT_SERVICE_URL=http://127.0.0.1:7080
+# Authentication token for audit service (generate a secure token for production)
 AUDIT_TOKEN=test-secret-token-123
packages/graphql/src/services/stacks.ts (1)

2620-2640: Audit logging scaffolding references undefined variables.

The commented audit logging code references several undefined variables (auditClient, sessionId, participantId, startTime) that are not declared in the current scope. While the scaffolding approach is reasonable for future implementation, these undefined references could cause runtime errors if the code is accidentally uncommented during development.

Consider either:

  1. Adding TODO comments with the required variable declarations, or
  2. Wrapping the scaffolding in a conditional check for a feature flag to make it more explicit that this code is incomplete

Example guard pattern:

+  // TODO: Initialize audit logging dependencies when feature is enabled
+  // const auditClient = ctx.auditClient
+  // const sessionId = ctx.sessionId
+  // const participantId = ctx.user?.sub
+  // const startTime = Date.now()
+
   // // Log response submission attempt
-  // if (participantId && ctx.user?.role === DB.UserRole.PARTICIPANT) {
+  // if (ctx.featureFlags?.auditLogging && participantId && ctx.user?.role === DB.UserRole.PARTICIPANT) {
   //   await auditClient.log({

Also applies to: 2656-2673, 2812-2828, 2931-2948, 2952-2971

apps/hatchet-worker-response-processor/.env.example (2)

10-12: LGTM! Consider adding inline comments for clarity.

The audit configuration variables are correctly added and use appropriate example values for local development. The flag, service URL, and token pattern align with the PR's objective to add audit logging.

Consider adding inline comments to document these variables:

+# Audit logging
 AUDIT_ENABLED=true
 AUDIT_SERVICE_URL=http://127.0.0.1:7080
 AUDIT_TOKEN=test-secret-token-123

12-12: Ensure production audit tokens use secure secret management.

The example token is appropriately marked as a placeholder. In production deployments, ensure audit tokens are managed through secure secret management systems (e.g., Kubernetes secrets, AWS Secrets Manager, environment-specific secure vaults) and never committed to version control.

apps/auth/src/middleware.ts (1)

61-66: Consider removing commented-out code.

Commented-out debug logging should either be removed entirely or controlled via an environment variable/feature flag if it's needed for occasional debugging. Leaving commented code in the codebase can lead to confusion about its intended use.

If you want to keep this logging available for debugging, consider:

-  // console.debug('MIDDLEWARE RUNNING:', {
-  //   pathname,
-  //   fullUrl: request.url,
-  //   referer: request.headers.get('referer'),
-  //   searchParams: Object.fromEntries(request.nextUrl.searchParams.entries()),
-  // })
+  if (process.env.DEBUG_AUTH_MIDDLEWARE === 'true') {
+    console.debug('MIDDLEWARE RUNNING:', {
+      pathname,
+      fullUrl: request.url,
+      referer: request.headers.get('referer'),
+      searchParams: Object.fromEntries(request.nextUrl.searchParams.entries()),
+    })
+  }

Otherwise, simply remove the commented lines.

apps/frontend-pwa/.env.assessment.qa (1)

31-32: LGTM: Audit configuration added for QA assessment environment.

Both NEXT_PUBLIC_AUDIT_SERVICE_URL and NEXT_PUBLIC_AUDIT_ENABLED have been correctly added for the QA assessment environment, consistent with the PR's audit logging objectives.

Optional: The static analysis tool suggests reordering keys alphabetically (NEXT_PUBLIC_AUDIT_ENABLED before NEXT_PUBLIC_AUDIT_SERVICE_URL), but this has no functional impact and is purely a style preference.

_down.sh (1)

4-4: Review the -v flag for potential data loss.

The -v flag removes all volumes for the listed services, which will delete persistent data (PostgreSQL databases, Redis data, Azurite storage). This may be unexpected for developers running the down script during normal development workflows.

Consider one of:

  • Remove -v to preserve data by default: docker compose down postgres redis_exec ...
  • Make volume removal explicit via a parameter: docker compose down ${REMOVE_VOLUMES:+"-v"} postgres ...
  • Rename script to _down_clean.sh and create a separate _down.sh without -v

This prevents accidental data loss while providing an explicit cleanup option when needed.

apps/backend-docker/.env.cypress (1)

17-20: Consider reordering keys per linter convention.

The audit configuration is correct and appropriate for Cypress testing. However, the dotenv-linter suggests placing AUDIT_ENABLED before AUDIT_SERVICE_URL for consistency.

Apply this diff if you want to align with the linter:

 # Audit Service Configuration
-AUDIT_SERVICE_URL=http://127.0.0.1:7080
 AUDIT_ENABLED=true
+AUDIT_SERVICE_URL=http://127.0.0.1:7080
 AUDIT_TOKEN=test-secret-token-123
apps/backend-docker/.env.example (1)

18-21: Consider reordering keys in the example file.

The audit configuration is correct. Since this is an example file that serves as documentation, consider placing AUDIT_ENABLED before AUDIT_SERVICE_URL to align with dotenv-linter conventions and provide a consistent example.

Apply this diff:

 # Audit Service Configuration
-AUDIT_SERVICE_URL=http://127.0.0.1:7080
 AUDIT_ENABLED=true
+AUDIT_SERVICE_URL=http://127.0.0.1:7080
 AUDIT_TOKEN=test-secret-token-123
apps/frontend-pwa/.env.assessment (1)

35-36: Consider reordering for consistency (optional).

The audit configuration is functionally correct. The static analysis tool suggests placing NEXT_PUBLIC_AUDIT_ENABLED before NEXT_PUBLIC_AUDIT_SERVICE_URL for alphabetical ordering, but this is a minor style preference.

If you want to address the linter hint:

-NEXT_PUBLIC_AUDIT_SERVICE_URL=https://audit.klicker.uzh.ch
 NEXT_PUBLIC_AUDIT_ENABLED=true
+NEXT_PUBLIC_AUDIT_SERVICE_URL=https://audit.klicker.uzh.ch
packages/graphql/test/docker/docker-compose.test.yml (1)

96-103: Consider adding a volume for Azurite data persistence.

The Azurite service is correctly configured with all three storage service ports. However, without a named volume, audit data will be lost when the container is removed, which may complicate debugging test failures.

Apply this diff to add optional persistence:

   azurite:
     image: mcr.microsoft.com/azure-storage/azurite:latest
     ports:
       - 10000:10000 # Blob service
       - 10001:10001 # Queue service
       - 10002:10002 # Table service
     networks:
       - klicker
+    volumes:
+      - azurite_data:/data

And add the volume declaration:

 volumes:
   hatchet_lite_config:
+  azurite_data:
apps/audit/.env.cypress (1)

5-6: Optional: Reorder keys per linter preference.

The dotenv-linter suggests placing NODE_ENV before PORT. This is purely a stylistic preference and has no functional impact.

If you wish to silence the linter warning:

 # Server Configuration
-PORT=7080
 NODE_ENV=test
+PORT=7080
apps/audit/src/utils/metrics.ts (1)

40-50: Consider logging errors in the metrics endpoint.

The catch block returns a generic error response without logging the actual error, which may complicate debugging if metrics collection fails.

Apply this diff to add error logging:

 export function setupMetrics(app: Hono): void {
   app.get('/metrics', async (c) => {
     try {
       const register = client.register
       c.header('Content-Type', register.contentType)
       return c.text(await register.metrics())
     } catch (error) {
+      console.error('Failed to generate metrics:', error)
       return c.text('Error generating metrics', 500)
     }
   })
 }
apps/audit/Dockerfile (2)

4-14: LGTM - deps stage follows best practices.

The deps stage correctly prunes the monorepo for the audit scope using turbo. Version pinning of pnpm and turbo ensures reproducible builds.

Consider extracting pnpm and turbo versions to build args for easier maintenance across multiple Dockerfiles:

 FROM docker.io/library/node:20.18.1-alpine AS base
+ARG PNPM_VERSION=9.15.4
+ARG TURBO_VERSION=2.1.0

 FROM base AS deps
 WORKDIR /app

 RUN apk add --no-cache libc6-compat
 RUN apk update

-RUN npm i -g --ignore-scripts pnpm@9.15.4 turbo@2.1.0
+RUN npm i -g --ignore-scripts pnpm@${PNPM_VERSION} turbo@${TURBO_VERSION}

36-57: Consider removing pnpm/turbo from runtime stage.

The runtime stage installs pnpm and turbo globally but only executes node directly. These tools are unnecessary at runtime and increase image size and attack surface.

Apply this diff to remove unnecessary tooling from runtime:

 FROM base AS runtime
 WORKDIR /app

 ARG NODE_ENV="production"

-RUN npm i -g --ignore-scripts pnpm@9.15.4 turbo@2.1.0
+RUN npm i -g --ignore-scripts pnpm@9.15.4

 COPY --from=deps /app/out/json/ .
 COPY --from=deps /app/out/pnpm* .

 RUN pnpm i --frozen-lockfile --prod --ignore-scripts
+RUN npm uninstall -g pnpm

 RUN addgroup --system --gid 1001 nodejs
 RUN adduser --system --uid 1001 nodejs

 COPY --from=builder /app/apps/audit/dist/ /app/apps/audit/dist/

 USER nodejs

 EXPOSE 7080

 CMD ["node", "/app/apps/audit/dist/index.js"]

Alternatively, if no global tools are needed post-install, use a pure Node base for runtime without any package managers.

apps/audit/src/utils/rate-limit.ts (1)

24-31: Fixed-window rate limiting semantics.

The implementation uses a fixed window that resets at resetTime. This is simpler than a true sliding window but can allow up to 2× the limit in edge cases (requests at end of one window + start of next).

If strict rate limiting is required, consider a sliding window algorithm (e.g., token bucket or sliding log).
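As a concrete alternative to the fixed window, a minimal token-bucket limiter might look like this (a sketch with illustrative names, not the service's actual implementation):

```javascript
// Token bucket: at most `capacity` tokens, refilled at `ratePerSec`.
// Smooths bursts instead of resetting a whole window at once, so the
// 2x-the-limit edge case of fixed windows cannot occur.
function createBucket(capacity, ratePerSec) {
  let tokens = capacity
  let last = Date.now()
  return {
    tryRemove() {
      const now = Date.now()
      tokens = Math.min(capacity, tokens + ((now - last) / 1000) * ratePerSec)
      last = now
      if (tokens >= 1) {
        tokens -= 1
        return true // request allowed
      }
      return false // rate limited
    },
  }
}

// With capacity 3 and a slow refill, the 4th immediate call is rejected.
const bucket = createBucket(3, 1)
console.log([bucket.tryRemove(), bucket.tryRemove(), bucket.tryRemove(), bucket.tryRemove()])
// → [ true, true, true, false ]
```

Per-key state (e.g. one bucket per IP) would still need the same eviction logic as the fixed-window map.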

packages/graphql/test/activitySharing.test.ts (1)

312-325: Consider extracting an audit-log assertion helper

There are dozens of blocks like this that manually call findFirst, check truthiness, and assert the formatted message. Any tweak to the log format will require editing every block, and it’s easy to miss places when copying the pattern. A tiny helper (e.g. expectAuditLogEntry({ type, objectType, objectId, sourceUserId, targetUserId }, expectedMessage)) would collapse the duplication, keep the tests declarative, and make future message changes safer.

docker-compose.yml (1)

255-272: Consider adding a health check for the Azurite service.

The Azurite service lacks a health check definition, unlike other services in this compose file (postgres, redis, hatchet). Without a health check, dependent services may attempt connections before Azurite is ready.

   azurite:
     image: mcr.microsoft.com/azure-storage/azurite:latest
+    healthcheck:
+      test: ["CMD", "nc", "-z", "localhost", "10000"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
     ports:
       - '10000:10000' # Blob service
       - '10001:10001' # Queue service
       - '10002:10002' # Table service
     command: >
       azurite
       --tableHost 0.0.0.0
       --blobHost 0.0.0.0
       --queueHost 0.0.0.0
       --location /data
       --debug /data/debug.log
     volumes:
       - azurite_data:/data
     environment:
       - AZURITE_ACCOUNTS=devstoreaccount1
.github/workflows/test-graphql.yml (1)

101-106: Consider adding a health check for the Azurite service in CI.

The Azurite service lacks a health check configuration, unlike other services (postgres, redis, hatchet) which have explicit health checks. Without a health check, tests may start before Azurite is ready, leading to flaky test failures.

       azurite:
         image: mcr.microsoft.com/azure-storage/azurite:latest
+        options: >-
+          --health-cmd "wget --no-verbose --tries=1 --spider http://localhost:10000 || exit 1"
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
         ports:
           - 10000:10000
           - 10001:10001
           - 10002:10002

Note: The AI summary mentions that wait-for-services.sh checks Azurite readiness at port 7080, but the service exposes standard Azure Storage ports (10000-10002). Verify that the readiness check matches the actual Azurite configuration.

.github/workflows/v3_audit-qa.yml (1)

32-32: Upgrade GitHub Actions to latest versions

The workflow uses outdated action versions with deprecated runners. Upgrade to avoid potential execution failures:

-      - name: Checkout repository
-        uses: actions/checkout@v3
+      - name: Checkout repository
+        uses: actions/checkout@v4

-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v2
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3

-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v2
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3

-      - name: Log into registry ${{ env.REGISTRY }}
-        uses: docker/login-action@v2
+      - name: Log into registry ${{ env.REGISTRY }}
+        uses: docker/login-action@v3

-      - name: Extract Docker metadata
-        id: meta
-        uses: docker/metadata-action@v4
+      - name: Extract Docker metadata
+        id: meta
+        uses: docker/metadata-action@v5

Also applies to: 35-35, 38-38, 41-41, 49-49

.github/workflows/test-audit.yml (2)

115-125: Consider parallelizing package builds.

The serial cd && pnpm run build pattern could be optimized by building independent packages in parallel.

       - name: Build dependency packages
         run: |
-          cd packages/prisma
-          pnpm run build
-          cd ../types
-          pnpm run build
-          cd ../grading
-          pnpm run build
-          cd ../util
-          pnpm run build
+          pnpm --filter @klicker-uzh/prisma run build
+          pnpm --filter @klicker-uzh/types run build
+          pnpm --filter @klicker-uzh/grading run build
+          pnpm --filter @klicker-uzh/util run build

Or use pnpm's workspace parallelism:

pnpm --filter "@klicker-uzh/{prisma,types,grading,util}" run build

126-140: Add wait-for-services health check before starting audit.

The audit service starts after a fixed 10-second sleep, but dependent services (Azurite, Postgres) may not be ready. Consider using the wait-for-services.sh script referenced in the AI summary or explicit health checks.

       - name: Start audit service
         run: |
+          # Wait for Azurite and Postgres to be ready
+          timeout 60 bash -c 'until curl -f http://127.0.0.1:10002/devstoreaccount1 2>/dev/null; do sleep 2; done'
+          timeout 60 bash -c 'until pg_isready -h 127.0.0.1 -p 5432 -U klicker 2>/dev/null; do sleep 2; done'
           cd apps/audit
           pnpm run build
           nohup pnpm start &
-          sleep 10
+          # Wait for audit service to be ready
+          timeout 60 bash -c 'until curl -f http://localhost:7080/healthz 2>/dev/null; do sleep 2; done'
apps/audit/src/config.ts (1)

21-21: Consider making AUDIT_CORS_ORIGINS split-aware.

The config defines CORS origins as a comma-separated string with a default. Consider transforming it to an array for easier consumption.

   // CORS configuration (comma-separated origins)
-  AUDIT_CORS_ORIGINS: z.string().default('https://assessment.klicker.com'),
+  AUDIT_CORS_ORIGINS: z
+    .string()
+    .default('https://assessment.klicker.com')
+    .transform((s) => s.split(',').map((o) => o.trim())),

This would make the parsed config.AUDIT_CORS_ORIGINS an array directly, simplifying downstream usage.

apps/audit/src/utils/logger.ts (1)

1-26: Well-configured structured logger with proper redaction.

The Pino logger configuration follows best practices with ISO timestamps, uppercase level formatting, and comprehensive redaction of sensitive fields. The inclusion of environment metadata aids in log aggregation across deployments.

Optional: Consider partial masking for authorization headers.

Currently, authorization is fully redacted. For debugging, you might consider partial masking (e.g., Bearer ***) to preserve the authentication scheme while hiding the token. However, full redaction is more secure and acceptable for audit logs.

   redact: {
     paths: [
       'password',
       'token',
-      'authorization',
       'connectionString',
       'attributes.password',
       'attributes.token',
       'attributes.secret',
     ],
     censor: '[REDACTED]',
+    remove: false,
   },
+  serializers: {
+    authorization: (value: string) => {
+      if (typeof value === 'string' && value.startsWith('Bearer ')) {
+        return 'Bearer [REDACTED]'
+      }
+      return '[REDACTED]'
+    },
+  },
deploy/charts/klicker-uzh-v2/templates/deployment-audit.yaml (1)

32-40: Consider service account security configuration.

The deployment doesn't specify serviceAccountName or automountServiceAccountToken. For defense in depth, either bind a dedicated service account with minimal RBAC permissions or disable token automounting if the audit service doesn't need Kubernetes API access.

If no K8s API access is needed:

     spec:
+      automountServiceAccountToken: false
       priorityClassName: {{ .Values.audit.priorityClassName }}

Or bind a dedicated service account:

     spec:
+      serviceAccountName: {{ include "chart.fullname" . }}-audit
       priorityClassName: {{ .Values.audit.priorityClassName }}
apps/audit/vitest.config.ts (2)

60-68: Consider test isolation strategies for better parallelization.

Single-threaded execution avoids Azurite table contention but severely limits test performance. Consider test isolation techniques (e.g., per-suite table names with random suffixes or cleanup in teardown) to enable parallel test execution while maintaining isolation.

Example approach:

// In test setup
const testTableName = `auditlogs${Date.now()}${Math.random().toString(36).slice(2, 11)}` // alphanumeric only; Azure Table names disallow underscores
// Use testTableName for this test suite
// Clean up in afterAll()

This would allow increasing maxThreads for faster test runs.


57-58: Monitor and fix flaky tests rather than relying on retries.

While retry=1 is reasonable for integration tests with external dependencies, investigate and eliminate the root causes of flakiness rather than masking them with retries.

apps/frontend-pwa/src/components/liveQuiz/QuestionArea.tsx (1)

199-207: Consider non-blocking audit logging for better user experience.

The audit log call in onSubmit is currently awaited, which could delay the submission flow if the audit service is slow or unreachable. The useAuditClient hook provides a logAsync method for fire-and-forget logging. Consider using logAsync instead to ensure the user submission flow is not blocked by audit logging:

-    await auditLog.log({
+    auditLog.logAsync({
       action: AuditAction.PARTICIPANT_SUBMIT_RESPONSE,
       resource: `instance:${instanceId}`,
       scope: AuditScope.PUBLIC,
       correlationId: correlationKey ?? undefined,
       attributes: {
         response: studentResponse.response,
       },
     })

This ensures audit logging failures or latency do not impact the participant's submission experience.

Based on learnings.

apps/audit/test/public-endpoint.test.ts (1)

42-145: LGTM: Comprehensive authentication test coverage.

The test suite thoroughly validates the authentication flow:

  • Valid participant tokens (200 response)
  • Missing cookies (401)
  • Wrong cookie names (401)
  • Invalid/expired tokens (401)
  • Malformed JWTs (401)

Each test verifies both status codes and error messages, providing good coverage for security-critical functionality.

Optional enhancement: Consider defining a response type instead of using any casts (e.g., lines 63, 82, 101):

type AuditResponse = {
  status: 'stored'
  eventId: string
} | {
  error: string
}

const result = await response.json() as AuditResponse
apps/audit/src/middleware/auth.ts (1)

39-71: Consider stricter type validation for JWT payload fields.

The function casts payload.sub and payload.role to strings without runtime type checks. If the JWT payload is malformed or tampered with (e.g., sub is a number or role is an array), the function may behave unexpectedly.

Consider adding explicit type guards:

     const participantId = payload.sub as string
     const role = payload.role as string
 
-    if (!participantId || !role) {
+    if (typeof participantId !== 'string' || typeof role !== 'string' || !participantId || !role) {
       return null
     }
apps/audit/test/utils/azure-table-helper.ts (1)

191-225: Complex object comparison may produce false positives.

Lines 206-216 compare complex objects by stringifying and comparing JSON. This approach is fragile:

  • Property order differences cause mismatches even when values are equivalent
  • Serialization of undefined, NaN, or circular references may behave unexpectedly
  • Nested objects with different but equivalent representations (e.g., { a: 1 } vs. { a: 1.0 }) may fail

Consider using a deep equality library or normalizing objects before comparison:

import { isDeepStrictEqual } from 'util'

// In verifyEntity:
if (typeof expectedValue === 'object' && expectedValue !== null) {
  const actualObj = typeof actualValue === 'string' ? JSON.parse(actualValue) : actualValue
  if (!isDeepStrictEqual(actualObj, expectedValue)) {
    throw new Error(
      `Entity ${key} mismatch. Expected: ${JSON.stringify(expectedValue)}, Got: ${JSON.stringify(actualObj)}`
    )
  }
}
apps/audit/src/storage/entities.ts (1)

95-97: Single-character shard provides limited distribution.

Line 96 uses only the first hex character of the hash for sharding, providing only 16 possible shards per time bucket. For very high-throughput scenarios (thousands of events per minute), this may create hot partitions.

Consider using 2 hex characters for 256 shards if higher distribution is needed:

   const shard = eventId
-    ? createHash('sha256').update(eventId).digest('hex')[0]!
+    ? createHash('sha256').update(eventId).digest('hex').substring(0, 2)
     : '0'

Monitor partition-level metrics in production to determine if increased sharding is beneficial.
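The distribution trade-off can be sketched numerically (the hashing scheme mirrors the snippet above; the function names are illustrative, not the service's actual API):

```typescript
import { createHash } from 'crypto'

// Sketch: derive a shard suffix from the first n hex characters of a
// SHA-256 digest. n hex chars give 16^n possible shards per time bucket.
function shardFor(eventId: string, hexChars: number): string {
  return createHash('sha256').update(eventId).digest('hex').substring(0, hexChars)
}

function maxShards(hexChars: number): number {
  return 16 ** hexChars // 1 char -> 16 shards, 2 chars -> 256 shards
}
```

Because the shard is a pure function of the event ID, widening from one to two characters changes only write distribution, not lookup logic, as long as readers derive the partition key the same way.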

apps/audit/test/integration.test.ts (3)

5-5: Consider making BASE_URL configurable via environment variable.

The hardcoded BASE_URL may cause issues if the audit service runs on a different port in CI or other environments.

Apply this diff to make it configurable:

-const BASE_URL = 'http://localhost:7080'
+const BASE_URL = process.env.AUDIT_SERVICE_URL || 'http://localhost:7080'

46-64: Optimize polling: avoid fetching all entities on each iteration.

The current implementation calls getAllEntities() on every poll, which becomes inefficient as entity count grows. Consider using partition key queries or adding a helper method that queries by rowKey directly.

If the partition key is predictable from the rowKey/eventId, refactor to use getEntity(partitionKey, rowKey). Otherwise, consider adding a filtered query to AzureTableTestHelper to avoid full table scans.


235-235: Replace fixed sleep delays with polling for reliability.

Multiple tests use fixed setTimeout delays (1-3 seconds) to wait for persistence. This can cause flakiness if operations take longer than expected or unnecessary slowness if they complete faster.

Replace fixed delays with the existing waitForEntityByRowKey helper or tableHelper.waitForEntityCount for more robust waiting. For example:

-      // Wait for persistence
-      await new Promise((resolve) => setTimeout(resolve, 1000))
+      // Wait for all events to be persisted
+      await tableHelper.waitForEntityCount(2, 5000)

Or wait for specific entities:

-      await new Promise((resolve) => setTimeout(resolve, 2000))
+      for (const event of authEvents) {
+        await waitForEntityByRowKey(event.eventId)
+      }

Also applies to: 273-273, 307-307, 341-341, 384-384

packages/graphql/src/services/accounts.ts (1)

100-160: Add success audit logging for consistency.

loginParticipant logs failure events but not success. Other login methods (magic link, LTI, temporary) log both success and failure for complete audit trails.

Add success logging after line 156:

   await doParticipantLogin(
     {
       participantId: participant.id,
       participantLocale: participant.locale,
     },
     ctx
   )

+  // Log successful login
+  await ctx.auditClient.log({
+    subject: `participant:${participant.id}`,
+    action: AuditAction.PARTICIPANT_LOGIN_SUCCESS,
+    attributes: {
+      method: 'password',
+    },
+  })
+
   // TODO: return more data (e.g. Avatar etc.)
   return participant.id
apps/audit/test/database-verification.test.ts (4)

139-140: Use tableHelper.waitForEntity for consistency.

Line 140 uses a fixed 2-second delay, while other tests use tableHelper.waitForEntity or waitForEntityCount with configurable timeouts. This makes the test less reliable and slower than necessary.

Replace the fixed delay:

-      // Wait and verify only one record exists
-      await new Promise((resolve) => setTimeout(resolve, 2000))
+      // Wait for entity to be persisted
+      await tableHelper.waitForEntity(
+        // We need to get the partition key first
+        (await tableHelper.getAllEntities())[0]?.partitionKey || '',
+        event.eventId,
+        5000
+      )

Or simplify to wait for count:

-      await new Promise((resolve) => setTimeout(resolve, 2000))
+      await tableHelper.waitForEntityCount(1, 5000)

168-169: Consider using tableHelper wait methods consistently.

Lines 168, 212, 248 use fixed setTimeout delays. For consistency with other tests (e.g., line 88) and better reliability, use tableHelper.waitForEntity or waitForEntityCount.

Example for line 168:

-      await new Promise((resolve) => setTimeout(resolve, 1000))
+      await tableHelper.waitForEntityCount(1, 5000)

521-522: Performance assertion may be flaky in CI.

The 1-second threshold for querying 10 entities from local Azurite is reasonable for local development, but may fail in resource-constrained CI environments or under load. Consider increasing the threshold or making it configurable.

-      // Query should be reasonably fast (under 1 second for 10 entities)
-      expect(queryTime).toBeLessThan(1000)
+      // Query should be reasonably fast (allow more time in CI)
+      const maxQueryTime = process.env.CI ? 3000 : 1000
+      expect(queryTime).toBeLessThan(maxQueryTime)

533-551: Incomplete test: No actual connectivity issue simulation.

The test name and comments suggest database connectivity testing, but it only submits a normal event and accepts either 200 or 503. To meaningfully test connectivity handling, consider:

  1. Testing the service behavior when Azurite is temporarily unavailable
  2. Validating retry logic and error responses
  3. Checking that events are queued or handled gracefully during outages

If actual connectivity testing is not feasible in this test suite, consider renaming the test to reflect what it actually validates or moving connectivity tests to a separate integration suite.

packages/graphql/src/services/liveQuizzes.ts (1)

2674-2679: Clean up commented audit log code.

The commented-out audit log suggests incomplete migration from Hatchet-based logging to the audit client. Either:

  1. Remove the commented code if Hatchet logging is the final approach
  2. Complete the migration to audit client with appropriate attributes (participant ID, instance ID, points deducted)

If migrating to audit client, consider:

await ctx.auditClient.log({
  scope: AuditScope.INTERNAL,
  action: AuditAction.USER_RESET_QUIZ_DEDUCT_POINTS,
  subject: `user:${ctx.user.sub}`,
  resource: `live-quiz:${id}`,
  attributes: {
    participantId: response.participantId,
    instanceId: instance.id,
    blockId: block.id,
    basePoints: response.basePoints,
    correctnessPoints: response.correctnessPoints,
    bonusPoints: response.bonusPoints,
  },
})
apps/audit/src/storage/table-client.ts (2)

77-98: Consider handling the unused result variable.

Line 82 retrieves the iterator result but doesn't use it. This may trigger linting warnings.

Apply this diff to acknowledge the intentionally unused result:

       const tablesList = this.client.listEntities()
-      const result = await tablesList.next() // Just check if we can get the iterator
+      void (await tablesList.next()) // Just check if we can get the iterator

187-197: Consider more specific error message matching.

The broad terms invalid and malformed might unintentionally match error messages that should be retried. For example, "Invalid network state" might be a transient issue.

If you have access to structured error codes or properties from the Azure SDK (e.g., statusCode, code), prefer those over message string matching. Otherwise, consider more specific patterns:

   private isNonRetryableError(error: Error): boolean {
     const message = error.message.toLowerCase()
     return (
       message.includes('entitytoolarge') ||
       message.includes('requestbodytoolarge') ||
-      message.includes('invalid') ||
-      message.includes('malformed') ||
+      message.includes('invalid request') ||
+      message.includes('invalid entity') ||
+      message.includes('malformed request') ||
       message.includes('unauthorized') ||
       message.includes('forbidden')
     )
   }
.github/workflows/cypress-testing.yml (1)

119-124: Consider pinning the Azurite version for reproducibility.

The latest tag can lead to non-reproducible builds if Microsoft releases a new version with breaking changes.

Pin to a specific version:

       azurite:
-        image: mcr.microsoft.com/azure-storage/azurite:latest
+        image: mcr.microsoft.com/azure-storage/azurite:3.35.0

Apply to both job definitions (lines 120 and 339).

Also applies to: 338-343

apps/audit/test/scenarios.test.ts (3)

4-5: Consider making BASE_URL configurable.

The hard-coded BASE_URL and fallback AUTH_TOKEN work for local testing but might need environment-specific configuration for different test environments.

Make the base URL configurable:

-const BASE_URL = 'http://localhost:7080'
+const BASE_URL = process.env.AUDIT_SERVICE_URL || 'http://localhost:7080'
 const AUTH_TOKEN = process.env.AUDIT_TOKEN || 'test-secret-token-123'

55-62: Replace hard-coded delay with polling-based verification.

The 2-second delay (line 56) is fragile—it might be too short under load or unnecessarily slow when persistence is quick.

Use waitForEntityCount from the test helper:

-  // Wait for persistence
-  await new Promise((resolve) => setTimeout(resolve, 2000))
-
-  // Verify all events are persisted
-  const persistedEntities = (await tableHelper.getAllEntities()).filter(
-    (entity) => (entity.rowKey ? eventIds.has(entity.rowKey) : false)
-  )
-  expect(persistedEntities.length).toBe(events.length)
+  // Wait for persistence with polling
+  const initialCount = await tableHelper.getEntityCount()
+  await tableHelper.waitForEntityCount(initialCount + events.length, 5000)
+  
+  // Retrieve persisted entities
+  const persistedEntities = (await tableHelper.getAllEntities()).filter(
+    (entity) => (entity.rowKey ? eventIds.has(entity.rowKey) : false)
+  )
+  expect(persistedEntities.length).toBe(events.length)

83-1013: LGTM! Comprehensive end-to-end scenario coverage.

The test suite provides excellent coverage of real-world audit scenarios with proper verification of event sequences, correlations, and business logic. The nested describe blocks and console logs enhance readability and debugging.

Some individual tests are quite long (150+ lines). Consider extracting shared verification logic into helper functions for improved maintainability:

function verifyEventSequence(events: any[], expectedActions: string[]) {
  const actions = events.map(e => e.action)
  expect(actions).toEqual(expectedActions)
}

function verifyAllEventsShareResource(events: any[], resourceId: string) {
  events.forEach(event => {
    expect(event.resource).toBe(resourceId)
  })
}


steps:
- name: Checkout repository
uses: actions/checkout@v3

⚠️ Potential issue | 🟠 Major

Update actions/checkout to v4.

The checkout action uses a deprecated runner.

       - name: Checkout repository
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
🧰 Tools
🪛 actionlint (1.7.7)

109-109: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents
.github/workflows/test-audit.yml around line 109: the workflow uses
actions/checkout@v3 which is deprecated; update the step to use
actions/checkout@v4 by changing the version reference to v4 (or the latest
stable minor tag), and run a quick CI trigger to verify no breaking changes in
checkout behavior (ensure any inputs used by the step remain compatible with
v4).

Comment on lines +19 to +25
uses: actions/checkout@v3

- name: Set up QEMU
uses: docker/setup-qemu-action@v2

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2

⚠️ Potential issue | 🟠 Major

Update GitHub Actions to latest versions.

Multiple actions use deprecated runners that will soon be unsupported. Update to the latest stable versions to ensure compatibility with GitHub Actions infrastructure.

Apply this diff:

-      - name: Checkout repository
-        uses: actions/checkout@v3
+      - name: Checkout repository
+        uses: actions/checkout@v4

       - name: Set up QEMU
-        uses: docker/setup-qemu-action@v2
+        uses: docker/setup-qemu-action@v3

       - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v2
+        uses: docker/setup-buildx-action@v3
🧰 Tools
🪛 actionlint (1.7.7)

19-19: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


22-22: the runner of "docker/setup-qemu-action@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


25-25: the runner of "docker/setup-buildx-action@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents
In .github/workflows/v3_audit-prod.yml around lines 19 to 25, the workflow
references older action versions; update them to the latest stable releases to
avoid deprecation issues: change actions/checkout@v3 to actions/checkout@v4,
ensure docker/setup-qemu-action is pinned to its current stable tag (v2) and
bump docker/setup-buildx-action from v2 to v3 (or the latest stable release),
then run the workflow to verify compatibility and adjust any inputs if the newer
actions changed parameters.

Comment on lines +27 to +32
- name: Log into registry ${{ env.REGISTRY }}
uses: docker/login-action@v2
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

⚠️ Potential issue | 🟠 Major

Update docker/login-action to v3.

The login action is also using a deprecated runner.

       - name: Log into registry ${{ env.REGISTRY }}
-        uses: docker/login-action@v2
+        uses: docker/login-action@v3
         with:
           registry: ${{ env.REGISTRY }}
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}
🧰 Tools
🪛 actionlint (1.7.7)

28-28: the runner of "docker/login-action@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents
In .github/workflows/v3_audit-prod.yml lines 27-32, the workflow uses
docker/login-action@v2 which is outdated and relies on a deprecated runner;
update the action to docker/login-action@v3 by changing the uses line to
docker/login-action@v3 and ensure the with block keeps the registry, username
and password inputs (adjust any input names to v3 if needed), and also update
the workflow's runner configuration if it's using a deprecated runner image so
it targets a supported runner.

Comment on lines +34 to +47
- name: Extract Docker metadata
id: meta
uses: docker/metadata-action@v4
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

- name: Build and push Docker image
uses: docker/build-push-action@v5
with:
context: .
file: apps/audit/Dockerfile
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}

⚠️ Potential issue | 🟠 Major

Update docker actions to latest versions.

Both metadata-action and build-push-action need version updates.

       - name: Extract Docker metadata
         id: meta
-        uses: docker/metadata-action@v4
+        uses: docker/metadata-action@v5
         with:
           images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

       - name: Build and push Docker image
-        uses: docker/build-push-action@v5
+        uses: docker/build-push-action@v6
         with:
           context: .
           file: apps/audit/Dockerfile
           push: true
           tags: ${{ steps.meta.outputs.tags }}
           labels: ${{ steps.meta.outputs.labels }}
🧰 Tools
🪛 actionlint (1.7.7)

36-36: the runner of "docker/metadata-action@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents
.github/workflows/v3_audit-prod.yml lines 34-47: update the docker GitHub
Actions to their current major releases — change docker/metadata-action@v4 to
the latest supported tag (e.g., docker/metadata-action@v5) and bump
docker/build-push-action@v5 to the current major release (e.g.,
docker/build-push-action@v6); ensure the with/inputs remain compatible with the
newer action versions and test the workflow to confirm tags/labels outputs are
unchanged or adjust output references if the action output names changed.

Comment thread .gitignore
.idea/
tsconfig.tsbuildinfo

*.log

⚠️ Potential issue | 🟡 Minor

Remove duplicate ignore rule.

Line 38 adds *.log, but line 50 already ignores *.log. Remove this duplicate entry.

-*.log
-
🤖 Prompt for AI Agents
In .gitignore around line 38, there is a duplicate ignore rule "*.log" (the same
rule also exists at line 50); remove the redundant "*.log" entry at line 38 so
the file contains only one "*.log" ignore rule (keep the existing one at line 50
and delete the duplicate at line 38).

Comment on lines +508 to 511
console.debug(
`Auto-accepted ${acceptedCount} invitations for new participant with emails:`,
emailCollection.allEmails
)

⚠️ Potential issue | 🟠 Major

PII exposure: email collection logged at debug level.

The debug log includes emailCollection.allEmails, which contains user email addresses (PII). This is the same issue as the previous occurrences.

Apply this diff to log only the count:

-      console.debug(
-        `Auto-accepted ${acceptedCount} invitations for new participant with emails:`,
-        emailCollection.allEmails
-      )
+      console.debug(
+        `Auto-accepted ${acceptedCount} invitations for new participant (${emailCollection.allEmails.length} email(s))`
+      )
🤖 Prompt for AI Agents
In apps/auth/src/lib/helpers.ts around lines 508 to 511, the debug log currently
prints emailCollection.allEmails (PII); change the log to output only the
acceptedCount (no email addresses). Replace the console.debug call to log a
message containing the number of auto-accepted invitations (and optional non-PII
context), and remove any reference to or interpolation of
emailCollection.allEmails to avoid PII exposure.

redisAssessmentExec,
redisCache,
handlers,
auditClient,

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify prepareHatchetTasks accepts optional AuditClient.

After making auditClient optional (to prevent startup crashes), ensure that prepareHatchetTasks is typed to accept auditClient?: AuditClient and handles the undefined case gracefully.

Run this script to check the signature:


🏁 Script executed:

#!/bin/bash
# Description: Verify prepareHatchetTasks signature accepts optional auditClient

# Search for the prepareHatchetTasks function signature
ast-grep --pattern $'export function prepareHatchetTasks($$$) {
  $$$
}'

# Also search for type definitions
rg -nP --type=ts 'prepareHatchetTasks.*\(' -A 5

Length of output: 19871


Make auditClient optional in prepareHatchetTasks signature and handle undefined.

At packages/hatchet/src/index.ts:34, change auditClient: AuditClient to auditClient?: AuditClient. Then at line 58, guard the usage:

if (auditClient) {
  await auditClient.log({
    ...message,
    scope: AuditScope.WORKER,
  })
}

This prevents type errors when the caller passes undefined and ensures the createAuditLogEntry task gracefully skips logging when the client is unavailable.

🤖 Prompt for AI Agents
In apps/hatchet-worker-general/src/index.ts around line 132 (and update the
source signature at packages/hatchet/src/index.ts line 34), make the
prepareHatchetTasks parameter auditClient optional (auditClient?: AuditClient)
and update any call sites accordingly; then in the createAuditLogEntry task
where auditClient is used (around line 58), add a guard so the log call runs
only when auditClient is defined (e.g., if (auditClient) { await
auditClient.log({...}) }), ensuring the task skips logging when auditClient is
undefined.

Comment on lines +36 to +38
await ctx.auditClient.log({
subject: `user:${ctx.user.sub}`,
action: AuditAction.USER_LOGOUT,

🛠️ Refactor suggestion | 🟠 Major

Add error handling for audit logging to prevent breaking user flows.

All audit logging calls use await ctx.auditClient.log() without try-catch blocks. If the audit service is unavailable or throws an error, user operations (login, logout) will fail. Audit logging should typically be fire-and-forget or have explicit error handling to avoid disrupting primary flows.

Add error handling around audit calls:

try {
  await ctx.auditClient.log({
    subject: `participant:${participant.id}`,
    action: AuditAction.PARTICIPANT_LOGIN_SUCCESS,
    attributes: { method: 'password' },
  })
} catch (error) {
  console.error('Audit logging failed:', error)
  // Optionally: send to monitoring/alerting
}

Or, if auditClient supports a non-throwing fire-and-forget mode, use that:

// Don't await if audit client queues asynchronously
ctx.auditClient.log({...}).catch(err => console.error('Audit failed:', err))

Verify the auditClient's error handling behavior and decide on the appropriate pattern. Based on learnings.

Also applies to: 120-122, 137-139, 367-369, 469-471, 901-903, 930-932, 976-978

🤖 Prompt for AI Agents
In packages/graphql/src/services/accounts.ts (apply to lines 36-38, 120-122,
137-139, 367-369, 469-471, 901-903, 930-932, 976-978): auditClient.log is
awaited directly and can throw, which may break primary user flows; change these
calls to either fire-and-forget (remove await and attach .catch to log/report
errors) or wrap each await in a try-catch that logs the error (and optionally
reports to monitoring) so audit failures do not block login/logout or other user
operations.

Comment on lines +118 to +131
// Log failed login attempt - participant not found
if (!participant) {
await ctx.auditClient.log({
subject: `participant:${usernameOrEmail}`,
action: AuditAction.PARTICIPANT_LOGIN_FAILED,
attributes: {
method: 'password',
reason: 'user_not_found',
ip: ctx.req?.ip,
userAgent: ctx.req?.headers?.['user-agent'],
},
})
return null
}

⚠️ Potential issue | 🟠 Major

Sanitize or hash unknown identifiers in audit log subjects.

Line 121 uses unsanitized user input (usernameOrEmail) in the subject field. If this contains special characters, extremely long strings, or sensitive data, it could cause issues in audit log queries or expose information.

Consider one of these approaches:

Option 1: Use a hash

-      subject: `participant:${usernameOrEmail}`,
+      subject: `participant:unknown:${createHash('sha256').update(usernameOrEmail).digest('hex')}`,

Option 2: Use a generic placeholder and move the identifier to attributes

-      subject: `participant:${usernameOrEmail}`,
+      subject: `participant:unknown`,
       action: AuditAction.PARTICIPANT_LOGIN_FAILED,
       attributes: {
+        attemptedIdentifier: usernameOrEmail.substring(0, 100), // truncated
         method: 'password',

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In packages/graphql/src/services/accounts.ts around lines 118–131, the audit log
uses the raw usernameOrEmail in the subject which can include special chars,
very long strings, or sensitive data; replace the raw identifier with a
sanitized value: compute a stable hash (e.g. SHA-256 hex) of usernameOrEmail and
use subject `participant:<hash>`, move the original identifier into attributes
only after sanitizing/truncating or masking (or omit raw entirely and store a
boolean/placeholder), ensure consistent hashing method so logs remain queryable,
and enforce a max length on any attribute to avoid extremely long values.

Comment on lines +372 to +374
const auditClient = new AuditClient()

return { prisma, hatchet: hatchetClient, emitter, auditClient }

⚠️ Potential issue | 🟠 Major

Ensure AuditClient initialization doesn't break tests when audit service is unavailable.

The AuditClient constructor requires AUDIT_SERVICE_URL and AUDIT_TOKEN environment variables and will throw if they're missing. In test environments where the audit service may not be available, this will cause all test initialization to fail with a misleading "Database connection failed" error message.

Consider one of these approaches:

  1. Graceful fallback: Wrap AuditClient construction in a try-catch and provide a no-op stub:
+    let auditClient: AuditClient
+    try {
+      auditClient = new AuditClient()
+    } catch (error) {
+      console.warn('AuditClient initialization failed, using stub:', error)
+      auditClient = {
+        log: async () => {
+          console.warn('Audit logging disabled in tests')
+          return null
+        }
+      } as AuditClient
+    }
-    const auditClient = new AuditClient()
  2. Environment validation: Check for required environment variables before constructing AuditClient and fail with a clear message if missing.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In packages/graphql/test/helpers.ts around lines 372 to 374, constructing
AuditClient unconditionally causes tests to crash when
AUDIT_SERVICE_URL/AUDIT_TOKEN are missing; wrap the AuditClient construction in
a short guard: either check process.env.AUDIT_SERVICE_URL and
process.env.AUDIT_TOKEN before instantiating and throw a clear error if you want
strictness, or better for tests use a try-catch and on failure return a no-op
stub object that implements the same methods used in tests (or null with callers
adjusted) so test initialization doesn't bubble up an unrelated "Database
connection failed" error.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (13)
packages/util/src/jwt.ts (1)

12-14: LGTM with a minor observation on union types.

The addition of liveQuizId, instanceId, and execution fields aligns with the audit correlation tracking requirements. The fields are optional and integrate cleanly with the existing JWTPayload interface.

One observation: instanceId and execution use union types (string | number). If the actual usage consistently treats these as numbers, consider narrowing the types to avoid unnecessary type guards downstream.
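If consumers always need numbers, a single narrowing helper at the boundary (a hypothetical sketch, not part of the PR) avoids repeated type guards downstream:

```typescript
// Sketch: normalize `string | number` JWT fields (e.g. instanceId,
// execution) to numbers once, returning undefined for unparseable input.
function toNumericId(value: string | number | undefined): number | undefined {
  if (value === undefined) return undefined
  const n = typeof value === 'number' ? value : Number(value)
  return Number.isFinite(n) ? n : undefined
}
```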

apps/audit/test/performance.test.ts (4)

106-122: Review performance thresholds for environment sensitivity.

The 100ms threshold for simple events is reasonable, but may be sensitive to CI/test environment performance (network latency to Azurite, CPU contention, etc.). If tests become flaky in CI, consider:

  • Relaxing thresholds slightly or making them configurable via environment variables
  • Using percentile-based assertions (e.g., p95 < 150ms) instead of absolute values
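A percentile-based assertion could be sketched roughly as follows (the helper name and the nearest-rank method are illustrative, not part of the existing test utilities):

```typescript
// Illustrative helper: p-th percentile of measured latencies via the
// nearest-rank method. Not part of the existing AzureTableTestHelper API.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples')
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.min(rank, sorted.length) - 1]
}

// Example usage in a test (assumes results carry a durationMs field):
// const latencies = results.map((r) => r.durationMs)
// expect(percentile(latencies, 95)).toBeLessThan(150)
```

This tolerates the occasional slow outlier on shared CI runners while still failing when the bulk of requests regress.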

341-341: Throughput variance threshold of 250% is extremely permissive.

A throughput variance of 250% means the system can have a 3.5x difference between min and max throughput, which suggests very inconsistent performance. For a production audit service, this variance is quite high and may mask performance degradation issues.

Consider tightening this threshold (e.g., 50-100%) to catch performance regressions earlier, or document why such high variance is acceptable.


359-432: Memory leak test relies on optional global.gc which may not be available.

Lines 373-375 and 394-396 conditionally invoke global.gc() if available. In standard Node.js, gc is only exposed when running with the --expose-gc flag. Without it, garbage collection timing is unpredictable, reducing test reliability.

Consider documenting this requirement in test setup instructions or adjusting the test to be more robust when GC is unavailable (e.g., longer settle times, or skip/warn when GC is not exposed).
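A defensive pattern for the GC dependency might look like this (sketch only; the settle durations are illustrative):

```typescript
// Sketch: trigger GC when exposed via --expose-gc, otherwise fall back to a
// longer settle delay; returns whether an explicit GC pass actually ran.
async function settleMemory(settleMs = 100): Promise<boolean> {
  const gc = (globalThis as { gc?: () => void }).gc
  if (typeof gc === 'function') {
    gc()
    await new Promise((resolve) => setTimeout(resolve, settleMs))
    return true
  }
  // Without --expose-gc, give V8 extra time so measurements are less noisy
  console.warn('global.gc not exposed (run node with --expose-gc); using longer settle time')
  await new Promise((resolve) => setTimeout(resolve, settleMs * 5))
  return false
}
```

Tests can then warn (or widen their memory tolerance) when the returned flag is false instead of silently producing unreliable numbers.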


105-527: Comprehensive performance test suite with good coverage.

The test suite covers a wide range of performance scenarios (single request, concurrency, sustained load, memory, error rates, and DB scale). The use of measurePerformance helper and detailed console logging aids in diagnosing performance issues.

A few recommended improvements:

  • Make hard-coded thresholds configurable or document their assumptions
  • Consider adding assertions on response latency percentiles (p50, p95, p99) for more nuanced performance tracking
  • Reduce sleep/wait times where possible to speed up test execution

Overall, this is a solid foundation for load testing the audit service.

apps/audit/test/integration.test.ts (4)

12-28: Improve error messages in extractEntityTimestamp.

The function throws generic error messages that don't include the entity or context. When debugging test failures, more descriptive errors would help identify which entity caused the issue.

Consider including entity identifiers (rowKey, partitionKey) in the error messages.

Apply this diff to improve error handling:

 function extractEntityTimestamp(entity: any): number {
   const raw = entity.eventTimestamp ?? entity.timestamp

   if (typeof raw === 'number') {
     return raw
   }

   if (typeof raw === 'string') {
     const parsed = Date.parse(raw)
     if (Number.isNaN(parsed)) {
-      throw new Error(`Unable to parse timestamp value: ${raw}`)
+      throw new Error(`Unable to parse timestamp value: ${raw} for entity ${entity.rowKey}`)
     }
     return parsed
   }

-  throw new Error('Timestamp not found on entity')
+  throw new Error(`Timestamp not found on entity ${entity.rowKey}`)
 }

47-65: Code duplication: waitForEntityByRowKey reimplements AzureTableTestHelper.waitForEntity.

The waitForEntityByRowKey helper iterates over all entities to find a match by rowKey, but AzureTableTestHelper already provides waitForEntity(partitionKey, rowKey) which is more efficient.

If you need to search by rowKey alone without knowing the partition key, consider adding a dedicated helper method to AzureTableTestHelper to avoid duplicating the polling logic.


125-125: Fixed sleeps slow down test execution.

Multiple tests use fixed setTimeout delays (500ms to 5000ms) to wait for persistence. These add up to significant test execution time. Consider:

  • Reducing sleep durations where safe
  • Using the existing waitForEntity or waitForEntityCount helpers more consistently (they poll with shorter intervals and fail fast on timeout)
  • Making wait times configurable for local vs. CI environments

Also applies to: 179-179, 255-255, 300-300, 342-342, 382-382, 429-429, 476-476, 496-496, 521-521


83-504: Comprehensive integration test suite with strong coverage.

The test suite thoroughly exercises the audit service across multiple dimensions: persistence, idempotency, isolation, event types, partitioning, payload size, and timestamp handling. The tests are well-structured and validate both HTTP responses and database state.

A few refinements would improve maintainability and speed:

  • Reduce fixed sleep times by relying more on polling helpers
  • Eliminate code duplication in helper functions
  • Add more descriptive test names where applicable (e.g., line 220)

Overall, this provides excellent end-to-end verification of the audit service.

apps/audit/src/schemas/audit-event.ts (1)

62-119: Consider requiring correlation data for PUBLIC events to improve traceability.

Currently, PUBLIC-scope events may omit both correlationId and correlationClaims. A previous review suggested requiring at least one for improved event correlation and traceability.

If this is still the desired behavior, add this validation inside the superRefine (after line 74):

      })
    }
+
+    // For PUBLIC events, require either correlationId or correlationClaims
+    if (scope === AuditScope.PUBLIC) {
+      if (!val.correlationId && !val.correlationClaims) {
+        ctx.addIssue({
+          code: z.ZodIssueCode.custom,
+          message:
+            'PUBLIC events must include correlationId or correlationClaims for traceability',
+          path: ['correlationId'],
+        })
+      }
+    }

    if (

This ensures PUBLIC events can be traced back to their originating context.

apps/audit/test/klicker-scenarios.test.ts (1)

38-67: Replace fixed delay with dynamic wait to improve test reliability.

The 2-second fixed delay (line 58) could cause flakiness in CI or under load. The AzureTableTestHelper already provides waitForEntityCount with configurable timeout.

Replace the fixed delay with:

-  // Wait for persistence
-  await new Promise((resolve) => setTimeout(resolve, 2000))
+  // Wait for persistence with timeout
+  await tableHelper.waitForEntityCount(events.length, 10000, 200)

This will fail fast if entities aren't persisted within 10 seconds, rather than always waiting 2 seconds.

packages/types/src/index.ts (2)

972-976: Standardize instanceId and execution to single types.

CorrelationClaims defines instanceId and execution as string | number, creating type ambiguity. Consumers must perform runtime checks or normalization, increasing complexity and the risk of type-related bugs.

Consider standardizing to a single type (e.g., string or number) across the codebase. If both are genuinely required, document the rationale and provide utility functions for safe conversion.
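If both types must be kept, a small conversion utility (name and error messages are illustrative) can centralize the normalization so consumers never branch on typeof themselves:

```typescript
// Sketch: normalize a `string | number` correlation field to a number,
// rejecting values that are not plain integers.
function toNumericId(value: string | number, field = 'id'): number {
  if (typeof value === 'number') {
    if (!Number.isInteger(value)) {
      throw new Error(`${field} must be an integer, got ${value}`)
    }
    return value
  }
  const parsed = Number.parseInt(value, 10)
  if (Number.isNaN(parsed) || String(parsed) !== value.trim()) {
    throw new Error(`${field} is not a valid numeric id: "${value}"`)
  }
  return parsed
}
```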


1088-1109: Consider flexible status types for audit responses.

AuditResponse and AuditBatchResponse both fix status: 'stored' as a literal type. If future requirements include partial failures, retries, or other states (e.g., 'pending', 'failed', 'partial'), this design will require breaking changes.

Consider using a union type for status:

export interface AuditResponse {
  status: 'stored' | 'pending' | 'failed'
  stored: boolean
  eventId: string
  error?: string
}

Alternatively, document that the current design assumes all responses are successfully stored and any failures are handled via exceptions.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4fa8fb4 and 554abe7.

📒 Files selected for processing (11)
  • .github/workflows/claude-code-review.yml (1 hunks)
  • apps/audit/src/routes/audit-public.ts (1 hunks)
  • apps/audit/src/schemas/audit-event.ts (1 hunks)
  • apps/audit/src/storage/entities.ts (1 hunks)
  • apps/audit/test/fixtures/klicker-events.json (1 hunks)
  • apps/audit/test/integration.test.ts (1 hunks)
  • apps/audit/test/klicker-scenarios.test.ts (1 hunks)
  • apps/audit/test/performance.test.ts (1 hunks)
  • apps/response-api/src/index.ts (9 hunks)
  • packages/types/src/index.ts (1 hunks)
  • packages/util/src/jwt.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/audit/src/storage/entities.ts
  • .github/workflows/claude-code-review.yml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-23T12:32:29.344Z
Learnt from: CR
PR: uzh-bf/klicker-uzh#0
File: apps/frontend-manage/CLAUDE.md:0-0
Timestamp: 2025-06-23T12:32:29.344Z
Learning: Implement error boundaries and fallback UI to handle unexpected errors gracefully in the user interface.

Applied to files:

  • apps/audit/src/schemas/audit-event.ts
🧬 Code graph analysis (6)
apps/audit/test/klicker-scenarios.test.ts (1)
apps/audit/test/utils/azure-table-helper.ts (1)
  • AzureTableTestHelper (37-320)
apps/audit/src/routes/audit-public.ts (8)
apps/audit/src/storage/table-client.ts (1)
  • AuditTableClient (5-286)
apps/audit/src/utils/metrics.ts (1)
  • metrics (10-35)
apps/audit/src/utils/logger.ts (1)
  • logger (4-26)
apps/audit/src/utils/cookies.ts (1)
  • parseCookies (1-24)
apps/audit/src/middleware/auth.ts (1)
  • verifyParticipantToken (39-71)
apps/audit/src/utils/rate-limit.ts (1)
  • checkRateLimit (8-35)
apps/audit/src/schemas/audit-event.ts (1)
  • AuditEventSchema (30-119)
apps/audit/src/storage/entities.ts (1)
  • createAuditEntity (46-73)
apps/response-api/src/index.ts (1)
packages/types/src/index.ts (2)
  • InternalAuditEvent (992-996)
  • CorrelationClaims (972-976)
apps/audit/src/schemas/audit-event.ts (1)
packages/types/src/index.ts (1)
  • CorrelationClaims (972-976)
apps/audit/test/performance.test.ts (1)
apps/audit/test/utils/azure-table-helper.ts (1)
  • AzureTableTestHelper (37-320)
apps/audit/test/integration.test.ts (1)
apps/audit/test/utils/azure-table-helper.ts (1)
  • AzureTableTestHelper (37-320)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: build
  • GitHub Check: format
  • GitHub Check: cypress-run-cloud
  • GitHub Check: build
  • GitHub Check: build
  • GitHub Check: SonarCloud
  • GitHub Check: build
  • GitHub Check: build
  • GitHub Check: Analyze (javascript)
  • GitHub Check: build
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: check
🔇 Additional comments (40)
apps/audit/test/fixtures/klicker-events.json (1)

1-313: Comprehensive test fixture with good event coverage.

The fixture provides a wide range of realistic audit events covering instructor controls, participant authentication flows, quiz interactions, system processing, security events, and error scenarios. This will enable thorough testing of event handling, correlation, and persistence logic.

The structure is clear and consistent across all event types, making it easy to extend with additional scenarios as needed.

apps/audit/src/schemas/audit-event.ts (6)

8-16: LGTM: Comprehensive public action allowlist.

The ALLOWED_PUBLIC_ACTIONS array covers key participant interactions and client error reporting, providing a clear security boundary for public-scope events.


18-22: LGTM: Well-defined constants.

The Set for O(1) action lookups, 32KB attribute limit (matching Azure Table Storage constraints), and correlation ID regex are all appropriate.


24-28: LGTM: Schema matches interface contract.

CorrelationClaimsSchema correctly validates the CorrelationClaims interface from packages/types, with appropriate union types for instanceId and execution.


87-106: LGTM: Robust attribute validation with clear error messages.

The validation correctly enforces JSON serializability and the 32KB size limit, with helpful error messages indicating current size on limit violations.


108-118: LGTM: Correlation claims serializability check.

The validation ensures correlationClaims can be JSON-serialized before storage, with a clear error path.


46-46: Subject must be optional to allow PUBLIC-scope events.

The base schema requires subject, but PUBLIC-scope events need to allow omission since they're enriched server-side. The superRefine at lines 76-85 attempts to enforce subject presence for non-PUBLIC scopes, but it cannot relax a field already marked as required.

Apply this diff:

-    subject: z.string().min(1).max(500),
+    subject: z.string().min(1).max(500).optional(),

The existing superRefine will then correctly enforce subject presence for non-PUBLIC scopes only.

apps/audit/test/klicker-scenarios.test.ts (8)

1-26: LGTM: Clean test setup with proper authentication helper.

The test configuration and authenticated request helper are well-structured, reading credentials from environment variables.


69-85: LGTM: Robust timestamp extraction with proper error handling.

The helper correctly handles multiple timestamp formats (number, ISO string) and provides clear error messages for invalid values.


87-101: LGTM: Proper test lifecycle management.

The setup and cleanup hooks ensure test isolation by clearing the table before and after tests.


103-316: LGTM: Comprehensive workflow test with strong assertions.

This test validates the complete quiz lifecycle including multiple authentication methods, response processing, and correlation tracking. The timestamp ordering and correlation verification assertions are particularly valuable.


318-447: LGTM: Thorough modification tracking tests.

The tests correctly verify that response modifications and deletions capture the actor, reason, and original subject, ensuring a complete audit trail.


449-534: LGTM: Complete authentication scenario coverage.

The tests validate successful and failed authentication flows across multiple methods (magic link, LTI, temp login), ensuring audit events capture the full authentication lifecycle.


536-658: LGTM: Strong exam integrity monitoring coverage.

The tests validate detection and logging of security-relevant events (multiple tabs, focus loss, IP changes), which are critical for assessment integrity.


660-827: LGTM: Comprehensive error handling test coverage.

The tests validate error scenarios including validation failures, duplicates, correlation errors, and client errors, ensuring the audit system handles edge cases gracefully.

packages/types/src/index.ts (3)

978-996: LGTM: Clean audit event type hierarchy.

The base/derived event structure cleanly separates public and internal scopes, with InternalAuditEvent appropriately requiring subject while PublicAuditEvent fixes the scope. The flexible attributes field allows for context-specific data.


998-1002: LGTM: AuditScope enum is well-defined.

The three scopes (PUBLIC, INTERNAL, WORKER) provide clear separation for different audit contexts and filtering needs.


1004-1086: Track implementation progress for audit actions.

All AuditAction enum values are marked with // TODO comments, indicating incomplete implementation. Additionally, the security events (MULTIPLE_TABS_DETECTED, BROWSER_FOCUS_LOST, IP_LOCATION_CHANGE) include a note expressing uncertainty about implementation.

Ensure that:

  1. Implementation status is tracked for all audit actions
  2. Security event detection mechanisms are clarified before full rollout
  3. Remove TODO comments as actions are implemented and tested

Consider creating a tracking issue to monitor completion of audit action implementation.

apps/response-api/src/index.ts (11)

281-286: LGTM: Correlation claims normalization.

The code constructs validatedCorrelationClaims after successful validation, providing a typed structure for downstream audit events. Defaulting execution to 0 when undefined is a reasonable fallback.

Ensure that the default execution: 0 aligns with business logic expectations.


176-186: Critical: Don't stringify req; add scope; make push non-blocking.

This audit event logs the entire IncomingMessage object via JSON.stringify(req), which leaks headers/cookies and can be circular or huge. Additionally, the scope field is missing, and the blocking await can derail the error path if the audit push fails.

Apply the sanitization and error handling as suggested in the previous review comment at lines 176-187.
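The sanitization could follow a shape like this (sketch; the selected fields are illustrative and should be tuned to what the audit trail actually needs):

```typescript
// Sketch: extract only non-sensitive request metadata instead of
// serializing the whole IncomingMessage into the audit event.
interface RequestLike {
  method?: string
  url?: string
  headers: Record<string, string | string[] | undefined>
}

function sanitizeRequest(req: RequestLike) {
  return {
    method: req.method,
    url: req.url,
    userAgent: req.headers['user-agent'],
    contentLength: req.headers['content-length'],
    // Record presence of credentials as booleans, never their values
    hasAuthHeader: req.headers['authorization'] !== undefined,
    hasCookies: req.headers['cookie'] !== undefined,
  }
}
```

This also sidesteps the circular-reference problem, since the result is a small plain object that is always JSON-serializable.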


192-202: Major: Avoid logging full body; add scope and guard push.

The audit event embeds the raw payload via JSON.stringify(payload), which may contain sensitive data. The scope field is also missing, and the push is blocking.

Apply the sanitization as suggested in the previous review comment at lines 192-203.


214-226: Major: Do not log raw JWT (correlationKey); hash it and remove full response.

The audit event logs the raw correlationKey (a JWT) and the full response object, both of which are sensitive. The scope field is missing, and the push is blocking.

Apply the hashing and sanitization as suggested in the previous review comment at lines 214-227. Note that correlationKey is not the same as the computed correlationId (line 218 comment).
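One way to avoid logging the raw token is to log a short SHA-256 fingerprint instead (sketch; the 16-character prefix length is an arbitrary choice):

```typescript
import { createHash } from 'node:crypto'

// Sketch: derive a non-reversible fingerprint from a raw token (e.g. a JWT)
// so audit events can correlate requests without exposing the credential.
function tokenFingerprint(rawToken: string): string {
  return createHash('sha256').update(rawToken).digest('hex').slice(0, 16)
}
```

The same fingerprint function should be used at every audit call site so events for the same token remain correlatable.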


240-252: Major: Sanitize correlation error log; hash token; no full payload.

Raw correlationKey and full response are logged. The scope field is missing, and the push is blocking.

Apply the hashing and sanitization as suggested in the previous review comment at lines 240-253.


262-276: Major: Invalid correlation log still leaks token; hash and trim details.

Raw correlationKey and full response are exposed in the audit event. The scope field is missing, and the push is blocking.

Apply the hashing and sanitization as suggested in the previous review comment at lines 262-277.


313-326: Major: Auth error - hash correlationId; don't log payload.

Raw correlationKey and full response are logged. The scope field is missing, and the push is blocking.

Apply the hashing and sanitization as suggested in the previous review comment at lines 313-327.


334-347: Major: Don't log raw cookies; hash correlationId; add minimal flags.

The audit event logs raw cookie strings (PII) and the full response. The scope field is missing, and the push is blocking.

Apply the sanitization as suggested in the previous review comment at lines 334-348, replacing raw cookies with boolean flags.


361-374: Major: Use hashed correlationId; avoid logging full response.

This is the only audit event with scope: AuditScope.INTERNAL (good!), but it still uses the raw correlationKey instead of the computed hashed correlationId (line 358). The full response object is logged, and the push is blocking.

Apply the changes suggested in the previous review comment at lines 361-374, using the hashed correlationId and a response fingerprint instead of the raw payload. Based on past comment at line 366.


385-397: Major: Duplicate - don't log raw token; use hashed id consistently.

The audit event uses raw correlationKey in the correlationId field but references the hashed correlationId in the error message, creating confusion. The scope field is missing, and the push is blocking.

Apply the changes suggested in the previous review comment at lines 385-397, using the hashed correlationId consistently and adding the scope field.


425-437: Major: Fallback audit - avoid raw token and standardize attributes.

The fallback audit event uses raw correlationKey instead of the hashed correlationId. The scope field is missing, and this push also lacks .catch() (though there is an outer try/catch).

Apply the changes suggested in the previous review comment at lines 425-437, using the hashed correlationId and adding the scope field with structured attributes.

apps/audit/src/routes/audit-public.ts (11)

1-15: LGTM!

Imports and dependency interface are well-structured and appropriate for the route handler.


17-24: LGTM!

Function signature follows good dependency injection practices, and initial metrics tracking is appropriate.


62-82: LGTM!

Rate limiting implementation is correct. As noted in previous discussions, the 100 requests/minute threshold may need tuning based on actual usage patterns.


84-97: LGTM!

JSON parsing with proper error handling and appropriate 400 response for malformed input.


99-123: LGTM!

Payload normalization correctly enforces security boundaries by ignoring client-provided subject and scope, and enriches the event with authenticated participant context.


125-144: LGTM!

Schema validation is properly implemented with safeParse, appropriate error logging, and user-friendly error messages. Correctly marks validation errors as non-retryable.


146-155: LGTM!

Optional correlation ID logging provides useful tracing capability for related events.


157-167: LGTM!

Comprehensive logging before storage provides good audit trail and debugging support.


169-194: LGTM!

Storage operation is clean and properly instrumented. Metrics are updated after success, and the response provides the eventId for client-side correlation.


195-251: LGTM!

Error handling is comprehensive and user-friendly. Properly classifies errors by type (size limits vs. transient failures) and provides appropriate HTTP status codes with retry guidance. Error patterns align with Azure Table Storage failure modes.


36-47: Accept alternative participant token cookies.

The endpoint currently only accepts next-auth.participant-session-token. As noted in previous reviews, temporary participants may use different cookie names (e.g., temporary_participant_token), preventing their actions from being audited.

…-log

# Conflicts:
#	packages/graphql/test/docker/docker-compose.test.yml
#	packages/graphql/test/helpers.ts

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🧹 Nitpick comments (9)
apps/audit/.env.cypress (1)

19-19: Weak secrets are acceptable for Cypress tests but document the constraint.

The simple values test-secret-token-123 (line 19) and abcd (line 22) are appropriate for a .env.cypress test file. However, consider adding a comment emphasizing these are test-only values and must never be used in production environments.

Apply this diff to add clarifying comments:

 # Authentication Configuration
+# WARNING: Test-only value. Use strong secrets in production.
 AUDIT_TOKEN=test-secret-token-123
 
 # JWT verification for participant tokens (public endpoint)
+# WARNING: Test-only value. Use strong secrets in production.
 APP_SECRET=abcd

Also applies to: 22-22

apps/audit/test/integration.test.ts (1)

12-65: Extract shared test helpers to reduce duplication.

The functions extractEntityTimestamp, makeAuthenticatedRequest, and waitForEntityByRowKey are duplicated across multiple test files (integration.test.ts, performance.test.ts, and database-verification.test.ts).

Consider extracting these to a shared test utilities file (e.g., apps/audit/test/utils/test-helpers.ts) to improve maintainability and ensure consistency.

Create a new file apps/audit/test/utils/test-helpers.ts:

export function extractEntityTimestamp(entity: any): number {
  const raw = entity.eventTimestamp ?? entity.timestamp

  if (typeof raw === 'number') {
    return raw
  }

  if (typeof raw === 'string') {
    const parsed = Date.parse(raw)
    if (Number.isNaN(parsed)) {
      throw new Error(`Unable to parse timestamp value: ${raw}`)
    }
    return parsed
  }

  throw new Error('Timestamp not found on entity')
}

export async function makeAuthenticatedRequest(
  baseUrl: string,
  authToken: string,
  path: string,
  options: RequestInit = {}
): Promise<Response> {
  const url = `${baseUrl}${path}`
  const response = await fetch(url, {
    ...options,
    headers: {
      'X-Internal-Token': authToken,
      'Content-Type': 'application/json',
      ...options.headers,
    },
  })
  return response
}

export async function waitForEntityByRowKey(
  tableHelper: AzureTableTestHelper,
  rowKey: string,
  timeoutMs = 5000,
  pollIntervalMs = 100
) {
  const start = Date.now()

  while (Date.now() - start < timeoutMs) {
    const entities = await tableHelper.getAllEntities()
    const match = entities.find((entity) => entity.rowKey === rowKey)
    if (match) {
      return match
    }

    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs))
  }

  throw new Error(`Entity with rowKey ${rowKey} not found after ${timeoutMs}ms`)
}

Then import and use these in all test files.

apps/audit/test/performance.test.ts (7)

125-125: Consider using robust wait mechanisms for persistence verification.

Lines 125 and 179 use fixed delays to wait for persistence. While these occur after performance measurements (so they don't affect timing), they could still lead to flaky tests. Consider using tableHelper.waitForEntityCount() or a similar polling mechanism.

For example, at line 125:

-      await new Promise((resolve) => setTimeout(resolve, 500))
+      await tableHelper.waitForEntityCount(1, 5000)

Also applies to: 179-179


259-259: Use waitForEntityCount instead of fixed delay.

Replace the fixed 5-second delay with tableHelper.waitForEntityCount() to robustly wait for entity persistence.

-      await new Promise((resolve) => setTimeout(resolve, 5000)) // Extra wait time
+      await tableHelper.waitForEntityCount(successfulResponses.length, 20000)

Note: Adjust timeout as needed based on expected persistence time.


340-341: Performance variance threshold may be too permissive.

A 250% throughput variance allows maximum throughput to be 3.5× minimum, which may not effectively detect performance degradation over sustained load. Consider tightening this threshold (e.g., 50-100%) to catch more subtle performance issues.

-      // Performance should not degrade significantly over time
-      expect(throughputVariance).toBeLessThanOrEqual(250)
+      // Performance should not degrade significantly over time
+      expect(throughputVariance).toBeLessThanOrEqual(100)

348-348: Use waitForEntityCount instead of fixed delay.

Replace the fixed 3-second delay with tableHelper.waitForEntityCount(totalEvents, 15000) for robust persistence verification.

-      await new Promise((resolve) => setTimeout(resolve, 3000))
+      await tableHelper.waitForEntityCount(totalEvents, 15000)

392-392: Consider using waitForEntityCount for persistence verification.

Lines 392 and 425 use fixed delays. While line 392's delay might be intentional for iteration spacing, line 425 should use tableHelper.waitForEntityCount(totalExpectedEvents, 15000) for more reliable persistence verification.

-      await new Promise((resolve) => setTimeout(resolve, 2000))
+      await tableHelper.waitForEntityCount(totalExpectedEvents, 15000)

Also applies to: 425-425


476-476: Use waitForEntityCount instead of fixed delay.

Replace the fixed 2-second delay with tableHelper.waitForEntityCount(successCount, 10000) for robust persistence verification.

-      await new Promise((resolve) => setTimeout(resolve, 2000))
+      await tableHelper.waitForEntityCount(successCount, 10000)

496-496: Use waitForEntityCount instead of fixed delays.

Both delays should use tableHelper.waitForEntityCount() for more reliable persistence verification.

At line 496:

-      await new Promise((resolve) => setTimeout(resolve, 5000)) // Wait for population
+      await tableHelper.waitForEntityCount(populateCount, 20000)

At line 521:

-      await new Promise((resolve) => setTimeout(resolve, 1000))
+      await tableHelper.waitForEntityCount(existingCount + 1, 10000)

Also applies to: 521-521

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5d660f and 229c7e3.

📒 Files selected for processing (5)
  • apps/audit/.env.cypress (1 hunks)
  • apps/audit/test/api.test.ts (1 hunks)
  • apps/audit/test/database-verification.test.ts (1 hunks)
  • apps/audit/test/integration.test.ts (1 hunks)
  • apps/audit/test/performance.test.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/audit/test/api.test.ts
🧰 Additional context used
🧬 Code graph analysis (3)
apps/audit/test/performance.test.ts (1)
apps/audit/test/utils/azure-table-helper.ts (1)
  • AzureTableTestHelper (37-320)
apps/audit/test/database-verification.test.ts (1)
apps/audit/test/utils/azure-table-helper.ts (1)
  • AzureTableTestHelper (37-320)
apps/audit/test/integration.test.ts (1)
apps/audit/test/utils/azure-table-helper.ts (1)
  • AzureTableTestHelper (37-320)
🪛 dotenv-linter (3.3.0)
apps/audit/.env.cypress

[warning] 6-6: [UnorderedKey] The NODE_ENV key should go before the PORT key

(UnorderedKey)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: test
  • GitHub Check: build
  • GitHub Check: cypress-run-cloud
  • GitHub Check: build
  • GitHub Check: test
  • GitHub Check: Analyze (javascript)
  • GitHub Check: SonarCloud
  • GitHub Check: build
  • GitHub Check: build
  • GitHub Check: test
  • GitHub Check: format
  • GitHub Check: build
  • GitHub Check: check
🔇 Additional comments (1)
apps/audit/.env.cypress (1)

1-31: LGTM! Cypress test configuration is well-structured.

The environment configuration for Cypress testing is comprehensive and appropriate:

  • Uses Azurite's well-known default connection string for local testing
  • Includes clear comments distinguishing local vs. production setup
  • Covers all necessary configuration domains (server, storage, auth, CORS, logging)

responses.forEach((response) => expect(response.status).toBe(200))

// Wait and verify only one record exists
await new Promise((resolve) => setTimeout(resolve, 2000))

🛠️ Refactor suggestion | 🟠 Major

Use waitForEntityCount instead of fixed delay.

Replace the fixed 2-second delay with tableHelper.waitForEntityCount() to reliably wait for persistence before verification.

-      await new Promise((resolve) => setTimeout(resolve, 2000))
+      await tableHelper.waitForEntityCount(1, 10000)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
await new Promise((resolve) => setTimeout(resolve, 2000))
await tableHelper.waitForEntityCount(1, 10000)
🤖 Prompt for AI Agents
In apps/audit/test/database-verification.test.ts around line 141, replace the
fixed 2-second delay (await new Promise(resolve => setTimeout(resolve, 2000)))
with a call to tableHelper.waitForEntityCount() so the test waits
deterministically for persistence: call
tableHelper.waitForEntityCount(expectedCount[, options]) using the expected
number of entities you expect to be persisted (and optionally a timeout/retry
options), then proceed to verification; remove the setTimeout and ensure
tableHelper is available/imported in the test scope.

expect(response.status).toBe(200)

// Wait for persistence and get the entity
await new Promise((resolve) => setTimeout(resolve, 1000))

🛠️ Refactor suggestion | 🟠 Major

Use waitForEntityCount instead of fixed delays.

Multiple tests use fixed 1-2 second delays. Replace these with tableHelper.waitForEntityCount() for more reliable persistence verification.

For example, at line 169:

-      await new Promise((resolve) => setTimeout(resolve, 1000))
+      await tableHelper.waitForEntityCount(1, 10000)

Apply similar changes at lines 213 and 249 with appropriate expected counts.

Also applies to: 213-213, 249-249

🤖 Prompt for AI Agents
In apps/audit/test/database-verification.test.ts around lines 169, 213 and 249,
replace the fixed 1s setTimeout delays with calls to
tableHelper.waitForEntityCount(...) so the test waits for the expected number of
persisted rows rather than a fixed sleep; for each location determine the
expectedCount based on the number of entities inserted earlier in that test
section and call await tableHelper.waitForEntityCount('<tableName>',
expectedCount) (include an optional timeout argument if available) to make
persistence assertions robust and remove the hard-coded setTimeout lines.
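The `waitForEntityCount` helper these suggestions rely on is not shown in the diff. Under the assumption that it simply polls until a target row count appears (with `listEntities` standing in for whatever query the real helper uses), a minimal sketch looks like:

```typescript
// Hypothetical sketch, not the repo's actual helper: poll a data source
// until it holds at least `expectedCount` entities or a deadline passes.
async function waitForEntityCount(
  listEntities: () => Promise<unknown[]>,
  expectedCount: number,
  timeoutMs = 10000,
  pollIntervalMs = 100
): Promise<void> {
  const deadline = Date.now() + timeoutMs
  for (;;) {
    const entities = await listEntities()
    // Done as soon as enough rows are visible — no worst-case sleep paid.
    if (entities.length >= expectedCount) return
    if (Date.now() >= deadline) {
      throw new Error(
        `Timed out after ${timeoutMs}ms waiting for ${expectedCount} entities (got ${entities.length})`
      )
    }
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs))
  }
}
```

This is why the polling form beats a fixed sleep in both directions: fast runs return as soon as persistence lands, and slow runs get the full timeout budget instead of failing after an arbitrary 1–3 seconds.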

}

// Verify row keys
await new Promise((resolve) => setTimeout(resolve, 1000))

🛠️ Refactor suggestion | 🟠 Major

Use waitForEntityCount instead of fixed delays.

Replace fixed delays with tableHelper.waitForEntityCount() for robust persistence verification.

At line 287:

-      await new Promise((resolve) => setTimeout(resolve, 1000))
+      await tableHelper.waitForEntityCount(2, 10000)

At line 319:

-      await new Promise((resolve) => setTimeout(resolve, 1000))
+      await tableHelper.waitForEntityCount(1, 10000)

Also applies to: 319-319

🤖 Prompt for AI Agents
In apps/audit/test/database-verification.test.ts around lines 287 and 319, the
test uses fixed delays (await new Promise(resolve => setTimeout(resolve, 1000)))
which is brittle; replace each fixed delay with
tableHelper.waitForEntityCount(tableName, expectedCount[, timeout]) where
tableName and expectedCount match the table and assertion that follows those
waits (use the same expected count used later in the test), remove the
setTimeout calls, and await the tableHelper.waitForEntityCount promise so the
test reliably waits for persistence instead of a fixed delay.

expect(response.status).toBe(200)

// Verify serialization integrity
await new Promise((resolve) => setTimeout(resolve, 1000))

🛠️ Refactor suggestion | 🟠 Major

Use waitForEntityCount instead of fixed delays.

Replace fixed 1-second delays with tableHelper.waitForEntityCount(1, 10000) for more reliable persistence verification.

-      await new Promise((resolve) => setTimeout(resolve, 1000))
+      await tableHelper.waitForEntityCount(1, 10000)

Also applies to: 396-396

🤖 Prompt for AI Agents
In apps/audit/test/database-verification.test.ts around lines 359 and 396,
replace the fixed 1-second Promise delay (await new Promise((resolve) =>
setTimeout(resolve, 1000))) with a call to tableHelper.waitForEntityCount(1,
10000) to wait deterministically for the entity to persist; ensure you await the
promise returned by waitForEntityCount and remove the setTimeout lines so the
test waits up to 10 seconds for one entity instead of a fixed delay.

}

// Verify timestamp storage
await new Promise((resolve) => setTimeout(resolve, 1000))

🛠️ Refactor suggestion | 🟠 Major

Use waitForEntityCount instead of fixed delays.

Replace fixed 1-second delays with appropriate tableHelper.waitForEntityCount() calls.

At line 442:

-      await new Promise((resolve) => setTimeout(resolve, 1000))
+      await tableHelper.waitForEntityCount(2, 10000)

At line 479:

-      await new Promise((resolve) => setTimeout(resolve, 1000))
+      await tableHelper.waitForEntityCount(1, 10000)

Also applies to: 479-479

🤖 Prompt for AI Agents
In apps/audit/test/database-verification.test.ts around lines 442 and 479,
replace the fixed 1-second setTimeout pauses with calls to
tableHelper.waitForEntityCount() to wait deterministically for the expected
number of entities; for each location, call
tableHelper.waitForEntityCount(tableName, expectedCount, { timeoutMs:
<reasonableTimeout> }) (or the project’s standard timeout) so the test waits
until the DB contains the expected rows instead of using a hardcoded delay.

Comment on lines +254 to +256
// Wait for persistence
await new Promise((resolve) => setTimeout(resolve, 1000))


🛠️ Refactor suggestion | 🟠 Major

Use robust wait mechanism instead of fixed delay.

The fixed 1-second delay may lead to flaky tests if persistence takes longer than expected. Consider using tableHelper.waitForEntityCount(2, 10000) instead to wait for a specific entity count with a timeout.

Apply this diff:

-      // Wait for persistence
-      await new Promise((resolve) => setTimeout(resolve, 1000))
+      // Wait for persistence
+      await tableHelper.waitForEntityCount(2, 10000)
📝 Committable suggestion


Suggested change
// Wait for persistence
await new Promise((resolve) => setTimeout(resolve, 1000))
// Wait for persistence
await tableHelper.waitForEntityCount(2, 10000)
🤖 Prompt for AI Agents
In apps/audit/test/integration.test.ts around lines 254 to 256, replace the
fixed 1-second sleep used to wait for persistence with a robust wait call that
polls until the expected entity count is reached or a timeout occurs;
specifically, remove the await new Promise(resolve => setTimeout(resolve, 1000))
and call tableHelper.waitForEntityCount(2, 10000) (or similar helper) so the
test waits up to 10s for two entities instead of a brittle fixed delay, ensuring
the appropriate helper is imported/available in the test file.

Comment on lines +299 to +300
// Wait for all events to be persisted
await new Promise((resolve) => setTimeout(resolve, 2000))

🛠️ Refactor suggestion | 🟠 Major

Replace fixed delays with robust wait mechanisms.

Multiple tests use fixed 2-second delays (lines 300, 342, 382) which may cause flaky tests. Use tableHelper.waitForEntityCount() with appropriate expected counts and timeouts instead.

For example, at line 300:

-      // Wait for all events to be persisted
-      await new Promise((resolve) => setTimeout(resolve, 2000))
+      // Wait for all events to be persisted
+      await tableHelper.waitForEntityCount(3, 10000)

Apply similar changes at lines 342 and 382.

Also applies to: 341-342, 381-382

🤖 Prompt for AI Agents
In apps/audit/test/integration.test.ts around lines 299-300, 341-342 and
381-382, replace the fixed 2-second setTimeout waits used to "Wait for all
events to be persisted" with calls to tableHelper.waitForEntityCount() using the
appropriate expected row counts for the test case and a sensible timeout (e.g.,
5-10s). Specifically, remove the new Promise(setTimeout(…,2000)) and instead
call await tableHelper.waitForEntityCount(tableName, expectedCount, timeoutMs)
(or the test suite's equivalent), choosing expectedCount based on how many
events the test should have persisted and a timeout long enough to avoid flakes;
apply the same replacement at the other two locations.

Comment on lines +428 to +429
// Wait for persistence
await new Promise((resolve) => setTimeout(resolve, 3000))

🛠️ Refactor suggestion | 🟠 Major

Use robust wait mechanism instead of fixed delay.

Replace the fixed 3-second delay with tableHelper.waitForEntityCount(5, 10000) to ensure all events are persisted before verifying partition distribution.

-      // Wait for persistence
-      await new Promise((resolve) => setTimeout(resolve, 3000))
+      // Wait for persistence
+      await tableHelper.waitForEntityCount(5, 10000)
📝 Committable suggestion


Suggested change
// Wait for persistence
await new Promise((resolve) => setTimeout(resolve, 3000))
// Wait for persistence
await tableHelper.waitForEntityCount(5, 10000)
🤖 Prompt for AI Agents
In apps/audit/test/integration.test.ts around lines 428 to 429, replace the
brittle fixed 3-second sleep used to wait for persistence with a robust waiter;
call tableHelper.waitForEntityCount(5, 10000) instead of awaiting new
Promise(resolve => setTimeout(resolve, 3000)) so the test waits up to 10s for 5
persisted entities before proceeding to verify partition distribution.

…-log

# Conflicts:
#	.github/workflows/test-graphql.yml
@sonarqubecloud

Quality Gate failed

Failed conditions
11 Security Hotspots
3.3% Duplication on New Code (required ≤ 3%)
D Reliability Rating on New Code (required ≥ A)
C Security Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/test-graphql.yml (1)

163-191: Include Azure Blob storage credentials in CI workflow.

The WORKER_ENV_FILE in .github/workflows/test-graphql.yml must define the BLOB_STORAGE_ACCOUNT_NAME and BLOB_STORAGE_ACCESS_KEY variables required by StorageSharedKeyCredential in packages/graphql/src/services/elements.ts. For local Azurite tests, add for example:

BLOB_STORAGE_ACCOUNT_NAME=devstoreaccount1
BLOB_STORAGE_ACCESS_KEY=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 229c7e3 and 0a45f97.

📒 Files selected for processing (2)
  • .github/workflows/test-graphql.yml (3 hunks)
  • packages/graphql/test/run-tests-local.sh (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/graphql/test/run-tests-local.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: check
  • GitHub Check: build
  • GitHub Check: Analyze (javascript)
  • GitHub Check: cypress-run-cloud
  • GitHub Check: build
  • GitHub Check: test
  • GitHub Check: test

services:
  postgres:
-    image: postgres:15
+    image: postgres:14.18

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Verify Postgres version downgrade doesn't break existing functionality.

The Postgres service image has been downgraded from postgres:15 to postgres:14.18. Ensure this version change is intentional and that all GraphQL package tests remain compatible with Postgres 14.18, particularly regarding any SQL features or syntax that might differ between versions.


🏁 Script executed:

#!/bin/bash
# Description: Search for Postgres version requirements or version-specific features in the codebase

# Check for any Postgres version specifications in package files
fd -e json -e yml -e yaml . | xargs rg -l "postgres.*version|postgresql.*version" -i

# Look for any migration files or SQL that might use Postgres 15-specific features
rg -g '*.sql' -g '*.prisma' "GENERATED|STORED|GENERATED ALWAYS" -C 2

# Check if there are comments or documentation about Postgres version requirements
rg "postgres.*1[45]|postgresql.*1[45]" -i -C 2

Length of output: 2956


Ensure consistent Postgres version across all configs.
.github/workflows/test-graphql.yml and docker-compose.yml pin postgres:14.18, but packages/graphql/test/docker/docker-compose.test.yml and apps/olat-api/test/docker-compose.test.yml use postgres:14.8 (and the Hatchet workflow still uses postgres:15). Align every Postgres image to the same patch version (e.g., 14.18) to avoid environment drift and test inconsistencies.

🤖 Prompt for AI Agents
In .github/workflows/test-graphql.yml around line 28, the workflow pins image:
postgres:14.18 while other configs
(packages/graphql/test/docker/docker-compose.test.yml and
apps/olat-api/test/docker-compose.test.yml) pin postgres:14.8 and the Hatchet
workflow uses postgres:15; update those files to use the same patch version as
this workflow (change all postgres image references to postgres:14.18) so every
CI/workspace/docker-compose/test config uses a consistent Postgres image, and
run a quick CI test to confirm no environment drift.

Comment on lines +111 to +116
azurite:
image: mcr.microsoft.com/azure-storage/azurite:latest
ports:
- 10000:10000
- 10001:10001
- 10002:10002

⚠️ Potential issue | 🟠 Major

Add health checks and pin Azurite version.

The Azurite service is missing health checks, which could cause tests to start before the service is ready. Additionally, using the :latest tag reduces CI reproducibility.

Apply this diff to add health checks:

     azurite:
       image: mcr.microsoft.com/azure-storage/azurite:latest
+      options: >-
+        --health-cmd "curl -f http://localhost:10000"
+        --health-interval 10s
+        --health-timeout 5s
+        --health-retries 5
       ports:
         - 10000:10000
         - 10001:10001
         - 10002:10002

Additionally, consider pinning to a specific Azurite version for reproducibility:

-      image: mcr.microsoft.com/azure-storage/azurite:latest
+      image: mcr.microsoft.com/azure-storage/azurite:3.32.0
📝 Committable suggestion


Suggested change
azurite:
image: mcr.microsoft.com/azure-storage/azurite:latest
ports:
- 10000:10000
- 10001:10001
- 10002:10002
azurite:
image: mcr.microsoft.com/azure-storage/azurite:3.32.0
options: >-
--health-cmd "curl -f http://localhost:10000"
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
- 10000:10000
- 10001:10001
- 10002:10002
🤖 Prompt for AI Agents
.github/workflows/test-graphql.yml lines 111-116: the Azurite service uses the
:latest tag and has no healthcheck, which can make CI flaky and
non-reproducible; change the image to a pinned version (e.g.,
mcr.microsoft.com/azure-storage/azurite:3.29.0) and add a healthcheck block that
polls the blob endpoint (http://localhost:10000/devstoreaccount1/) with an HTTP
or curl command, setting reasonable interval (e.g., 10s), timeout (e.g., 5s),
and retries (e.g., 5) so tests wait until Azurite is ready; also update any
depends_on or service wait logic to use the service health status
(service_healthy) so test steps don’t start before Azurite is up.
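Beyond the workflow-level healthcheck, test setup code can guard itself with its own readiness probe. A hypothetical sketch (names assumed, not part of this PR) that retries a TCP connect until Azurite's blob port answers:

```typescript
import net from 'node:net'

// Hypothetical readiness probe: retry a TCP connect to host:port until it
// succeeds or the deadline passes. A test's global setup could await
// waitForPort(10000) before running Azurite-backed suites.
function waitForPort(
  port: number,
  host = '127.0.0.1',
  timeoutMs = 30000,
  retryMs = 500
): Promise<void> {
  const deadline = Date.now() + timeoutMs
  return new Promise((resolve, reject) => {
    const attempt = () => {
      const socket = net.connect({ port, host })
      socket.once('connect', () => {
        socket.end()
        resolve()
      })
      socket.once('error', () => {
        socket.destroy()
        if (Date.now() >= deadline) {
          reject(
            new Error(`port ${port} on ${host} not ready after ${timeoutMs}ms`)
          )
        } else {
          setTimeout(attempt, retryMs)
        }
      })
    }
    attempt()
  })
}
```

A plain TCP check only proves the listener is up, not that the Table service is fully initialized, so it complements rather than replaces the workflow healthcheck suggested above.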


Labels

feature size:XXL This PR changes 1000+ lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

3 participants