Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Adds a new Audit service (apps/audit) with storage, schema, metrics, auth, client, tests, and CI/Helm wiring; integrates Azurite into local/CI tooling and readiness checks; plumbs AuditClient through backend, workers, GraphQL, frontend, tests, and environment/configs; pins Postgres images to 14.x.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant PWA as Frontend PWA
    participant GQL as GraphQL/API
    participant WORKER as Worker
    participant AUD as Audit Service
    participant AZ as Azurite/AzureTable
    User->>PWA: submit action
    PWA->>AUD: POST /audit/public (cookie JWT)
    AUD->>AUD: auth -> rate-limit -> validate payload
    AUD->>AZ: ensureTable / upsertEntity (retry/backoff)
    AZ-->>AUD: ack / error
    AUD-->>PWA: {status: stored, eventId}
    GQL->>AUD: POST /audit (X-Internal-Token)
    WORKER->>AUD: POST /audit (worker events)
    AUD->>AZ: upsertEntity
    AZ-->>AUD: ack
```

```mermaid
sequenceDiagram
    autonumber
    participant CI as CI Job
    participant WFS as wait-for-services.sh
    participant AZ as Azurite
    participant AUDsrv as Audit test server
    CI->>AZ: start azurite container
    CI->>WFS: run wait-for-services.sh (includes 7080/ready)
    WFS->>AZ: check_azurite(host, port)
    AZ-->>WFS: ready
    CI->>AUDsrv: start audit service
    AUDsrv->>AZ: ensureTable / checkConnection
    CI-->>CI: run tests
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120+ minutes

Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (2 passed)
**klicker-uzh** (Cypress Cloud run)

| Project | klicker-uzh |
|---|---|
| Branch Review | audit-log |
| Run status | |
| Run duration | 11m 39s |
| Commit | |
| Committer | Roland Schläfli |
| Test results | 1 · 0 · 0 · 0 · 724 |

View all properties for this run ↗︎ · View all changes introduced in this branch ↗︎
Tests for review: `cypress/e2e/N-course-workflow.cy.ts` • 1 failed test
**GitGuardian**

| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| 1509424 | Triggered | Generic Password | 582dee7 | .github/workflows/test-graphql.yml |

🛠 Guidelines to remediate hardcoded secrets

- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely, following best practices for secret storage.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act: you might completely break other contributing developers' workflows, and you risk accidentally deleting legitimate data.

To avoid such incidents in the future, consider:

- following best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection as a pre-commit hook to catch secrets before they leave your machine and ease remediation

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process.
```yaml
SERVER_LOGGER_LEVEL: debug
SERVER_LOGGER_FORMAT: console

azurite:
```

Azurite is a local emulator for blob/table/queue storage. We could probably start it only in assessment mode, though (maybe another flag for the run app dependencies scripts?).
```yaml
azureStorage:
  tableName: 'auditlogs-prod'
```

```yaml
corsOrigins: 'https://assessment.klicker.uzh.ch'
```

Do we ever need any frontend audit logging from manage?
```ts
  })
} catch (error) {
  console.warn(
    `[AUTH ${requestId}] Failed to log PARTICIPANT_LOGIN_SUCCESS:`,
```

Check failure (Code scanning / CodeQL): Use of externally-controlled format string (High)

Copilot Autofix (AI, 7 months ago): To fix the problem, avoid directly embedding untrusted data in the format string passed as the first argument to logging methods such as `console.warn`. Instead, use a static format string and pass untrusted values as arguments for interpolation using `%s`. For line 173, this means changing the template literal so that `requestId` is not embedded in the format string; use `%s` to indicate its place, e.g. `"[AUTH %s] Failed to log PARTICIPANT_LOGIN_SUCCESS:"`, and pass `requestId` as a subsequent argument. This change applies only to line 173; no imports or other regions need changing.

```diff
@@ -170,7 +170,8 @@
   })
 } catch (error) {
   console.warn(
-    `[AUTH ${requestId}] Failed to log PARTICIPANT_LOGIN_SUCCESS:`,
+    '[AUTH %s] Failed to log PARTICIPANT_LOGIN_SUCCESS:',
+    requestId,
     error instanceof Error ? error.message : error
   )
 }
```
```diff
 services:
   postgres:
-    image: postgres:15
+    image: postgres:14.18
```

Made this actually aligned with what we have on Azure.
```diff
@@ -0,0 +1,47 @@
+name: Build Docker image for audit (Production)
```

@jabbadizzle This code will require new pipelines on your branch as well.
```ts
  role: string
}

export async function authMiddleware(
```

The backend needs to send a fixed token in the header; the frontend sends the student login JWT. We could probably merge both into a JWT with different issuers and corresponding validation, but for now this will be enough.
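A minimal sketch of how the two auth paths described above could be distinguished, with hypothetical header and cookie names (`x-internal-token`, a `participant_token` cookie); the real middleware verifies the JWT with jose, which is elided here:

```typescript
// Sketch only: distinguish backend callers (fixed internal token) from
// frontend callers (participant JWT in a cookie). Header and cookie names
// are assumptions, not the service's actual contract.
type AuthResult =
  | { kind: 'internal' }
  | { kind: 'participant'; token: string }
  | { kind: 'rejected'; reason: string }

export function classifyAuth(
  headers: Record<string, string | undefined>,
  internalToken: string
): AuthResult {
  const provided = headers['x-internal-token']
  if (provided !== undefined) {
    // backend-to-backend path: compare against the shared secret
    return provided === internalToken
      ? { kind: 'internal' }
      : { kind: 'rejected', reason: 'invalid internal token' }
  }
  // frontend path: extract the participant JWT from the cookie header
  // (JWT signature verification would follow here in the real service)
  const match = (headers['cookie'] ?? '').match(
    /(?:^|;\s*)participant_token=([^;\s]+)/
  )
  if (match) return { kind: 'participant', token: match[1] }
  return { kind: 'rejected', reason: 'no credentials provided' }
}
```

A merged JWT-only design would replace the branch on `x-internal-token` with a single verification step that switches on the token's `iss` claim.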
```ts
const startTime = Date.now()
const event = c.req.valid('json')

metrics.requestsTotal.inc()
```

Idea: expose certain metrics for Prometheus, which will run on the new infra and can monitor apps. It could scale up the replicas if, e.g., we have a lot of total requests, or many more requests than writes, etc. If this works well, we should expand on this in our own backend too, so we don't only have CPU/GPU usage metrics.
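To make the Prometheus idea concrete, here is an illustrative sketch of the text exposition format such a `/metrics` endpoint serves; in the service itself prom-client produces this, so the class and metric name here are stand-ins:

```typescript
// Illustrative only: a tiny counter that renders the Prometheus text
// exposition format (prom-client does this for real in the audit service).
class TinyCounter {
  private value = 0
  constructor(
    readonly name: string,
    readonly help: string
  ) {}

  inc(by = 1): void {
    this.value += by
  }

  // Render the HELP/TYPE metadata plus the sample, as scraped from /metrics
  expose(): string {
    return [
      `# HELP ${this.name} ${this.help}`,
      `# TYPE ${this.name} counter`,
      `${this.name} ${this.value}`,
    ].join('\n')
  }
}

const requestsTotal = new TinyCounter(
  'audit_requests_total',
  'Total audit requests received'
)
requestsTotal.inc()
requestsTotal.inc()
// expose() now ends with the line: audit_requests_total 2
```

A scaler comparing "requests vs. writes" as suggested above would query two such counters from Prometheus and act on their ratio.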
```ts
  tableClient: AuditTableClient
}

export function registerAuditPrivateRoutes(
```

The actual endpoints are very simple on purpose. We don't reject events as long as they are properly authenticated (not a lot of validation on e.g. allowed types besides the public endpoint), to make sure that we log something invalid rather than dropping it with an error message.
```ts
  }

  // All retries failed
  setError(lastError)
```

We would need to decide what to do if it still returns an error even after retrying. Probably let the user know with a toast? Maybe it's also just their internet that is down and the main requests fail anyway.
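The retry path under discussion could look roughly like the exponential-backoff helper below; it is a sketch, the name `withRetries` is made up, and the real client's retry logic may differ:

```typescript
// Sketch of retry-with-backoff; all names here are hypothetical.
async function withRetries<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn()
    } catch (error) {
      lastError = error
      // exponential backoff between attempts: baseDelayMs, 2x, 4x, ...
      if (attempt < attempts - 1) {
        await new Promise((resolve) =>
          setTimeout(resolve, baseDelayMs * 2 ** attempt)
        )
      }
    }
  }
  // All retries failed: surface the last error to the caller,
  // who can then e.g. show a toast to the user.
  throw lastError
}
```

The open design question from the comment remains: whether the caller swallows that final error, shows a toast, or queues the event for a later flush.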
```ts
httpsAgent:
  process.env.NODE_ENV === 'production'
    ? undefined
    : new https.Agent({ rejectUnauthorized: false }),
```

This needs to be done for dev mode, as otherwise the HTTPS validation with our self-made certificate would fail in certain cases.
```diff
@@ -0,0 +1,394 @@
+#!/bin/bash
```

Copied and extended this script from the Cypress workflows. It now waits, as part of run app dependencies, until the DB etc. are up. We could also migrate the Cypress workflows etc. to use this script from util/.
```js
// filename: `deploy/charts/klicker-uzh-v2/Chart.yaml`,
// updater: 'util/yaml-updater.js',
// },
...packageJsonFiles.map((filename) => ({ filename, type: 'json' })),
```

Much easier to parse the folders rather than having to hardcode all services; so many were forgotten all the time and then not updated in package.json :D
```shell
  echo "Skipping pnpm run build"
fi

SERVICE_ENDPOINTS="" ./util/wait-for-services.sh || {
```

An empty SERVICE_ENDPOINTS means dependencies only.
```diff
@@ -0,0 +1,284 @@
+import { describe, expect, it } from 'vitest'
```

The tests all work, but we should have a real use case (right now we test with random data that it arrives, is processed, etc.). It might be interesting to run the assessment mode in Cypress and validate the audit log output after the Cypress run (maybe dump it to a file there). Otherwise, the tests here are not so useful besides validating that data can be written and that the API responds correctly (which is also why it is irrelevant what kind of data we store).
Actionable comments posted: 71

Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (10)
apps/response-api/src/index.ts (1)

345-350: Use SHA-256 instead of MD5 for tracking-id derivation

Even for non-crypto "tracking", prefer SHA-256. No behavior change beyond the id format; downstream already uses a hex string.

```diff
-  const combinedCorrelationKey = `${correlationKey}:${user.sub}`
-  const MD5 = createHash('md5')
-  MD5.update(combinedCorrelationKey)
-  const correlationId = MD5.digest('hex')
+  const combinedCorrelationKey = `${correlationKey}:${user.sub}`
+  const sha = createHash('sha256')
+  sha.update(combinedCorrelationKey)
+  const correlationId = sha.digest('hex')
```

packages/graphql/src/services/stacks.ts (3)
771-781: Missing await inside Prisma transaction (detail creation).

`createFlashcardResponseDetail(...)` is not awaited. Unawaited Prisma calls inside `$transaction` can run after the transaction closes, causing "Transaction already closed" errors or lost writes.

```diff
-  createFlashcardResponseDetail({
+  await createFlashcardResponseDetail({
     prisma,
     id,
     response,
     courseId,
     answerTime,
     existingInstance,
     participantId: ctx.user.sub,
   })
```
2456-2461: Bug: Updating totalPointsAwarded with null resets the tally.

In the `upsertQuestionResponse` update branch:

```ts
totalPointsAwarded:
  typeof pointsAwarded === 'number' ? { increment: pointsAwarded } : null
```

Passing `null` here sets the DB field to null instead of leaving it unchanged. Use `undefined` to skip the update.

```diff
-  totalPointsAwarded:
-    typeof pointsAwarded === 'number' ? { increment: pointsAwarded } : null,
+  totalPointsAwarded:
+    typeof pointsAwarded === 'number'
+      ? { increment: pointsAwarded }
+      : undefined,
```

Also applies to: 2456-2459
3314-3316: Guard division by zero for stackAnswerTime averaging.

If `responses.length === 0`, `round(stackAnswerTime / responses.length)` yields Infinity/NaN. Add a safe default.

```diff
-  const elementAnswerTime = round(stackAnswerTime / responses.length)
+  const elementAnswerTime = responses.length
+    ? round(stackAnswerTime / responses.length)
+    : 0
```

packages/graphql/src/services/liveQuizzes.ts (1)
1118-1129: Use a composite key for scheduledJobs to avoid collisions across quizzes.

Indexing scheduled jobs by blockId alone can collide when two quizzes have the same blockId, leading to incorrect cancellations.

```diff
-  if (updatedQuiz.activeBlock?.expiresAt) {
-    scheduledJobs[blockId] = schedule.scheduleJob(
+  if (updatedQuiz.activeBlock?.expiresAt) {
+    const jobKey = `${quizId}:${blockId}`
+    scheduledJobs[jobKey] = schedule.scheduleJob(
       dayjs(updatedQuiz.activeBlock.expiresAt).add(10, 'second').toDate(),
       async () => {
         await deactivateLiveQuizBlock({ quizId, blockId }, ctx, true)
         ctx.emitter.emit('invalidate', {
           typename: 'LiveQuiz',
           id: updatedQuiz.id,
         })
       }
     )
   }
@@
-  if (!isScheduled && scheduledJobs[blockId]) {
-    await scheduledJobs[blockId].cancel()
-    delete scheduledJobs[blockId]
+  const jobKey = `${quizId}:${blockId}`
+  if (!isScheduled && scheduledJobs[jobKey]) {
+    await scheduledJobs[jobKey].cancel()
+    delete scheduledJobs[jobKey]
   }
```

Also applies to: 1375-1378
apps/auth/src/middleware.ts (1)

222-228: Sensitive data may be logged (referer, full callback URL incl. OAuth params)

Multiple debug logs include the referer and full callback URLs. In OAuth callback handlers this can leak `code`/`state` into logs. `console.debug` prints in Node like `console.log`, so this affects production unless gated.

Apply targeted redactions:

```diff
-  console.debug('Stateless context detection result:', {
-    isParticipantContext,
-    participantParam,
-    referer,
-    pathname,
-  })
+  console.debug('Stateless context detection result:', {
+    isParticipantContext,
+    participantParam,
+    refererHost: getHostFromHeaderUrl(referer),
+    pathname,
+  })
@@
-  console.debug(
-    'Callback: injected params from cookie and cleared it',
-    {
-      url: url.toString(),
-      cookieSaysParticipant,
-      cookieSaysLecturer,
-    }
-  )
+  console.debug('Callback: injected params from cookie and cleared it', {
+    pathname: url.pathname,
+    cookieSaysParticipant,
+    cookieSaysLecturer,
+  })
@@
-  console.debug('PWA origin detected. Redirecting to PWA login.', {
-    referer,
-    refererHost,
-    redirectToParam,
-    redirectToHost,
-    pwaLoginUrl,
-  })
+  console.debug('PWA origin detected. Redirecting to PWA login.', {
+    refererHost,
+    redirectToHost,
+    pwaLoginUrl,
+  })
@@
-  console.debug('Set redirect cookie on signin:', {
-    cb,
-    isParticipantContext,
-  })
+  console.debug('Set redirect cookie on signin:', { isParticipantContext })
```

Additionally, consider gating all debug logs:

```diff
+const DEBUG = process.env.AUTH_DEBUG === '1' || process.env.NODE_ENV !== 'production'
+const dbg = (...args: any[]) => { if (DEBUG) console.debug(...args) }
```

Then replace `console.debug` calls with `dbg`. This prevents sensitive data from being printed in production by default.

Also applies to: 269-279, 128-135, 325-328
packages/graphql/test/template.test.ts (1)

12-17: Disable AuditClient in test initialization

`initializePrisma` (packages/graphql/test/helpers.ts) calls `new AuditClient()` with no config, so it throws if `AUDIT_SERVICE_URL` or `AUDIT_TOKEN` aren't set. Pass a test config, e.g. `new AuditClient({ enabled: false })`, or supply dummy `serviceUrl`/`auditToken` to avoid constructor errors. Apply this change in all your test helpers.

.github/workflows/test-graphql.yml (1)
137-159: CI will fail without AUDIT_* vars or an opt-out; disable audit for GraphQL tests.

AuditClient defaults to enabled and throws if AUDIT_SERVICE_URL/TOKEN are unset. Add `AUDIT_ENABLED: false` for this job to decouple the tests.

Apply within the "Run tests" env:

```diff
   NODE_ENV: test
   HATCHET_API_URL: http://localhost:8888
   HATCHET_TENANT_ID: 707d0855-80ab-4e1f-a156-f1c4546cbf52
   HATCHET_HOST_PORT: 7077
+  AUDIT_ENABLED: false
   # Provide all app origins for consistency
```

packages/graphql/test/elementPermissions.test.ts (1)
1919-1925: Bug: assertion references the wrong variable after state change.

You re-query into `missingPermissionUserOne` but assert on the old `derivedPermissionUserOne` variable.

Apply:

```diff
-  expect(missingPermissionUserOne).toBeTruthy()
-  expect(derivedPermissionUserOne!.permissionLevel).toBe(
+  expect(missingPermissionUserOne).toBeTruthy()
+  expect(missingPermissionUserOne!.permissionLevel).toBe(
     PermissionLevel.OWNER
   )
```

packages/graphql/src/scripts/2025-09-14_hatchet_tasks_activity_publication.ts (1)
41-45: Bug: Redis port fallback yields NaN when REDIS_PORT is unset.

`Number(process.env.REDIS_PORT) ?? 6379` will not fall back for NaN. Use nullish coalescing before `Number()`.

Apply this fix:

```diff
-  port: Number(process.env.REDIS_PORT) ?? 6379,
+  port: Number(process.env.REDIS_PORT ?? 6379),
```
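The underlying pitfall: `Number(undefined)` is `NaN`, and `NaN` is not nullish, so `??` after the conversion can never fall back. A small sketch (the `parsePort` helper is hypothetical) showing the defensive version:

```typescript
// Number(undefined) is NaN, and NaN is not nullish, so
// `Number(raw) ?? fallback` never falls back; coalesce first instead,
// and guard against non-numeric strings too.
function parsePort(raw: string | undefined, fallback = 6379): number {
  const parsed = Number(raw ?? fallback)
  return Number.isNaN(parsed) ? fallback : parsed
}

console.log(parsePort(undefined)) // 6379
console.log(parsePort('6380')) // 6380
console.log(parsePort('not-a-number')) // 6379
```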
```json
"@vitest/ui": "~2.1.8",
"cross-env": "~7.0.3",
```

🧩 Analysis chain

Potential vitest/@vitest/ui major mismatch

You have vitest ~3.2.4 with @vitest/ui ~2.1.8; these should typically share the same major. Please verify compatibility and align versions if needed (e.g., both ^3.x).

Also applies to: 34-34

🌐 Web query: Is @vitest/ui v2 compatible with vitest v3, or should their major versions match?

💡 Result: Short answer: they should match. Use @vitest/ui v3 with Vitest v3. Mismatched major versions (e.g., @vitest/ui v2 with vitest v3) have caused incompatibilities, and the project explicitly publishes @vitest/* packages for each new major; the CLI also attempts to install matching add-ons but has had issues installing the latest instead of the CLI's version. [1][2][3]

Sources:

- Vitest 3 announcement / docs. [1]
- @vitest/ui npm showing v3 releases. [2]
- GitHub issue describing problems when add-on major versions don't match the Vitest CLI. [3]

Align @vitest/ui major version with Vitest: upgrade @vitest/ui to a 3.x release to match Vitest v3 and avoid known incompatibilities (apps/audit/package.json lines 26-27, 34).
```json
"start": "node -r dotenv/config dist/index.js",
"start:test": "node -r dotenv/config dist/index.js dotenv_config_path=.env.cypress",
"test": "vitest run",
```

dotenv is used at runtime but listed as a devDependency; start:test also misconfigures the dotenv path

- `start` uses `node -r dotenv/config`, which requires dotenv at runtime. With production installs, devDependencies aren't present, causing startup failures.
- `start:test` passes `dotenv_config_path` as an argument instead of setting `DOTENV_CONFIG_PATH`, so dotenv won't load `.env.cypress`.

```diff
 "dependencies": {
   "@azure/data-tables": "13.2.2",
   "@hono/node-server": "1.19.0",
   "@hono/zod-validator": "0.4.1",
   "@klicker-uzh/util": "workspace:*",
   "hono": "4.9.6",
   "jose": "5.9.4",
+  "dotenv": "~16.4.7",
   "pino": "9.9.5",
   "prom-client": "15.1.3",
   "ulidx": "2.3.0",
   "zod": "3.25.76"
 },
 "devDependencies": {
   "@rollup/plugin-node-resolve": "~15.3.1",
   "@rollup/plugin-typescript": "~12.1.4",
   "@types/node": "^20.19.11",
-  "@vitest/ui": "~2.1.8",
-  "cross-env": "~7.0.3",
-  "dotenv": "~16.4.7",
+  "@vitest/ui": "~2.1.8",
+  "cross-env": "~7.0.3",
   "nodemon": "~3.1.10",
   "npm-run-all": "~4.1.5",
   "rollup": "~4.34.9",
   "tsx": "~4.19.4",
   "typescript": "~5.6.3",
   "vitest": "~3.2.4"
 },
 "scripts": {
   "build": "cross-env NODE_ENV=production rollup -c",
   "build:test": "pnpm run build",
   "check": "tsc --noEmit",
   "dev:assessment": "npm-run-all --parallel dev:build dev:run",
   "dev:build": "rollup -c --watch",
   "dev:doppler": "CONFIG=dev ../../../../util/_run_with_doppler.sh pnpm run dev",
   "dev:run": "nodemon -w dist/ --exec 'node --env-file .env ./dist/index.js'",
   "inspect:table": "tsx scripts/inspect-table.ts",
   "start": "node -r dotenv/config dist/index.js",
-  "start:test": "node -r dotenv/config dist/index.js dotenv_config_path=.env.cypress",
+  "start:test": "cross-env DOTENV_CONFIG_PATH=.env.cypress node -r dotenv/config dist/index.js",
```

Also applies to: 22-35
```json
"engines": {
  "node": "=20"
},
```

Engine pin is too strict: allow Node 20 minors, and require 20.6+ if using `--env-file`

`"=20"` effectively means 20.0.0 and will flag most installations. Recommend a range.

```diff
-  "engines": {
-    "node": "=20"
-  },
+  "engines": {
+    "node": ">=20.6 <21"
+  },
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```js
    rootDir: 'src',
  }),
],
external: [/@klicker-uzh*/, /node_modules/],
```

Fix external regex; the current pattern doesn't match scoped packages

`/@klicker-uzh*/` applies `*` only to the `h`. It won't exclude `@klicker-uzh/*` as intended, causing accidental bundling.

```diff
-  external: [/@klicker-uzh*/, /node_modules/],
+  // Externalize workspace pkgs and node builtins
+  external: [/^@klicker-uzh\//, /node_modules/, /^node:/],
```

Optionally also externalize builtins explicitly:

```diff
+import { builtinModules } from 'node:module'
 ...
-  external: [/^@klicker-uzh\//, /node_modules/, /^node:/],
+  external: [...builtinModules, ...builtinModules.map((m) => `node:${m}`), /^@klicker-uzh\//, /node_modules/, /^node:/],
```
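For illustration, the difference between the two patterns can be checked directly as plain regex behavior, independent of rollup; note the unanchored original is loose enough to also hit unrelated module ids containing the prefix:

```typescript
// In /@klicker-uzh*/ the '*' quantifies only the trailing 'h', and the
// pattern is unanchored, so it substring-matches in surprising places.
const loose = /@klicker-uzh*/
const anchored = /^@klicker-uzh\//

const a = loose.test('@klicker-uzh/util') // true
const b = loose.test('left-pad@klicker-uz') // true as well ('@klicker-uz' + zero 'h's)
const c = anchored.test('@klicker-uzh/util') // true
const d = anchored.test('left-pad@klicker-uz') // false

console.log({ a, b, c, d })
```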
```yaml
azureStorage:
  tableName: 'auditlogs-prod'
```

Azure Table name is invalid (hyphens not allowed).

Azure Table Storage table names must be 3-63 chars, start with a letter, and contain only alphanumeric characters. `auditlogs-prod` will fail. Use an alphanumeric name, e.g. `AuditLogsProd` or `auditlogsprod`.

Apply one of these diffs:

```diff
-  azureStorage:
-    tableName: 'auditlogs-prod'
+  azureStorage:
+    tableName: 'AuditLogsProd'
```

or

```diff
-  azureStorage:
-    tableName: 'auditlogs-prod'
+  azureStorage:
+    tableName: 'auditlogsprod'
```
```yaml
azureStorage:
  tableName: 'auditlogs-qa'
```

Azure Table name is invalid (hyphens not allowed).

`auditlogs-qa` violates Azure Table naming rules (alphanumeric only). Use `AuditLogsQa` or `auditlogsqa`.

```diff
-  azureStorage:
-    tableName: 'auditlogs-qa'
+  azureStorage:
+    tableName: 'AuditLogsQa'
```
```diff
@@ -1302,6 +1314,17 @@ export async function activateLiveQuizBlock(
   })

+  redisMulti.exec()
```

Await the Redis pipeline to avoid lost writes/races.

`redisMulti.exec()` is not awaited; writes may be reordered or dropped under load, especially before the emit/publish and audit log steps.

```diff
-  redisMulti.exec()
+  await redisMulti.exec()
```
+ await redisMulti.exec()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| redisMulti.exec() | |
| // …previous pipeline commands… | |
| await redisMulti.exec() | |
| // …emit/publish and audit log steps… |
🤖 Prompt for AI Agents
In packages/graphql/src/services/liveQuizzes.ts around line 1316, the call to
redisMulti.exec() is not awaited which can cause lost or reordered writes under
load; change to await the exec() call (or promisify/callback to Promise if exec
uses a callback) and handle rejection/result (log/error/throw) before proceeding
to subsequent emit/publish and audit-log steps so all Redis writes complete
successfully and failures are surfaced.
```ts
export interface AuditTableEntity {
  partitionKey: string
  rowKey: string
  scope: string
```

@rschlaefli If we know that the value will be one of three strings, is there a reason why we do not type this accordingly?
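Typing the scope as a union could look like the sketch below; the literal values are placeholders, since only PUBLIC and INTERNAL scopes are confirmed by this thread:

```typescript
// Scope values assumed for illustration; only 'public' and 'internal'
// are confirmed by the surrounding review thread.
const AUDIT_SCOPES = ['public', 'internal', 'system'] as const
type AuditScopeValue = (typeof AUDIT_SCOPES)[number]

interface AuditTableEntity {
  partitionKey: string
  rowKey: string
  scope: AuditScopeValue // union literal instead of plain string
}

// Runtime guard for rows read back from untyped table storage
function isAuditScope(value: string): value is AuditScopeValue {
  return (AUDIT_SCOPES as readonly string[]).includes(value)
}

const example: AuditTableEntity = {
  partitionKey: 'course-1',
  rowKey: 'row-1',
  scope: 'public',
}
console.log(isAuditScope(example.scope)) // true
```

The compile-time union catches typos at call sites, while the guard covers data coming back from storage, where TypeScript can't help.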
```ts
action: event.action,
timestamp: event.timestamp,
resource: event.resource || null,
// Include subset of attributes for uniqueness (avoid full object for size)
```

@rschlaefli Do I understand this correctly that we do not take a subset of the attributes (contrary to what the comment states), but actually just cut off part of the hash we generate from the attributes? Does this make sense? It seems like we go through the potentially more expensive computation of hashing the large object, throw away half of what is already a compact representation, but do not care about the other fields above, which have up to 200-400 characters according to the zod validation?
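Restating what the code apparently does as a sketch (the function name, algorithm, and prefix length are assumptions): the full attributes object is hashed, and only a prefix of the hex digest is kept as the dedup component:

```typescript
import { createHash } from 'node:crypto'

// Sketch: hash the full attributes object, then truncate the digest.
// Truncation shortens an already-compact digest, while the surrounding
// fields (per the review question) may be far longer than what is saved.
function attributesFingerprint(attributes: unknown, hexChars = 16): string {
  const json = JSON.stringify(attributes)
  return createHash('sha256').update(json).digest('hex').slice(0, hexChars)
}
```

Truncation only raises the (already tiny) collision probability for the dedup key; the review's point stands that the size saving is marginal next to the untruncated neighboring fields.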
```ts
action: AuditAction.PARTICIPANT_SUBMIT_RESPONSE,
resource: `instance:${instanceId}`,
scope: AuditScope.PUBLIC,
correlationId: correlationKey ?? undefined,
```

Maybe this is important: this correlationKey is not the same as the correlationId we use later on from within the response API through the assessment processor to track a response. The latter also contains the participant id, while this correlation key only contains the quiz id, instance id, and execution ix in a format that can be validated.
```ts
'create-audit-log-entry',
{
  subject: `participant:unknown`,
  correlationId: correlationKey,
```

Same note as for the frontend use: the correlationKey is not the same as the correlationId we combine from the correlation key and the participant id further down (the same holds for all the following audit log entries?).
```ts
{
  scope: AuditScope.INTERNAL,
  subject: `participant:${user.sub}`,
  correlationId: correlationKey,
```

(Same as above.) From here onwards, the correlationId is available and should be used.
Actionable comments posted: 16

Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

apps/auth/src/lib/helpers.ts (1)

206-212: PII exposure: email arrays logged at debug level.

The debug log includes `emailCollection.profileEmails` and `emailCollection.affiliationEmails`, both of which likely contain user email addresses (PII). This poses the same privacy risk as the previous comment.

Apply this diff to log only counts:

```diff
-  console.debug(
-    `Invitation mode ${invitationEmailMode} provided no eligible emails.`,
-    {
-      profileEmails: emailCollection.profileEmails,
-      affiliationEmails: emailCollection.affiliationEmails,
-    }
-  )
+  console.debug(
+    `Invitation mode ${invitationEmailMode} provided no eligible emails. Profile: ${emailCollection.profileEmails.length}, Affiliation: ${emailCollection.affiliationEmails.length}`
+  )
```
🧹 Nitpick comments (47)

apps/response-api/.env.example (1)

9-11: LGTM! Clean integration of audit configuration.

The three audit environment variables are well-structured and appropriately placed. The `AUDIT_TOKEN` example value clearly signals this is a test token, which is correct for an `.env.example` file.

Optional enhancement: consider adding inline comments to help developers understand each variable's purpose, especially around token security requirements for production:

```diff
+# Enable audit logging for assessment activities
 AUDIT_ENABLED=true
+# URL of the audit service endpoint
 AUDIT_SERVICE_URL=http://127.0.0.1:7080
+# Authentication token for audit service (generate a secure token for production)
 AUDIT_TOKEN=test-secret-token-123
```

packages/graphql/src/services/stacks.ts (1)
2620-2640: Audit logging scaffolding references undefined variables.The commented audit logging code references several undefined variables (
auditClient,sessionId,participantId,startTime) that are not declared in the current scope. While the scaffolding approach is reasonable for future implementation, these undefined references could cause runtime errors if the code is accidentally uncommented during development.Consider either:
- Adding TODO comments with the required variable declarations, or
- Wrapping the scaffolding in a conditional check for a feature flag to make it more explicit that this code is incomplete
Example guard pattern:
```diff
+  // TODO: Initialize audit logging dependencies when feature is enabled
+  // const auditClient = ctx.auditClient
+  // const sessionId = ctx.sessionId
+  // const participantId = ctx.user?.sub
+  // const startTime = Date.now()
+
   // // Log response submission attempt
-  // if (participantId && ctx.user?.role === DB.UserRole.PARTICIPANT) {
+  // if (ctx.featureFlags?.auditLogging && participantId && ctx.user?.role === DB.UserRole.PARTICIPANT) {
   //   await auditClient.log({
```

Also applies to: 2656-2673, 2812-2828, 2931-2948, 2952-2971
apps/hatchet-worker-response-processor/.env.example (2)
10-12: LGTM! Consider adding inline comments for clarity.The audit configuration variables are correctly added and use appropriate example values for local development. The flag, service URL, and token pattern align with the PR's objective to add audit logging.
Consider adding inline comments to document these variables:
```diff
+# Audit logging
 AUDIT_ENABLED=true
 AUDIT_SERVICE_URL=http://127.0.0.1:7080
 AUDIT_TOKEN=test-secret-token-123
```
12-12: Ensure production audit tokens use secure secret management.

The example token is appropriately marked as a placeholder. In production deployments, ensure audit tokens are managed through secure secret management systems (e.g., Kubernetes secrets, AWS Secrets Manager, environment-specific secure vaults) and never committed to version control.
apps/auth/src/middleware.ts (1)
61-66: Consider removing commented-out code.

Commented-out debug logging should either be removed entirely or controlled via an environment variable/feature flag if it's needed for occasional debugging. Leaving commented code in the codebase can lead to confusion about its intended use.
If you want to keep this logging available for debugging, consider:
```diff
-  // console.debug('MIDDLEWARE RUNNING:', {
-  //   pathname,
-  //   fullUrl: request.url,
-  //   referer: request.headers.get('referer'),
-  //   searchParams: Object.fromEntries(request.nextUrl.searchParams.entries()),
-  // })
+  if (process.env.DEBUG_AUTH_MIDDLEWARE === 'true') {
+    console.debug('MIDDLEWARE RUNNING:', {
+      pathname,
+      fullUrl: request.url,
+      referer: request.headers.get('referer'),
+      searchParams: Object.fromEntries(request.nextUrl.searchParams.entries()),
+    })
+  }
```

Otherwise, simply remove the commented lines.
apps/frontend-pwa/.env.assessment.qa (1)
31-32: LGTM: Audit configuration added for QA assessment environment.

Both `NEXT_PUBLIC_AUDIT_SERVICE_URL` and `NEXT_PUBLIC_AUDIT_ENABLED` have been correctly added for the QA assessment environment, consistent with the PR's audit logging objectives.

Optional: The static analysis tool suggests reordering keys alphabetically (`NEXT_PUBLIC_AUDIT_ENABLED` before `NEXT_PUBLIC_AUDIT_SERVICE_URL`), but this has no functional impact and is purely a style preference.

_down.sh (1)
4-4: Review the `-v` flag for potential data loss.

The `-v` flag removes all volumes for the listed services, which will delete persistent data (PostgreSQL databases, Redis data, Azurite storage). This may be unexpected for developers running the down script during normal development workflows.

Consider one of:

- Remove `-v` to preserve data by default: `docker compose down postgres redis_exec ...`
- Make volume removal explicit via a parameter: `docker compose down ${REMOVE_VOLUMES:+"-v"} postgres ...`
- Rename the script to `_down_clean.sh` and create a separate `_down.sh` without `-v`

This prevents accidental data loss while providing an explicit cleanup option when needed.
apps/backend-docker/.env.cypress (1)
17-20: Consider reordering keys per linter convention.

The audit configuration is correct and appropriate for Cypress testing. However, the dotenv-linter suggests placing `AUDIT_ENABLED` before `AUDIT_SERVICE_URL` for consistency.

Apply this diff if you want to align with the linter:

```diff
 # Audit Service Configuration
-AUDIT_SERVICE_URL=http://127.0.0.1:7080
 AUDIT_ENABLED=true
+AUDIT_SERVICE_URL=http://127.0.0.1:7080
 AUDIT_TOKEN=test-secret-token-123
```

apps/backend-docker/.env.example (1)
18-21: Consider reordering keys in the example file.

The audit configuration is correct. Since this is an example file that serves as documentation, consider placing `AUDIT_ENABLED` before `AUDIT_SERVICE_URL` to align with dotenv-linter conventions and provide a consistent example.

Apply this diff:

```diff
 # Audit Service Configuration
-AUDIT_SERVICE_URL=http://127.0.0.1:7080
 AUDIT_ENABLED=true
+AUDIT_SERVICE_URL=http://127.0.0.1:7080
 AUDIT_TOKEN=test-secret-token-123
```

apps/frontend-pwa/.env.assessment (1)
35-36: Consider reordering for consistency (optional).

The audit configuration is functionally correct. The static analysis tool suggests placing `NEXT_PUBLIC_AUDIT_ENABLED` before `NEXT_PUBLIC_AUDIT_SERVICE_URL` for alphabetical ordering, but this is a minor style preference.

If you want to address the linter hint:

```diff
-NEXT_PUBLIC_AUDIT_SERVICE_URL=https://audit.klicker.uzh.ch
 NEXT_PUBLIC_AUDIT_ENABLED=true
+NEXT_PUBLIC_AUDIT_SERVICE_URL=https://audit.klicker.uzh.ch
```

packages/graphql/test/docker/docker-compose.test.yml (1)
96-103: Consider adding a volume for Azurite data persistence.

The Azurite service is correctly configured with all three storage service ports. However, without a named volume, audit data will be lost when the container is removed, which may complicate debugging test failures.
Apply this diff to add optional persistence:
```diff
   azurite:
     image: mcr.microsoft.com/azure-storage/azurite:latest
     ports:
       - 10000:10000 # Blob service
       - 10001:10001 # Queue service
       - 10002:10002 # Table service
     networks:
       - klicker
+    volumes:
+      - azurite_data:/data
```

And add the volume declaration:

```diff
 volumes:
   hatchet_lite_config:
+  azurite_data:
```
5-6: Optional: Reorder keys per linter preference.

The dotenv-linter suggests placing `NODE_ENV` before `PORT`. This is purely a stylistic preference and has no functional impact.

If you wish to silence the linter warning:

```diff
 # Server Configuration
-PORT=7080
 NODE_ENV=test
+PORT=7080
```

apps/audit/src/utils/metrics.ts (1)
40-50: Consider logging errors in the metrics endpoint.

The catch block returns a generic error response without logging the actual error, which may complicate debugging if metrics collection fails.
Apply this diff to add error logging:
```diff
 export function setupMetrics(app: Hono): void {
   app.get('/metrics', async (c) => {
     try {
       const register = client.register
       c.header('Content-Type', register.contentType)
       return c.text(await register.metrics())
     } catch (error) {
+      console.error('Failed to generate metrics:', error)
       return c.text('Error generating metrics', 500)
     }
   })
 }
```

apps/audit/Dockerfile (2)
4-14: LGTM - deps stage follows best practices.

The deps stage correctly prunes the monorepo for the audit scope using turbo. Version pinning of pnpm and turbo ensures reproducible builds.
Consider extracting pnpm and turbo versions to build args for easier maintenance across multiple Dockerfiles:
```diff
 FROM docker.io/library/node:20.18.1-alpine AS base
+ARG PNPM_VERSION=9.15.4
+ARG TURBO_VERSION=2.1.0

 FROM base AS deps
 WORKDIR /app
 RUN apk add --no-cache libc6-compat
 RUN apk update
-RUN npm i -g --ignore-scripts pnpm@9.15.4 turbo@2.1.0
+RUN npm i -g --ignore-scripts pnpm@${PNPM_VERSION} turbo@${TURBO_VERSION}
```
36-57: Consider removing pnpm/turbo from runtime stage.

The runtime stage installs pnpm and turbo globally but only executes `node` directly. These tools are unnecessary at runtime and increase image size and attack surface.

Apply this diff to remove unnecessary tooling from runtime:

```diff
 FROM base AS runtime
 WORKDIR /app
 ARG NODE_ENV="production"
-RUN npm i -g --ignore-scripts pnpm@9.15.4 turbo@2.1.0
+RUN npm i -g --ignore-scripts pnpm@9.15.4
 COPY --from=deps /app/out/json/ .
 COPY --from=deps /app/out/pnpm* .
 RUN pnpm i --frozen-lockfile --prod --ignore-scripts
+RUN npm uninstall -g pnpm
 RUN addgroup --system --gid 1001 nodejs
 RUN adduser --system --uid 1001 nodejs
 COPY --from=builder /app/apps/audit/dist/ /app/apps/audit/dist/
 USER nodejs
 EXPOSE 7080
 CMD ["node", "/app/apps/audit/dist/index.js"]
```

Alternatively, if no global tools are needed post-install, use a pure Node base for runtime without any package managers.
apps/audit/src/utils/rate-limit.ts (1)
24-31: Fixed-window rate limiting semantics.

The implementation uses a fixed window that resets at `resetTime`. This is simpler than a true sliding window but can allow up to 2× the limit in edge cases (requests at the end of one window plus the start of the next).

If strict rate limiting is required, consider a sliding-window algorithm (e.g., token bucket or sliding log).
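For illustration, a minimal token-bucket sketch (class and parameter names are my own, not from the PR) that avoids the 2× burst a fixed window permits: tokens refill continuously, so two adjacent windows cannot each grant a full quota.

```typescript
// Illustrative token-bucket limiter: `capacity` caps bursts, `refillPerSec`
// smooths the sustained rate. Not the service's actual implementation.
class TokenBucket {
  private tokens: number
  private lastRefillMs: number

  constructor(
    private readonly capacity: number,
    private readonly refillPerSec: number,
    nowMs: number = Date.now()
  ) {
    this.tokens = capacity
    this.lastRefillMs = nowMs
  }

  // Returns true if a request may proceed at time nowMs.
  tryConsume(nowMs: number = Date.now()): boolean {
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000
    // Refill proportionally to elapsed time, never exceeding capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSec
    )
    this.lastRefillMs = nowMs
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false
  }
}
```

Injecting `nowMs` keeps the limiter deterministic under test, which also sidesteps the flakiness concerns raised elsewhere in this review.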
packages/graphql/test/activitySharing.test.ts (1)
312-325: Consider extracting an audit-log assertion helper

There are dozens of blocks like this that manually call `findFirst`, check truthiness, and assert the formatted `message`. Any tweak to the log format will require editing every block, and it's easy to miss places when copying the pattern. A tiny helper (e.g. `expectAuditLogEntry({ type, objectType, objectId, sourceUserId, targetUserId }, expectedMessage)`) would collapse the duplication, keep the tests declarative, and make future message changes safer.
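One possible shape for such a helper; the delegate type and field names here are assumptions for illustration, not taken from the repository's Prisma schema:

```typescript
// Hypothetical helper: collapses the repeated findFirst + truthiness +
// message assertions into a single declarative call.
type AuditLogDelegate = {
  findFirst: (args: {
    where: Record<string, unknown>
  }) => Promise<{ message: string } | null>
}

async function expectAuditLogEntry(
  auditLogs: AuditLogDelegate,
  where: Record<string, unknown>,
  expectedMessage: string
): Promise<void> {
  const entry = await auditLogs.findFirst({ where })
  if (!entry) {
    throw new Error(`No audit log entry matching ${JSON.stringify(where)}`)
  }
  if (entry.message !== expectedMessage) {
    throw new Error(
      `Audit message mismatch: expected "${expectedMessage}", got "${entry.message}"`
    )
  }
}
```

In the actual test suite this would delegate to the test framework's `expect` rather than throwing, but the structure is the same.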
255-272: Consider adding a health check for the Azurite service.

The Azurite service lacks a health check definition, unlike other services in this compose file (postgres, redis, hatchet). Without a health check, dependent services may attempt connections before Azurite is ready.
```diff
   azurite:
     image: mcr.microsoft.com/azure-storage/azurite:latest
+    healthcheck:
+      test: ["CMD", "nc", "-z", "localhost", "10000"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
     ports:
       - '10000:10000' # Blob service
       - '10001:10001' # Queue service
       - '10002:10002' # Table service
     command: >
       azurite
       --tableHost 0.0.0.0
       --blobHost 0.0.0.0
       --queueHost 0.0.0.0
       --location /data
       --debug /data/debug.log
     volumes:
       - azurite_data:/data
     environment:
       - AZURITE_ACCOUNTS=devstoreaccount1
```

.github/workflows/test-graphql.yml (1)
101-106: Consider adding a health check for the Azurite service in CI.

The Azurite service lacks a health check configuration, unlike other services (postgres, redis, hatchet) which have explicit health checks. Without a health check, tests may start before Azurite is ready, leading to flaky test failures.
```diff
       azurite:
         image: mcr.microsoft.com/azure-storage/azurite:latest
+        options: >-
+          --health-cmd "wget --no-verbose --tries=1 --spider http://localhost:10000 || exit 1"
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
         ports:
           - 10000:10000
           - 10001:10001
           - 10002:10002
```

Note: The AI summary mentions that `wait-for-services.sh` checks Azurite readiness at port 7080, but the service exposes standard Azure Storage ports (10000-10002). Verify that the readiness check matches the actual Azurite configuration.

.github/workflows/v3_audit-qa.yml (1)
32-32: Upgrade GitHub Actions to latest versions

The workflow uses outdated action versions with deprecated runners. Upgrade to avoid potential execution failures:
```diff
-      - name: Checkout repository
-        uses: actions/checkout@v3
+      - name: Checkout repository
+        uses: actions/checkout@v4

-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v2
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3

-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v2
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3

-      - name: Log into registry ${{ env.REGISTRY }}
-        uses: docker/login-action@v2
+      - name: Log into registry ${{ env.REGISTRY }}
+        uses: docker/login-action@v3

-      - name: Extract Docker metadata
-        id: meta
-        uses: docker/metadata-action@v4
+      - name: Extract Docker metadata
+        id: meta
+        uses: docker/metadata-action@v5
```

Also applies to: 35-35, 38-38, 41-41, 49-49
.github/workflows/test-audit.yml (2)
115-125: Consider parallelizing package builds.

The serial `cd && pnpm run build` pattern could be optimized by building independent packages in parallel.

```diff
       - name: Build dependency packages
         run: |
-          cd packages/prisma
-          pnpm run build
-          cd ../types
-          pnpm run build
-          cd ../grading
-          pnpm run build
-          cd ../util
-          pnpm run build
+          pnpm --filter @klicker-uzh/prisma run build
+          pnpm --filter @klicker-uzh/types run build
+          pnpm --filter @klicker-uzh/grading run build
+          pnpm --filter @klicker-uzh/util run build
```

Or use pnpm's workspace parallelism:
```shell
pnpm --filter "@klicker-uzh/{prisma,types,grading,util}" run build
```
126-140: Add wait-for-services health check before starting audit.

The audit service starts after a fixed 10-second sleep, but dependent services (Azurite, Postgres) may not be ready. Consider using the wait-for-services.sh script referenced in the AI summary or explicit health checks.
```diff
       - name: Start audit service
         run: |
+          # Wait for Azurite and Postgres to be ready
+          timeout 60 bash -c 'until curl -f http://127.0.0.1:10002/devstoreaccount1 2>/dev/null; do sleep 2; done'
+          timeout 60 bash -c 'until pg_isready -h 127.0.0.1 -p 5432 -U klicker 2>/dev/null; do sleep 2; done'
           cd apps/audit
           pnpm run build
           nohup pnpm start &
-          sleep 10
+          # Wait for audit service to be ready
+          timeout 60 bash -c 'until curl -f http://localhost:7080/healthz 2>/dev/null; do sleep 2; done'
```

apps/audit/src/config.ts (1)
21-21: Consider making AUDIT_CORS_ORIGINS split-aware.

The config defines CORS origins as a comma-separated string with a default. Consider transforming it to an array for easier consumption.
```diff
   // CORS configuration (comma-separated origins)
-  AUDIT_CORS_ORIGINS: z.string().default('https://assessment.klicker.com'),
+  AUDIT_CORS_ORIGINS: z
+    .string()
+    .default('https://assessment.klicker.com')
+    .transform((s) => s.split(',').map((o) => o.trim())),
```

This would make the parsed `config.AUDIT_CORS_ORIGINS` an array directly, simplifying downstream usage.
1-26: Well-configured structured logger with proper redaction.

The Pino logger configuration follows best practices with ISO timestamps, uppercase level formatting, and comprehensive redaction of sensitive fields. The inclusion of environment metadata aids in log aggregation across deployments.
Optional: Consider partial masking for authorization headers.
Currently, `authorization` is fully redacted. For debugging, you might consider partial masking (e.g., `Bearer ***`) to preserve the authentication scheme while hiding the token. However, full redaction is more secure and acceptable for audit logs.

```diff
   redact: {
     paths: [
       'password',
       'token',
-      'authorization',
       'connectionString',
       'attributes.password',
       'attributes.token',
       'attributes.secret',
     ],
     censor: '[REDACTED]',
+    remove: false,
   },
+  serializers: {
+    authorization: (value: string) => {
+      if (typeof value === 'string' && value.startsWith('Bearer ')) {
+        return 'Bearer [REDACTED]'
+      }
+      return '[REDACTED]'
+    },
+  },
```

deploy/charts/klicker-uzh-v2/templates/deployment-audit.yaml (1)
32-40: Consider service account security configuration.

The deployment doesn't specify `serviceAccountName` or `automountServiceAccountToken`. For defense in depth, either bind a dedicated service account with minimal RBAC permissions or disable token automounting if the audit service doesn't need Kubernetes API access.

If no K8s API access is needed:

```diff
     spec:
+      automountServiceAccountToken: false
       priorityClassName: {{ .Values.audit.priorityClassName }}
```

Or bind a dedicated service account:

```diff
     spec:
+      serviceAccountName: {{ include "chart.fullname" . }}-audit
       priorityClassName: {{ .Values.audit.priorityClassName }}
```

apps/audit/vitest.config.ts (2)
60-68: Consider test isolation strategies for better parallelization.

Single-threaded execution avoids Azurite table contention but severely limits test performance. Consider test isolation techniques (e.g., per-suite table names with random suffixes or cleanup in teardown) to enable parallel test execution while maintaining isolation.
Example approach:
```typescript
// In test setup
const testTableName = `auditlogs_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`
// Use testTableName for this test suite
// Clean up in afterAll()
```

This would allow increasing `maxThreads` for faster test runs.
57-58: Monitor and fix flaky tests rather than relying on retries.

While retry=1 is reasonable for integration tests with external dependencies, investigate and eliminate the root causes of flakiness rather than masking them with retries.
apps/frontend-pwa/src/components/liveQuiz/QuestionArea.tsx (1)
199-207: Consider non-blocking audit logging for better user experience.

The audit log call in `onSubmit` is currently awaited, which could delay the submission flow if the audit service is slow or unreachable. The `useAuditClient` hook provides a `logAsync` method for fire-and-forget logging. Consider using `logAsync` instead to ensure the user submission flow is not blocked by audit logging:

```diff
-      await auditLog.log({
+      auditLog.logAsync({
         action: AuditAction.PARTICIPANT_SUBMIT_RESPONSE,
         resource: `instance:${instanceId}`,
         scope: AuditScope.PUBLIC,
         correlationId: correlationKey ?? undefined,
         attributes: {
           response: studentResponse.response,
         },
       })
```

This ensures audit logging failures or latency do not impact the participant's submission experience.
Based on learnings.
apps/audit/test/public-endpoint.test.ts (1)
42-145: LGTM: Comprehensive authentication test coverage.The test suite thoroughly validates the authentication flow:
- Valid participant tokens (200 response)
- Missing cookies (401)
- Wrong cookie names (401)
- Invalid/expired tokens (401)
- Malformed JWTs (401)
Each test verifies both status codes and error messages, providing good coverage for security-critical functionality.
Optional enhancement: Consider defining a response type instead of using `any` casts (e.g., lines 63, 82, 101):

```typescript
type AuditResponse =
  | { status: 'stored'; eventId: string }
  | { error: string }

const result = (await response.json()) as AuditResponse
```
39-71: Consider stricter type validation for JWT payload fields.

The function casts `payload.sub` and `payload.role` to strings without runtime type checks. If the JWT payload is malformed or tampered with (e.g., `sub` is a number or `role` is an array), the function may behave unexpectedly.

Consider adding explicit type guards:

```diff
   const participantId = payload.sub as string
   const role = payload.role as string

-  if (!participantId || !role) {
+  if (typeof participantId !== 'string' || typeof role !== 'string' || !participantId || !role) {
     return null
   }
```
191-225: Complex object comparison may produce false positives.

Lines 206-216 compare complex objects by stringifying and comparing JSON. This approach is fragile:

- Property order differences cause mismatches even when values are equivalent
- Serialization of `undefined`, `NaN`, or circular references may behave unexpectedly
- Nested objects with different but equivalent representations (e.g., `{ a: 1 }` vs. `{ a: 1.0 }`) may fail

Consider using a deep equality library or normalizing objects before comparison:

```typescript
import { isDeepStrictEqual } from 'util'

// In verifyEntity:
if (typeof expectedValue === 'object' && expectedValue !== null) {
  const actualObj =
    typeof actualValue === 'string' ? JSON.parse(actualValue) : actualValue
  if (!isDeepStrictEqual(actualObj, expectedValue)) {
    throw new Error(
      `Entity ${key} mismatch. Expected: ${JSON.stringify(expectedValue)}, Got: ${JSON.stringify(actualObj)}`
    )
  }
}
```
95-97: Single-character shard provides limited distribution.

Line 96 uses only the first hex character of the hash for sharding, providing only 16 possible shards per time bucket. For very high-throughput scenarios (thousands of events per minute), this may create hot partitions.
Consider using 2 hex characters for 256 shards if higher distribution is needed:
```diff
   const shard = eventId
-    ? createHash('sha256').update(eventId).digest('hex')[0]!
+    ? createHash('sha256').update(eventId).digest('hex').substring(0, 2)
     : '0'
```

Monitor partition-level metrics in production to determine if increased sharding is beneficial.
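To make the trade-off concrete, a small standalone sketch (the function name is illustrative) showing how the shard width controls the key space: one hex character yields 16 possible shards, two yield 256, while the mapping stays deterministic per event ID.

```typescript
import { createHash } from 'crypto'

// Illustrative only: derive a shard key of 1 hex char (16 shards) or
// 2 hex chars (256 shards) from an event ID, as discussed above.
function shardFor(eventId: string, width: 1 | 2): string {
  return createHash('sha256').update(eventId).digest('hex').substring(0, width)
}
```

Because the shard is a pure function of the event ID, widening it later changes partition keys for new events only; existing rows keep their old partitions, so reads spanning the cutover must query both shapes.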
apps/audit/test/integration.test.ts (3)
5-5: Consider making BASE_URL configurable via environment variable.

The hardcoded `BASE_URL` may cause issues if the audit service runs on a different port in CI or other environments.

Apply this diff to make it configurable:

```diff
-const BASE_URL = 'http://localhost:7080'
+const BASE_URL = process.env.AUDIT_SERVICE_URL || 'http://localhost:7080'
```
46-64: Optimize polling: avoid fetching all entities on each iteration.

The current implementation calls `getAllEntities()` on every poll, which becomes inefficient as the entity count grows. Consider using partition key queries or adding a helper method that queries by rowKey directly.

If the partition key is predictable from the rowKey/eventId, refactor to use `getEntity(partitionKey, rowKey)`. Otherwise, consider adding a filtered query to `AzureTableTestHelper` to avoid full table scans.
235-235: Replace fixed sleep delays with polling for reliability.

Multiple tests use fixed `setTimeout` delays (1-3 seconds) to wait for persistence. This can cause flakiness if operations take longer than expected, or unnecessary slowness if they complete faster.

Replace fixed delays with the existing `waitForEntityByRowKey` helper or `tableHelper.waitForEntityCount` for more robust waiting. For example:

```diff
-    // Wait for persistence
-    await new Promise((resolve) => setTimeout(resolve, 1000))
+    // Wait for all events to be persisted
+    await tableHelper.waitForEntityCount(2, 5000)
```

Or wait for specific entities:

```diff
-    await new Promise((resolve) => setTimeout(resolve, 2000))
+    for (const event of authEvents) {
+      await waitForEntityByRowKey(event.eventId)
+    }
```

Also applies to: 273-273, 307-307, 341-341, 384-384
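The general pattern behind these wait helpers can be sketched as a generic poll-until function (illustrative, not the repo's actual helper): retry a condition at a short interval until it holds or a deadline passes, instead of sleeping for a fixed duration.

```typescript
// Illustrative polling helper: retries a condition until it holds or the
// timeout elapses, replacing fixed sleeps with bounded waiting.
async function pollUntil(
  condition: () => Promise<boolean> | boolean,
  timeoutMs: number,
  intervalMs = 100
): Promise<void> {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    if (await condition()) return
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
  throw new Error(`Condition not met within ${timeoutMs}ms`)
}
```

A `waitForEntityCount(n, timeout)` is then just `pollUntil(async () => (await getEntityCount()) >= n, timeout)`: the test finishes as soon as the data is there, and fails loudly with a clear message when it never arrives.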
packages/graphql/src/services/accounts.ts (1)
100-160: Add success audit logging for consistency.
`loginParticipant` logs failure events but not success. Other login methods (magic link, LTI, temporary) log both success and failure for complete audit trails.

Add success logging after line 156:

```diff
   await doParticipantLogin(
     {
       participantId: participant.id,
       participantLocale: participant.locale,
     },
     ctx
   )

+  // Log successful login
+  await ctx.auditClient.log({
+    subject: `participant:${participant.id}`,
+    action: AuditAction.PARTICIPANT_LOGIN_SUCCESS,
+    attributes: {
+      method: 'password',
+    },
+  })
+
   // TODO: return more data (e.g. Avatar etc.)
   return participant.id
```
139-140: Use tableHelper.waitForEntity for consistency.

Line 140 uses a fixed 2-second delay, while other tests use `tableHelper.waitForEntity` or `waitForEntityCount` with configurable timeouts. This makes the test less reliable and slower than necessary.

Replace the fixed delay:

```diff
-    // Wait and verify only one record exists
-    await new Promise((resolve) => setTimeout(resolve, 2000))
+    // Wait for entity to be persisted
+    await tableHelper.waitForEntity(
+      // We need to get the partition key first
+      (await tableHelper.getAllEntities())[0]?.partitionKey || '',
+      event.eventId,
+      5000
+    )
```

Or simplify to wait for count:

```diff
-    await new Promise((resolve) => setTimeout(resolve, 2000))
+    await tableHelper.waitForEntityCount(1, 5000)
```
168-169: Consider using tableHelper wait methods consistently.

Lines 168, 212, 248 use fixed `setTimeout` delays. For consistency with other tests (e.g., line 88) and better reliability, use `tableHelper.waitForEntity` or `waitForEntityCount`.

Example for line 168:

```diff
-    await new Promise((resolve) => setTimeout(resolve, 1000))
+    await tableHelper.waitForEntityCount(1, 5000)
```
521-522: Performance assertion may be flaky in CI.

The 1-second threshold for querying 10 entities from local Azurite is reasonable for local development, but may fail in resource-constrained CI environments or under load. Consider increasing the threshold or making it configurable.
```diff
-    // Query should be reasonably fast (under 1 second for 10 entities)
-    expect(queryTime).toBeLessThan(1000)
+    // Query should be reasonably fast (allow more time in CI)
+    const maxQueryTime = process.env.CI ? 3000 : 1000
+    expect(queryTime).toBeLessThan(maxQueryTime)
```
533-551: Incomplete test: No actual connectivity issue simulation.

The test name and comments suggest database connectivity testing, but it only submits a normal event and accepts either 200 or 503. To meaningfully test connectivity handling, consider:
- Testing the service behavior when Azurite is temporarily unavailable
- Validating retry logic and error responses
- Checking that events are queued or handled gracefully during outages
If actual connectivity testing is not feasible in this test suite, consider renaming the test to reflect what it actually validates or moving connectivity tests to a separate integration suite.
packages/graphql/src/services/liveQuizzes.ts (1)
2674-2679: Clean up commented audit log code.

The commented-out audit log suggests incomplete migration from Hatchet-based logging to the audit client. Either:
- Remove the commented code if Hatchet logging is the final approach
- Complete the migration to audit client with appropriate attributes (participant ID, instance ID, points deducted)
If migrating to audit client, consider:
```typescript
await ctx.auditClient.log({
  scope: AuditScope.INTERNAL,
  action: AuditAction.USER_RESET_QUIZ_DEDUCT_POINTS,
  subject: `user:${ctx.user.sub}`,
  resource: `live-quiz:${id}`,
  attributes: {
    participantId: response.participantId,
    instanceId: instance.id,
    blockId: block.id,
    basePoints: response.basePoints,
    correctnessPoints: response.correctnessPoints,
    bonusPoints: response.bonusPoints,
  },
})
```
77-98: Consider handling the unused result variable.

Line 82 retrieves the iterator result but doesn't use it. This may trigger linting warnings.
Apply this diff to acknowledge the intentionally unused result:
```diff
     const tablesList = this.client.listEntities()
-    const result = await tablesList.next() // Just check if we can get the iterator
+    void (await tablesList.next()) // Just check if we can get the iterator
```
187-197: Consider more specific error message matching.

The broad terms `invalid` and `malformed` might unintentionally match error messages that should be retried. For example, "Invalid network state" might be a transient issue.

If you have access to structured error codes or properties from the Azure SDK (e.g., `statusCode`, `code`), prefer those over message string matching. Otherwise, consider more specific patterns:

```diff
 private isNonRetryableError(error: Error): boolean {
   const message = error.message.toLowerCase()
   return (
     message.includes('entitytoolarge') ||
     message.includes('requestbodytoolarge') ||
-    message.includes('invalid') ||
-    message.includes('malformed') ||
+    message.includes('invalid request') ||
+    message.includes('invalid entity') ||
+    message.includes('malformed request') ||
     message.includes('unauthorized') ||
     message.includes('forbidden')
   )
 }
```
119-124: Consider pinning the Azurite version for reproducibility.

The `latest` tag can lead to non-reproducible builds if Microsoft releases a new version with breaking changes.

Pin to a specific version:

```diff
       azurite:
-        image: mcr.microsoft.com/azure-storage/azurite:latest
+        image: mcr.microsoft.com/azure-storage/azurite:3.35.0
```

Apply to both job definitions (lines 120 and 339).
Also applies to: 338-343
apps/audit/test/scenarios.test.ts (3)
4-5: Consider making BASE_URL configurable.

The hard-coded `BASE_URL` and fallback `AUTH_TOKEN` work for local testing but might need environment-specific configuration for different test environments.

Make the base URL configurable:

```diff
-const BASE_URL = 'http://localhost:7080'
+const BASE_URL = process.env.AUDIT_SERVICE_URL || 'http://localhost:7080'
 const AUTH_TOKEN = process.env.AUDIT_TOKEN || 'test-secret-token-123'
```
55-62: Replace hard-coded delay with polling-based verification.The 2-second delay (line 56) is fragile—it might be too short under load or unnecessarily slow when persistence is quick.
Use `waitForEntityCount` from the test helper:

```diff
-    // Wait for persistence
-    await new Promise((resolve) => setTimeout(resolve, 2000))
-
-    // Verify all events are persisted
-    const persistedEntities = (await tableHelper.getAllEntities()).filter(
-      (entity) => (entity.rowKey ? eventIds.has(entity.rowKey) : false)
-    )
-    expect(persistedEntities.length).toBe(events.length)
+    // Wait for persistence with polling
+    const initialCount = await tableHelper.getEntityCount()
+    await tableHelper.waitForEntityCount(initialCount + events.length, 5000)
+
+    // Retrieve persisted entities
+    const persistedEntities = (await tableHelper.getAllEntities()).filter(
+      (entity) => (entity.rowKey ? eventIds.has(entity.rowKey) : false)
+    )
+    expect(persistedEntities.length).toBe(events.length)
```
83-1013: LGTM! Comprehensive end-to-end scenario coverage.

The test suite provides excellent coverage of real-world audit scenarios with proper verification of event sequences, correlations, and business logic. The nested describe blocks and console logs enhance readability and debugging.
Some individual tests are quite long (150+ lines). Consider extracting shared verification logic into helper functions for improved maintainability:
```typescript
function verifyEventSequence(events: any[], expectedActions: string[]) {
  const actions = events.map((e) => e.action)
  expect(actions).toEqual(expectedActions)
}

function verifyAllEventsShareResource(events: any[], resourceId: string) {
  events.forEach((event) => {
    expect(event.resource).toBe(resourceId)
  })
}
```
```yaml
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
```
Update actions/checkout to v4.
The checkout action uses a deprecated runner.
```diff
       - name: Checkout repository
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```yaml
      - name: Checkout repository
        uses: actions/checkout@v4
```
🧰 Tools
🪛 actionlint (1.7.7)
109-109: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
.github/workflows/test-audit.yml around line 109: the workflow uses
actions/checkout@v3 which is deprecated; update the step to use
actions/checkout@v4 by changing the version reference to v4 (or the latest
stable minor tag), and run a quick CI trigger to verify no breaking changes in
checkout behavior (ensure any inputs used by the step remain compatible with
v4).
```yaml
        uses: actions/checkout@v3

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
```
Update GitHub Actions to latest versions.
Multiple actions use deprecated runners that will soon be unsupported. Update to the latest stable versions to ensure compatibility with GitHub Actions infrastructure.
Apply this diff:
```diff
-      - name: Checkout repository
-        uses: actions/checkout@v3
+      - name: Checkout repository
+        uses: actions/checkout@v4

       - name: Set up QEMU
-        uses: docker/setup-qemu-action@v2
+        uses: docker/setup-qemu-action@v3

       - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v2
+        uses: docker/setup-buildx-action@v3
```

📝 Committable suggestion
```yaml
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
```
🧰 Tools
🪛 actionlint (1.7.7)
19-19: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
22-22: the runner of "docker/setup-qemu-action@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
25-25: the runner of "docker/setup-buildx-action@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
In .github/workflows/v3_audit-prod.yml around lines 19 to 25, the workflow
references older action versions; update them to the latest stable releases to
avoid deprecation issues: change actions/checkout@v3 to actions/checkout@v4,
bump docker/setup-qemu-action from v2 to v3, and bump docker/setup-buildx-action
from v2 to v3 (or the latest stable release), then run the workflow to verify
compatibility and adjust any inputs if the newer actions changed parameters.
    - name: Log into registry ${{ env.REGISTRY }}
      uses: docker/login-action@v2
      with:
        registry: ${{ env.REGISTRY }}
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
Update docker/login-action to v3.
The login action is also using a deprecated runner.
  - name: Log into registry ${{ env.REGISTRY }}
-   uses: docker/login-action@v2
+   uses: docker/login-action@v3
    with:
      registry: ${{ env.REGISTRY }}
      username: ${{ github.actor }}
      password: ${{ secrets.GITHUB_TOKEN }}
🧰 Tools
🪛 actionlint (1.7.7)
28-28: the runner of "docker/login-action@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
In .github/workflows/v3_audit-prod.yml lines 27-32, the workflow uses
docker/login-action@v2 which is outdated and relies on a deprecated runner;
update the action to docker/login-action@v3 by changing the uses line to
docker/login-action@v3 and ensure the with block keeps the registry, username
and password inputs (adjust any input names to v3 if needed), and also update
the workflow's runner configuration if it's using a deprecated runner image so
it targets a supported runner.
    - name: Extract Docker metadata
      id: meta
      uses: docker/metadata-action@v4
      with:
        images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

    - name: Build and push Docker image
      uses: docker/build-push-action@v5
      with:
        context: .
        file: apps/audit/Dockerfile
        push: true
        tags: ${{ steps.meta.outputs.tags }}
        labels: ${{ steps.meta.outputs.labels }}
Update docker actions to latest versions.
Both metadata-action and build-push-action need version updates.
  - name: Extract Docker metadata
    id: meta
-   uses: docker/metadata-action@v4
+   uses: docker/metadata-action@v5
    with:
      images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

  - name: Build and push Docker image
-   uses: docker/build-push-action@v5
+   uses: docker/build-push-action@v6
    with:
      context: .
      file: apps/audit/Dockerfile
      push: true
      tags: ${{ steps.meta.outputs.tags }}
      labels: ${{ steps.meta.outputs.labels }}
🧰 Tools
🪛 actionlint (1.7.7)
36-36: the runner of "docker/metadata-action@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
.github/workflows/v3_audit-prod.yml lines 34-47: update the docker GitHub
Actions to their current major releases — change docker/metadata-action@v4 to
the latest supported tag (e.g., docker/metadata-action@v5) and bump
docker/build-push-action@v5 to the current major release (e.g.,
docker/build-push-action@v6); ensure the with/inputs remain compatible with the
newer action versions and test the workflow to confirm tags/labels outputs are
unchanged or adjust output references if the action output names changed.
    .idea/
    tsconfig.tsbuildinfo

    *.log
Remove duplicate ignore rule.
Line 38 adds *.log, but line 50 already ignores *.log. Remove this duplicate entry.
-*.log
-
🤖 Prompt for AI Agents
In .gitignore around line 38, there is a duplicate ignore rule "*.log" (the same
rule also exists at line 50); remove the redundant "*.log" entry at line 38 so
the file contains only one "*.log" ignore rule (keep the existing one at line 50
and delete the duplicate at line 38).
    console.debug(
      `Auto-accepted ${acceptedCount} invitations for new participant with emails:`,
      emailCollection.allEmails
    )
PII exposure: email collection logged at debug level.
The debug log includes emailCollection.allEmails, which contains user email addresses (PII). This is the same issue as the previous occurrences.
Apply this diff to log only the count:
- console.debug(
-   `Auto-accepted ${acceptedCount} invitations for new participant with emails:`,
-   emailCollection.allEmails
- )
+ console.debug(
+   `Auto-accepted ${acceptedCount} invitations for new participant (${emailCollection.allEmails.length} email(s))`
+ )
🤖 Prompt for AI Agents
In apps/auth/src/lib/helpers.ts around lines 508 to 511, the debug log currently
prints emailCollection.allEmails (PII); change the log to output only the
acceptedCount (no email addresses). Replace the console.debug call to log a
message containing the number of auto-accepted invitations (and optional non-PII
context), and remove any reference to or interpolation of
emailCollection.allEmails to avoid PII exposure.
    redisAssessmentExec,
    redisCache,
    handlers,
    auditClient,
🧩 Analysis chain
Verify prepareHatchetTasks accepts optional AuditClient.
After making auditClient optional (to prevent startup crashes), ensure that prepareHatchetTasks is typed to accept auditClient?: AuditClient and handles the undefined case gracefully.
Run this script to check the signature:
🏁 Script executed:
#!/bin/bash
# Description: Verify prepareHatchetTasks signature accepts optional auditClient
# Search for the prepareHatchetTasks function signature
ast-grep --pattern $'export function prepareHatchetTasks($$$) {
$$$
}'
# Also search for type definitions
rg -nP --type=ts 'prepareHatchetTasks.*\(' -A 5

Length of output: 19871
Make auditClient optional in prepareHatchetTasks signature and handle undefined.
At packages/hatchet/src/index.ts:34, change auditClient: AuditClient to auditClient?: AuditClient. Then at line 58, guard the usage:
if (auditClient) {
  await auditClient.log({
    ...message,
    scope: AuditScope.WORKER,
  })
}

This prevents type errors when the caller passes undefined and ensures the createAuditLogEntry task gracefully skips logging when the client is unavailable.
🤖 Prompt for AI Agents
In apps/hatchet-worker-general/src/index.ts around line 132 (and update the
source signature at packages/hatchet/src/index.ts line 34), make the
prepareHatchetTasks parameter auditClient optional (auditClient?: AuditClient)
and update any call sites accordingly; then in the createAuditLogEntry task
where auditClient is used (around line 58), add a guard so the log call runs
only when auditClient is defined (e.g., if (auditClient) { await
auditClient.log({...}) }), ensuring the task skips logging when auditClient is
undefined.
    await ctx.auditClient.log({
      subject: `user:${ctx.user.sub}`,
      action: AuditAction.USER_LOGOUT,
🛠️ Refactor suggestion | 🟠 Major
Add error handling for audit logging to prevent breaking user flows.
All audit logging calls use await ctx.auditClient.log() without try-catch blocks. If the audit service is unavailable or throws an error, user operations (login, logout) will fail. Audit logging should typically be fire-and-forget or have explicit error handling to avoid disrupting primary flows.
Add error handling around audit calls:
try {
  await ctx.auditClient.log({
    subject: `participant:${participant.id}`,
    action: AuditAction.PARTICIPANT_LOGIN_SUCCESS,
    attributes: { method: 'password' },
  })
} catch (error) {
  console.error('Audit logging failed:', error)
  // Optionally: send to monitoring/alerting
}

Or, if auditClient supports a non-throwing fire-and-forget mode, use that:

// Don't await if the audit client queues asynchronously
ctx.auditClient.log({...}).catch(err => console.error('Audit failed:', err))

Verify the auditClient's error handling behavior and decide on the appropriate pattern. Based on learnings.
Also applies to: 120-122, 137-139, 367-369, 469-471, 901-903, 930-932, 976-978
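If wrapping every call site becomes repetitive, the pattern can be centralized. A minimal sketch, assuming a `safeAudit` helper and a structural `AuditLogger` type (both invented here for illustration, not part of the PR):

```typescript
// Hypothetical wrapper: audit failures are reported but never propagate
// into the primary user flow. `client.log` stands in for ctx.auditClient.log.
type AuditLogger = { log: (event: Record<string, unknown>) => Promise<unknown> }

async function safeAudit(
  client: AuditLogger,
  event: Record<string, unknown>
): Promise<boolean> {
  try {
    await client.log(event)
    return true
  } catch (error) {
    // Surface the failure for monitoring without breaking the caller.
    console.error('Audit logging failed:', error)
    return false
  }
}

// Fire-and-forget variant: do not await, just report errors asynchronously.
function auditInBackground(
  client: AuditLogger,
  event: Record<string, unknown>
): void {
  void client.log(event).catch((err) => console.error('Audit failed:', err))
}
```

Call sites would then read `await safeAudit(ctx.auditClient, {...})` on the blocking path, or `auditInBackground(...)` where ordering does not matter.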
🤖 Prompt for AI Agents
In packages/graphql/src/services/accounts.ts (apply to lines 36-38, 120-122,
137-139, 367-369, 469-471, 901-903, 930-932, 976-978): auditClient.log is
awaited directly and can throw, which may break primary user flows; change these
calls to either fire-and-forget (remove await and attach .catch to log/report
errors) or wrap each await in a try-catch that logs the error (and optionally
reports to monitoring) so audit failures do not block login/logout or other user
operations.
    // Log failed login attempt - participant not found
    if (!participant) {
      await ctx.auditClient.log({
        subject: `participant:${usernameOrEmail}`,
        action: AuditAction.PARTICIPANT_LOGIN_FAILED,
        attributes: {
          method: 'password',
          reason: 'user_not_found',
          ip: ctx.req?.ip,
          userAgent: ctx.req?.headers?.['user-agent'],
        },
      })
      return null
    }
Sanitize or hash unknown identifiers in audit log subjects.
Line 121 uses unsanitized user input (usernameOrEmail) in the subject field. If this contains special characters, extremely long strings, or sensitive data, it could cause issues in audit log queries or expose information.
Consider one of these approaches:
Option 1: Use a hash

- subject: `participant:${usernameOrEmail}`,
+ subject: `participant:unknown:${createHash('sha256').update(usernameOrEmail).digest('hex')}`,

Option 2: Use a generic placeholder and move the identifier to attributes

- subject: `participant:${usernameOrEmail}`,
+ subject: `participant:unknown`,
  action: AuditAction.PARTICIPANT_LOGIN_FAILED,
  attributes: {
+   attemptedIdentifier: usernameOrEmail.substring(0, 100), // truncated
    method: 'password',

Committable suggestion skipped: line range outside the PR's diff.
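As a standalone sketch of Option 1, using Node's `crypto` module (the `hashedSubject` and `truncateAttribute` helper names are assumptions for illustration, not existing code):

```typescript
import { createHash } from 'node:crypto'

// Derive a stable, query-safe subject token from an unverified identifier.
function hashedSubject(usernameOrEmail: string): string {
  const digest = createHash('sha256').update(usernameOrEmail).digest('hex')
  return `participant:unknown:${digest}`
}

// Bound attribute values so audit rows cannot grow without limit.
function truncateAttribute(value: string, max = 100): string {
  return value.length > max ? value.slice(0, max) : value
}
```

Because SHA-256 is deterministic, repeated failed attempts for the same identifier still group together in queries without storing the raw value.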
🤖 Prompt for AI Agents
In packages/graphql/src/services/accounts.ts around lines 118–131, the audit log
uses the raw usernameOrEmail in the subject which can include special chars,
very long strings, or sensitive data; replace the raw identifier with a
sanitized value: compute a stable hash (e.g. SHA-256 hex) of usernameOrEmail and
use subject `participant:<hash>`, move the original identifier into attributes
only after sanitizing/truncating or masking (or omit raw entirely and store a
boolean/placeholder), ensure consistent hashing method so logs remain queryable,
and enforce a max length on any attribute to avoid extremely long values.
    const auditClient = new AuditClient()

    return { prisma, hatchet: hatchetClient, emitter, auditClient }
Ensure AuditClient initialization doesn't break tests when audit service is unavailable.
The AuditClient constructor requires AUDIT_SERVICE_URL and AUDIT_TOKEN environment variables and will throw if they're missing. In test environments where the audit service may not be available, this will cause all test initialization to fail with a misleading "Database connection failed" error message.
Consider one of these approaches:
- Graceful fallback: Wrap AuditClient construction in a try-catch and provide a no-op stub:

+ let auditClient: AuditClient
+ try {
+   auditClient = new AuditClient()
+ } catch (error) {
+   console.warn('AuditClient initialization failed, using stub:', error)
+   auditClient = {
+     log: async () => {
+       console.warn('Audit logging disabled in tests')
+       return null
+     },
+   } as AuditClient
+ }
- const auditClient = new AuditClient()

- Environment validation: Check for required environment variables before constructing AuditClient and fail with a clear message if missing.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In packages/graphql/test/helpers.ts around lines 372 to 374, constructing
AuditClient unconditionally causes tests to crash when
AUDIT_SERVICE_URL/AUDIT_TOKEN are missing; wrap the AuditClient construction in
a short guard: either check process.env.AUDIT_SERVICE_URL and
process.env.AUDIT_TOKEN before instantiating and throw a clear error if you want
strictness, or better for tests use a try-catch and on failure return a no-op
stub object that implements the same methods used in tests (or null with callers
adjusted) so test initialization doesn't bubble up an unrelated "Database
connection failed" error.
Actionable comments posted: 0
🧹 Nitpick comments (13)
packages/util/src/jwt.ts (1)
12-14: LGTM with a minor observation on union types.

The addition of `liveQuizId`, `instanceId`, and `execution` fields aligns with the audit correlation tracking requirements. The fields are optional and integrate cleanly with the existing JWTPayload interface.

One observation: `instanceId` and `execution` use union types (`string | number`). If the actual usage consistently treats these as numbers, consider narrowing the types to avoid unnecessary type guards downstream.

apps/audit/test/performance.test.ts (4)
106-122: Review performance thresholds for environment sensitivity.

The 100ms threshold for simple events is reasonable, but may be sensitive to CI/test environment performance (network latency to Azurite, CPU contention, etc.). If tests become flaky in CI, consider:
- Relaxing thresholds slightly or making them configurable via environment variables
- Using percentile-based assertions (e.g., p95 < 150ms) instead of absolute values
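A percentile assertion could be backed by a small helper along these lines (a sketch with an assumed `percentile` function using the nearest-rank method, not code from the suite):

```typescript
// Nearest-rank percentile: p in (0, 100]. Assumed helper for illustration.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples')
  const sorted = [...samples].sort((a, b) => a - b)
  const idx = Math.ceil((p / 100) * sorted.length) - 1
  return sorted[Math.min(sorted.length - 1, Math.max(0, idx))]
}
```

A test could then assert `expect(percentile(latencies, 95)).toBeLessThan(150)` instead of bounding every individual sample.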
341-341: Throughput variance threshold of 250% is extremely permissive.

A throughput variance of 250% means the system can have a 3.5x difference between min and max throughput, which suggests very inconsistent performance. For a production audit service, this variance is quite high and may mask performance degradation issues.
Consider tightening this threshold (e.g., 50-100%) to catch performance regressions earlier, or document why such high variance is acceptable.
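For reference, the variance figure quoted above follows from (max - min) / min, so 250% lets max reach 3.5x min; a minimal sketch of that computation:

```typescript
// Relative spread of throughput samples, as a percentage of the minimum.
function throughputVariancePercent(throughputs: number[]): number {
  const min = Math.min(...throughputs)
  const max = Math.max(...throughputs)
  if (min <= 0) throw new Error('throughput samples must be positive')
  return ((max - min) / min) * 100
}
```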
359-432: Memory leak test relies on optional `global.gc`, which may not be available.

Lines 373-375 and 394-396 conditionally invoke `global.gc()` if available. In standard Node.js, `gc` is only exposed when running with the `--expose-gc` flag. Without it, garbage collection timing is unpredictable, reducing test reliability.
global.gc()if available. In standard Node.js,gcis only exposed when running with--expose-gcflag. Without it, garbage collection timing is unpredictable, reducing test reliability.Consider documenting this requirement in test setup instructions or adjusting the test to be more robust when GC is unavailable (e.g., longer settle times, or skip/warn when GC is not exposed).
105-527: Comprehensive performance test suite with good coverage.

The test suite covers a wide range of performance scenarios (single request, concurrency, sustained load, memory, error rates, and DB scale). The use of the `measurePerformance` helper and detailed console logging aids in diagnosing performance issues.

A few recommended improvements:
- Make hard-coded thresholds configurable or document their assumptions
- Consider adding assertions on response latency percentiles (p50, p95, p99) for more nuanced performance tracking
- Reduce sleep/wait times where possible to speed up test execution
Overall, this is a solid foundation for load testing the audit service.
apps/audit/test/integration.test.ts (4)
12-28: Improve error messages in `extractEntityTimestamp`.

The function throws generic error messages that don't include the entity or context. When debugging test failures, more descriptive errors would help identify which entity caused the issue.
Consider including entity identifiers (rowKey, partitionKey) in the error messages.
Apply this diff to improve error handling:
  function extractEntityTimestamp(entity: any): number {
    const raw = entity.eventTimestamp ?? entity.timestamp
    if (typeof raw === 'number') {
      return raw
    }
    if (typeof raw === 'string') {
      const parsed = Date.parse(raw)
      if (Number.isNaN(parsed)) {
-       throw new Error(`Unable to parse timestamp value: ${raw}`)
+       throw new Error(`Unable to parse timestamp value: ${raw} for entity ${entity.rowKey}`)
      }
      return parsed
    }
-   throw new Error('Timestamp not found on entity')
+   throw new Error(`Timestamp not found on entity ${entity.rowKey}`)
  }
47-65: Code duplication: `waitForEntityByRowKey` reimplements `AzureTableTestHelper.waitForEntity`.

The `waitForEntityByRowKey` helper iterates over all entities to find a match by `rowKey`, but `AzureTableTestHelper` already provides `waitForEntity(partitionKey, rowKey)`, which is more efficient.

If you need to search by `rowKey` alone without knowing the partition key, consider adding a dedicated helper method to `AzureTableTestHelper` to avoid duplicating the polling logic.
125-125: Fixed sleeps slow down test execution.

Multiple tests use fixed `setTimeout` delays (500ms to 5000ms) to wait for persistence. These add up to significant test execution time. Consider:

- Reducing sleep durations where safe
- Using the existing `waitForEntity` or `waitForEntityCount` helpers more consistently (they poll with shorter intervals and fail fast on timeout)
- Making wait times configurable for local vs. CI environments
Also applies to: 179-179, 255-255, 300-300, 342-342, 382-382, 429-429, 476-476, 496-496, 521-521
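The polling approach those helpers embody can be sketched generically (a hypothetical `pollUntil`, not the actual `AzureTableTestHelper` implementation):

```typescript
// Generic poll-until helper: resolves with the first defined probe result,
// fails fast once the deadline passes. Sketch only; real helper may differ.
async function pollUntil<T>(
  probe: () => Promise<T | undefined>,
  timeoutMs = 10000,
  intervalMs = 200
): Promise<T> {
  const deadline = Date.now() + timeoutMs
  for (;;) {
    const result = await probe()
    if (result !== undefined) return result
    if (Date.now() > deadline) {
      throw new Error(`pollUntil timed out after ${timeoutMs}ms`)
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
}
```

Compared to a fixed 5000ms sleep, a probe that succeeds after 600ms returns in roughly 600ms, while a genuine failure still surfaces as a timeout error rather than a downstream assertion failure.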
83-504: Comprehensive integration test suite with strong coverage.

The test suite thoroughly exercises the audit service across multiple dimensions: persistence, idempotency, isolation, event types, partitioning, payload size, and timestamp handling. The tests are well-structured and validate both HTTP responses and database state.
A few refinements would improve maintainability and speed:
- Reduce fixed sleep times by relying more on polling helpers
- Eliminate code duplication in helper functions
- Add more descriptive test names where applicable (e.g., line 220)
Overall, this provides excellent end-to-end verification of the audit service.
apps/audit/src/schemas/audit-event.ts (1)
62-119: Consider requiring correlation data for PUBLIC events to improve traceability.

Currently, PUBLIC-scope events may omit both `correlationId` and `correlationClaims`. A previous review suggested requiring at least one for improved event correlation and traceability.

If this is still the desired behavior, add this validation inside the superRefine (after line 74):

      })
    }
+
+   // For PUBLIC events, require either correlationId or correlationClaims
+   if (scope === AuditScope.PUBLIC) {
+     if (!val.correlationId && !val.correlationClaims) {
+       ctx.addIssue({
+         code: z.ZodIssueCode.custom,
+         message:
+           'PUBLIC events must include correlationId or correlationClaims for traceability',
+         path: ['correlationId'],
+       })
+     }
+   }

    if (

This ensures PUBLIC events can be traced back to their originating context.
apps/audit/test/klicker-scenarios.test.ts (1)
38-67: Replace fixed delay with dynamic wait to improve test reliability.

The 2-second fixed delay (line 58) could cause flakiness in CI or under load. The AzureTableTestHelper already provides `waitForEntityCount` with a configurable timeout.

Replace the fixed delay with:

- // Wait for persistence
- await new Promise((resolve) => setTimeout(resolve, 2000))
+ // Wait for persistence with timeout
+ await tableHelper.waitForEntityCount(events.length, 10000, 200)

This will fail fast if entities aren't persisted within 10 seconds, rather than always waiting 2 seconds.
packages/types/src/index.ts (2)
972-976: Standardize `instanceId` and `execution` to single types.

`CorrelationClaims` defines `instanceId` and `execution` as `string | number`, creating type ambiguity. Consumers must perform runtime checks or normalization, increasing complexity and the risk of type-related bugs.

Consider standardizing to a single type (e.g., `string` or `number`) across the codebase. If both are genuinely required, document the rationale and provide utility functions for safe conversion.
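If the union must stay, a conversion utility could normalize values at the boundary; a sketch with an assumed helper name:

```typescript
// Assumed helper: collapse the string | number union into a canonical
// string at the boundary, so storage and comparisons see a single type.
function normalizeCorrelationValue(
  value: string | number | undefined
): string | undefined {
  if (value === undefined) return undefined
  return typeof value === 'number' ? String(value) : value
}
```

Applying this once when claims enter the audit pipeline keeps downstream code free of `typeof` checks.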
1088-1109: Consider flexible status types for audit responses.
`AuditResponse` and `AuditBatchResponse` both fix `status: 'stored'` as a literal type. If future requirements include partial failures, retries, or other states (e.g., `'pending'`, `'failed'`, `'partial'`), this design will require breaking changes.

Consider using a union type for `status`:

export interface AuditResponse {
  status: 'stored' | 'pending' | 'failed'
  stored: boolean
  eventId: string
  error?: string
}

Alternatively, document that the current design assumes all responses are successfully stored and any failures are handled via exceptions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- .github/workflows/claude-code-review.yml (1 hunks)
- apps/audit/src/routes/audit-public.ts (1 hunks)
- apps/audit/src/schemas/audit-event.ts (1 hunks)
- apps/audit/src/storage/entities.ts (1 hunks)
- apps/audit/test/fixtures/klicker-events.json (1 hunks)
- apps/audit/test/integration.test.ts (1 hunks)
- apps/audit/test/klicker-scenarios.test.ts (1 hunks)
- apps/audit/test/performance.test.ts (1 hunks)
- apps/response-api/src/index.ts (9 hunks)
- packages/types/src/index.ts (1 hunks)
- packages/util/src/jwt.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- apps/audit/src/storage/entities.ts
- .github/workflows/claude-code-review.yml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-23T12:32:29.344Z
Learnt from: CR
PR: uzh-bf/klicker-uzh#0
File: apps/frontend-manage/CLAUDE.md:0-0
Timestamp: 2025-06-23T12:32:29.344Z
Learning: Implement error boundaries and fallback UI to handle unexpected errors gracefully in the user interface.
Applied to files:
apps/audit/src/schemas/audit-event.ts
🧬 Code graph analysis (6)
apps/audit/test/klicker-scenarios.test.ts (1)
apps/audit/test/utils/azure-table-helper.ts (1)
- AzureTableTestHelper (37-320)
apps/audit/src/routes/audit-public.ts (8)
apps/audit/src/storage/table-client.ts (1)
- AuditTableClient (5-286)

apps/audit/src/utils/metrics.ts (1)
- metrics (10-35)

apps/audit/src/utils/logger.ts (1)
- logger (4-26)

apps/audit/src/utils/cookies.ts (1)
- parseCookies (1-24)

apps/audit/src/middleware/auth.ts (1)
- verifyParticipantToken (39-71)

apps/audit/src/utils/rate-limit.ts (1)
- checkRateLimit (8-35)

apps/audit/src/schemas/audit-event.ts (1)
- AuditEventSchema (30-119)

apps/audit/src/storage/entities.ts (1)
- createAuditEntity (46-73)
apps/response-api/src/index.ts (1)
packages/types/src/index.ts (2)
- InternalAuditEvent (992-996)
- CorrelationClaims (972-976)
apps/audit/src/schemas/audit-event.ts (1)
packages/types/src/index.ts (1)
- CorrelationClaims (972-976)
apps/audit/test/performance.test.ts (1)
apps/audit/test/utils/azure-table-helper.ts (1)
- AzureTableTestHelper (37-320)
apps/audit/test/integration.test.ts (1)
apps/audit/test/utils/azure-table-helper.ts (1)
- AzureTableTestHelper (37-320)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: build
- GitHub Check: format
- GitHub Check: cypress-run-cloud
- GitHub Check: build
- GitHub Check: build
- GitHub Check: SonarCloud
- GitHub Check: build
- GitHub Check: build
- GitHub Check: Analyze (javascript)
- GitHub Check: build
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: check
🔇 Additional comments (40)
apps/audit/test/fixtures/klicker-events.json (1)
1-313: Comprehensive test fixture with good event coverage.

The fixture provides a wide range of realistic audit events covering instructor controls, participant authentication flows, quiz interactions, system processing, security events, and error scenarios. This will enable thorough testing of event handling, correlation, and persistence logic.
The structure is clear and consistent across all event types, making it easy to extend with additional scenarios as needed.
apps/audit/src/schemas/audit-event.ts (6)
8-16: LGTM: Comprehensive public action allowlist.

The ALLOWED_PUBLIC_ACTIONS array covers key participant interactions and client error reporting, providing a clear security boundary for public-scope events.

18-22: LGTM: Well-defined constants.

The Set for O(1) action lookups, 32KB attribute limit (matching Azure Table Storage constraints), and correlation ID regex are all appropriate.

24-28: LGTM: Schema matches interface contract.

CorrelationClaimsSchema correctly validates the CorrelationClaims interface from packages/types, with appropriate union types for instanceId and execution.

87-106: LGTM: Robust attribute validation with clear error messages.

The validation correctly enforces JSON serializability and the 32KB size limit, with helpful error messages indicating current size on limit violations.

108-118: LGTM: Correlation claims serializability check.

The validation ensures correlationClaims can be JSON-serialized before storage, with a clear error path.
46-46: Subject must be optional to allow PUBLIC-scope events.

The base schema requires `subject`, but PUBLIC-scope events need to allow omission since they're enriched server-side. The superRefine at lines 76-85 attempts to enforce subject presence for non-PUBLIC scopes, but it cannot relax a field already marked as required.

Apply this diff:

- subject: z.string().min(1).max(500),
+ subject: z.string().min(1).max(500).optional(),

The existing superRefine will then correctly enforce subject presence for non-PUBLIC scopes only.
apps/audit/test/klicker-scenarios.test.ts (8)
1-26: LGTM: Clean test setup with proper authentication helper.

The test configuration and authenticated request helper are well-structured, reading credentials from environment variables.

69-85: LGTM: Robust timestamp extraction with proper error handling.

The helper correctly handles multiple timestamp formats (number, ISO string) and provides clear error messages for invalid values.

87-101: LGTM: Proper test lifecycle management.

The setup and cleanup hooks ensure test isolation by clearing the table before and after tests.

103-316: LGTM: Comprehensive workflow test with strong assertions.

This test validates the complete quiz lifecycle including multiple authentication methods, response processing, and correlation tracking. The timestamp ordering and correlation verification assertions are particularly valuable.

318-447: LGTM: Thorough modification tracking tests.

The tests correctly verify that response modifications and deletions capture the actor, reason, and original subject, ensuring a complete audit trail.

449-534: LGTM: Complete authentication scenario coverage.

The tests validate successful and failed authentication flows across multiple methods (magic link, LTI, temp login), ensuring audit events capture the full authentication lifecycle.

536-658: LGTM: Strong exam integrity monitoring coverage.

The tests validate detection and logging of security-relevant events (multiple tabs, focus loss, IP changes), which are critical for assessment integrity.

660-827: LGTM: Comprehensive error handling test coverage.

The tests validate error scenarios including validation failures, duplicates, correlation errors, and client errors, ensuring the audit system handles edge cases gracefully.
packages/types/src/index.ts (3)
978-996: LGTM: Clean audit event type hierarchy.

The base/derived event structure cleanly separates public and internal scopes, with `InternalAuditEvent` appropriately requiring `subject` while `PublicAuditEvent` fixes the scope. The flexible `attributes` field allows for context-specific data.

998-1002: LGTM: AuditScope enum is well-defined.

The three scopes (`PUBLIC`, `INTERNAL`, `WORKER`) provide clear separation for different audit contexts and filtering needs.

1004-1086: Track implementation progress for audit actions.

All `AuditAction` enum values are marked with `// TODO` comments, indicating incomplete implementation. Additionally, the security events (`MULTIPLE_TABS_DETECTED`, `BROWSER_FOCUS_LOST`, `IP_LOCATION_CHANGE`) include a note expressing uncertainty about implementation.

Ensure that:
- Implementation status is tracked for all audit actions
- Security event detection mechanisms are clarified before full rollout
- Remove TODO comments as actions are implemented and tested
Consider creating a tracking issue to monitor completion of audit action implementation.
apps/response-api/src/index.ts (11)
281-286: LGTM: Correlation claims normalization.The code constructs
validatedCorrelationClaimsafter successful validation, providing a typed structure for downstream audit events. Defaultingexecutionto0when undefined is a reasonable fallback.Ensure that the default
execution: 0aligns with business logic expectations.
176-186: Critical: Don't stringifyreq; add scope; make push non-blocking.This audit event logs the entire
IncomingMessageobject viaJSON.stringify(req), which leaks headers/cookies and can be circular or huge. Additionally, thescopefield is missing, and the blockingawaitcan derail the error path if the audit push fails.Apply the sanitization and error handling as suggested in the previous review comment at lines 176-187.
192-202: Major: Avoid logging full body; add scope and guard push.The audit event embeds the raw payload via
JSON.stringify(payload), which may contain sensitive data. Thescopefield is also missing, and the push is blocking.Apply the sanitization as suggested in the previous review comment at lines 192-203.
214-226: Major: Do not log the raw JWT (`correlationKey`); hash it and drop the full response.

The audit event logs the raw `correlationKey` (a JWT) and the full `response` object, both of which are sensitive. The `scope` field is missing, and the push is blocking. Apply the hashing and sanitization suggested in the previous review comment at lines 214-227. Note that `correlationKey` is not the same as the computed `correlationId` (line 218 comment).
240-252: Major: Sanitize the correlation error log; hash the token; no full payload.

The raw `correlationKey` and the full `response` are logged. The `scope` field is missing, and the push is blocking. Apply the hashing and sanitization suggested in the previous review comment at lines 240-253.
262-276: Major: The invalid-correlation log still leaks the token; hash it and trim details.

The raw `correlationKey` and the full `response` are exposed in the audit event. The `scope` field is missing, and the push is blocking. Apply the hashing and sanitization suggested in the previous review comment at lines 262-277.
313-326: Major: Auth error - hash `correlationId`; don't log the payload.

The raw `correlationKey` and the full `response` are logged. The `scope` field is missing, and the push is blocking. Apply the hashing and sanitization suggested in the previous review comment at lines 313-327.
334-347: Major: Don't log raw cookies; hash `correlationId`; add minimal flags.

The audit event logs raw cookie strings (PII) and the full `response`. The `scope` field is missing, and the push is blocking. Apply the sanitization suggested in the previous review comment at lines 334-348, replacing raw cookies with boolean flags.
361-374: Major: Use the hashed `correlationId`; avoid logging the full response.

This is the only audit event with `scope: AuditScope.INTERNAL` (good!), but it still uses the raw `correlationKey` instead of the computed hashed `correlationId` (line 358). The full `response` object is logged, and the push is blocking. Apply the changes suggested in the previous review comment at lines 361-374, using the hashed `correlationId` and a response fingerprint instead of the raw payload. Based on past comment at line 366.
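The recurring fix in these comments (hash the token, set an explicit scope, and detach the push from the request path) can be sketched as follows. `AuditScope` and the `push` signature are assumptions modeled on the review text, not the service's actual API:

```typescript
import { createHash } from "node:crypto"

// Assumed shapes, modeled on the review comments above, not the actual audit API.
enum AuditScope {
  INTERNAL = "INTERNAL",
}
interface AuditClient {
  push(event: Record<string, unknown>): Promise<void>
}

// Hash the raw correlationKey (a JWT) so the audit trail can still correlate
// events without ever storing the token itself.
function hashCorrelationKey(correlationKey: string): string {
  return createHash("sha256").update(correlationKey).digest("hex")
}

// Fire-and-forget push: the request's error path must not fail because auditing failed.
function pushAudit(client: AuditClient, correlationKey: string, message: string): void {
  const event = {
    scope: AuditScope.INTERNAL,
    correlationId: hashCorrelationKey(correlationKey),
    message,
    // Log fingerprints and flags here, never the full response or raw cookies.
  }
  void client.push(event).catch((err) => {
    console.error("audit push failed", err)
  })
}
```

The same hash applied to the same token always yields the same 64-character hex digest, so events remain correlatable across services without the JWT ever reaching storage.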
385-397: Major: Duplicate - don't log the raw token; use the hashed id consistently.

The audit event uses the raw `correlationKey` in the `correlationId` field but references the hashed `correlationId` in the error message, creating confusion. The `scope` field is missing, and the push is blocking. Apply the changes suggested in the previous review comment at lines 385-397, using the hashed `correlationId` consistently and adding the `scope` field.
425-437: Major: Fallback audit - avoid the raw token and standardize attributes.

The fallback audit event uses the raw `correlationKey` instead of the hashed `correlationId`. The `scope` field is missing, and this push also lacks `.catch()` (though there is an outer try/catch). Apply the changes suggested in the previous review comment at lines 425-437, using the hashed `correlationId` and adding the `scope` field with structured attributes.

apps/audit/src/routes/audit-public.ts (11)
1-15: LGTM! Imports and the dependency interface are well-structured and appropriate for the route handler.

17-24: LGTM! The function signature follows good dependency-injection practices, and the initial metrics tracking is appropriate.

62-82: LGTM! The rate-limiting implementation is correct. As noted in previous discussions, the 100 requests/minute threshold may need tuning based on actual usage patterns.

84-97: LGTM! JSON parsing has proper error handling and an appropriate 400 response for malformed input.
99-123: LGTM! Payload normalization correctly enforces security boundaries by ignoring client-provided `subject` and `scope`, and enriches the event with authenticated participant context.
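Server-side normalization of this kind can be sketched roughly as below; the field names and the fixed scope value are illustrative assumptions, not the route's actual code:

```typescript
// Hypothetical authenticated context extracted from the participant JWT.
interface AuthContext {
  participantId: string
}

// Discard any client-supplied `subject`/`scope` and replace them from the
// verified context, so clients cannot spoof identity or escalate scope.
function normalizeEvent(
  clientPayload: Record<string, unknown>,
  auth: AuthContext,
): Record<string, unknown> {
  const { subject: _ignoredSubject, scope: _ignoredScope, ...rest } = clientPayload
  return {
    ...rest,
    subject: auth.participantId, // always from the verified token
    scope: "PUBLIC",             // illustrative fixed value for the public endpoint
  }
}
```

The key property is that whatever the client sends in those two fields is dropped before validation, so the schema only ever sees server-assigned values.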
125-144: LGTM! Schema validation is properly implemented with `safeParse`, appropriate error logging, and user-friendly error messages. Validation errors are correctly marked as non-retryable.

146-155: LGTM! Optional correlation-ID logging provides a useful tracing capability for related events.

157-167: LGTM! Comprehensive logging before storage provides a good audit trail and debugging support.

169-194: LGTM! The storage operation is clean and properly instrumented. Metrics are updated after success, and the response provides the `eventId` for client-side correlation.

195-251: LGTM! Error handling is comprehensive and user-friendly. It properly classifies errors by type (size limits vs. transient failures) and provides appropriate HTTP status codes with retry guidance. The error patterns align with Azure Table Storage failure modes.
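A classifier along the lines this comment describes might look like the sketch below; the error-message patterns and status codes are illustrative, not the service's actual matching rules:

```typescript
// Illustrative error classifier - the patterns are examples modeled on common
// Azure Table Storage failure modes, not the service's actual matching rules.
interface ClassifiedError {
  status: number
  retryable: boolean
  message: string
}

function classifyStorageError(err: Error): ClassifiedError {
  if (/EntityTooLarge|PropertyValueTooLarge/.test(err.message)) {
    // Payload exceeds table limits: the client must shrink the event.
    return { status: 413, retryable: false, message: "Event too large" }
  }
  if (/ECONNREFUSED|ETIMEDOUT|ServerBusy/.test(err.message)) {
    // Transient storage failure: the client may retry later.
    return { status: 503, retryable: true, message: "Storage temporarily unavailable" }
  }
  return { status: 500, retryable: false, message: "Internal error" }
}
```

Returning the `retryable` flag alongside the status lets the client distinguish "fix your payload" from "back off and retry", which is exactly the guidance the route's responses provide.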
36-47: Accept alternative participant token cookies.

The endpoint currently accepts only `next-auth.participant-session-token`. As noted in previous reviews, temporary participants may use different cookie names (e.g., `temporary_participant_token`), preventing their actions from being audited.
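One way to support several cookie names is a small priority-ordered fallback; the cookie names come from the comment above, while the hand-rolled parsing is a simplified sketch (a real handler would use its framework's cookie parser):

```typescript
// Candidate cookie names, in priority order (taken from the review comment above).
const PARTICIPANT_COOKIE_NAMES = [
  "next-auth.participant-session-token",
  "temporary_participant_token",
]

// Minimal Cookie-header parser - a real service would use its framework's parser.
function parseCookies(header: string): Map<string, string> {
  const jar = new Map<string, string>()
  for (const part of header.split(";")) {
    const idx = part.indexOf("=")
    if (idx === -1) continue
    jar.set(part.slice(0, idx).trim(), part.slice(idx + 1).trim())
  }
  return jar
}

// Return the first matching participant token, or undefined if none is present.
function findParticipantToken(cookieHeader: string): string | undefined {
  const jar = parseCookies(cookieHeader)
  for (const name of PARTICIPANT_COOKIE_NAMES) {
    const value = jar.get(name)
    if (value) return value
  }
  return undefined
}
```

Keeping the list in priority order means a regular session token still wins when both cookies happen to be present.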
Merge commit (…-log). Conflicts: `packages/graphql/test/docker/docker-compose.test.yml`, `packages/graphql/test/helpers.ts`
Actionable comments posted: 8
🧹 Nitpick comments (9)
apps/audit/.env.cypress (1)
19-19: Weak secrets are acceptable for Cypress tests but document the constraint.

The simple values `test-secret-token-123` (line 19) and `abcd` (line 22) are appropriate for a `.env.cypress` test file. However, consider adding a comment emphasizing that these are test-only values and must never be used in production environments. Apply this diff to add clarifying comments:

```diff
 # Authentication Configuration
+# WARNING: Test-only value. Use strong secrets in production.
 AUDIT_TOKEN=test-secret-token-123

 # JWT verification for participant tokens (public endpoint)
+# WARNING: Test-only value. Use strong secrets in production.
 APP_SECRET=abcd
```

Also applies to: 22-22
apps/audit/test/integration.test.ts (1)
12-65: Extract shared test helpers to reduce duplication.

The functions `extractEntityTimestamp`, `makeAuthenticatedRequest`, and `waitForEntityByRowKey` are duplicated across multiple test files (integration.test.ts, performance.test.ts, and database-verification.test.ts). Consider extracting these to a shared test utilities file (e.g., `apps/audit/test/utils/test-helpers.ts`) to improve maintainability and ensure consistency.

Create a new file `apps/audit/test/utils/test-helpers.ts`:

```typescript
export function extractEntityTimestamp(entity: any): number {
  const raw = entity.eventTimestamp ?? entity.timestamp
  if (typeof raw === 'number') {
    return raw
  }
  if (typeof raw === 'string') {
    const parsed = Date.parse(raw)
    if (Number.isNaN(parsed)) {
      throw new Error(`Unable to parse timestamp value: ${raw}`)
    }
    return parsed
  }
  throw new Error('Timestamp not found on entity')
}

export async function makeAuthenticatedRequest(
  baseUrl: string,
  authToken: string,
  path: string,
  options: RequestInit = {}
): Promise<Response> {
  const url = `${baseUrl}${path}`
  const response = await fetch(url, {
    ...options,
    headers: {
      'X-Internal-Token': authToken,
      'Content-Type': 'application/json',
      ...options.headers,
    },
  })
  return response
}

export async function waitForEntityByRowKey(
  tableHelper: AzureTableTestHelper,
  rowKey: string,
  timeoutMs = 5000,
  pollIntervalMs = 100
) {
  const start = Date.now()
  while (Date.now() - start < timeoutMs) {
    const entities = await tableHelper.getAllEntities()
    const match = entities.find((entity) => entity.rowKey === rowKey)
    if (match) {
      return match
    }
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs))
  }
  throw new Error(`Entity with rowKey ${rowKey} not found after ${timeoutMs}ms`)
}
```

Then import and use these in all test files.
apps/audit/test/performance.test.ts (7)
125-125: Consider using robust wait mechanisms for persistence verification.

Lines 125 and 179 use fixed delays to wait for persistence. While these occur after the performance measurements (so they don't affect timing), they could still lead to flaky tests. Consider using `tableHelper.waitForEntityCount()` or a similar polling mechanism. For example, at line 125:

```diff
- await new Promise((resolve) => setTimeout(resolve, 500))
+ await tableHelper.waitForEntityCount(1, 5000)
```

Also applies to: 179-179
259-259: Use waitForEntityCount instead of a fixed delay.

Replace the fixed 5-second delay with `tableHelper.waitForEntityCount()` to robustly wait for entity persistence.

```diff
- await new Promise((resolve) => setTimeout(resolve, 5000)) // Extra wait time
+ await tableHelper.waitForEntityCount(successfulResponses.length, 20000)
```

Note: Adjust the timeout as needed based on the expected persistence time.
340-341: Performance variance threshold may be too permissive.

A 250% throughput variance allows the maximum throughput to be 3.5× the minimum, which may not effectively detect performance degradation under sustained load. Consider tightening this threshold (e.g., to 50-100%) to catch more subtle performance issues.

```diff
- // Performance should not degrade significantly over time
- expect(throughputVariance).toBeLessThanOrEqual(250)
+ // Performance should not degrade significantly over time
+ expect(throughputVariance).toBeLessThanOrEqual(100)
```
348-348: Use waitForEntityCount instead of a fixed delay.

Replace the fixed 3-second delay with `tableHelper.waitForEntityCount(totalEvents, 15000)` for robust persistence verification.

```diff
- await new Promise((resolve) => setTimeout(resolve, 3000))
+ await tableHelper.waitForEntityCount(totalEvents, 15000)
```
392-392: Consider using waitForEntityCount for persistence verification.

Lines 392 and 425 use fixed delays. While line 392's delay might be intentional for iteration spacing, line 425 should use `tableHelper.waitForEntityCount(totalExpectedEvents, 15000)` for more reliable persistence verification.

```diff
- await new Promise((resolve) => setTimeout(resolve, 2000))
+ await tableHelper.waitForEntityCount(totalExpectedEvents, 15000)
```

Also applies to: 425-425
476-476: Use waitForEntityCount instead of a fixed delay.

Replace the fixed 2-second delay with `tableHelper.waitForEntityCount(successCount, 10000)` for robust persistence verification.

```diff
- await new Promise((resolve) => setTimeout(resolve, 2000))
+ await tableHelper.waitForEntityCount(successCount, 10000)
```
496-496: Use waitForEntityCount instead of fixed delays.

Both delays should use `tableHelper.waitForEntityCount()` for more reliable persistence verification.

At line 496:

```diff
- await new Promise((resolve) => setTimeout(resolve, 5000)) // Wait for population
+ await tableHelper.waitForEntityCount(populateCount, 20000)
```

At line 521:

```diff
- await new Promise((resolve) => setTimeout(resolve, 1000))
+ await tableHelper.waitForEntityCount(existingCount + 1, 10000)
```

Also applies to: 521-521
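The `waitForEntityCount` helper these suggestions rely on is, at its core, a bounded poll. A generic sketch, with a `countEntities` callback standing in for the helper's actual table query:

```typescript
// Generic bounded poll: resolve once countEntities() reaches `expected`,
// or throw after `timeoutMs`. Stands in for tableHelper.waitForEntityCount().
async function waitForEntityCount(
  countEntities: () => Promise<number>,
  expected: number,
  timeoutMs = 10000,
  pollIntervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    if ((await countEntities()) >= expected) {
      return
    }
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs))
  }
  throw new Error(`Expected ${expected} entities within ${timeoutMs}ms`)
}
```

Unlike a fixed `setTimeout`, this returns as soon as the condition holds (fast on a warm Azurite) and fails loudly with a clear message when persistence never completes, instead of letting a later assertion fail obscurely.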
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- apps/audit/.env.cypress (1 hunks)
- apps/audit/test/api.test.ts (1 hunks)
- apps/audit/test/database-verification.test.ts (1 hunks)
- apps/audit/test/integration.test.ts (1 hunks)
- apps/audit/test/performance.test.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/audit/test/api.test.ts
🧰 Additional context used
🧬 Code graph analysis (3)
apps/audit/test/performance.test.ts (1)
apps/audit/test/utils/azure-table-helper.ts (1)
AzureTableTestHelper(37-320)
apps/audit/test/database-verification.test.ts (1)
apps/audit/test/utils/azure-table-helper.ts (1)
AzureTableTestHelper(37-320)
apps/audit/test/integration.test.ts (1)
apps/audit/test/utils/azure-table-helper.ts (1)
AzureTableTestHelper(37-320)
🪛 dotenv-linter (3.3.0)
apps/audit/.env.cypress
[warning] 6-6: [UnorderedKey] The NODE_ENV key should go before the PORT key
(UnorderedKey)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: test
- GitHub Check: build
- GitHub Check: cypress-run-cloud
- GitHub Check: build
- GitHub Check: test
- GitHub Check: Analyze (javascript)
- GitHub Check: SonarCloud
- GitHub Check: build
- GitHub Check: build
- GitHub Check: test
- GitHub Check: format
- GitHub Check: build
- GitHub Check: check
🔇 Additional comments (1)
apps/audit/.env.cypress (1)
1-31: LGTM! Cypress test configuration is well-structured.

The environment configuration for Cypress testing is comprehensive and appropriate:
- Uses Azurite's well-known default connection string for local testing
- Includes clear comments distinguishing local vs. production setup
- Covers all necessary configuration domains (server, storage, auth, CORS, logging)
```typescript
responses.forEach((response) => expect(response.status).toBe(200))

// Wait and verify only one record exists
await new Promise((resolve) => setTimeout(resolve, 2000))
```
🛠️ Refactor suggestion | 🟠 Major
Use waitForEntityCount instead of fixed delay.
Replace the fixed 2-second delay with tableHelper.waitForEntityCount() to reliably wait for persistence before verification.
```diff
- await new Promise((resolve) => setTimeout(resolve, 2000))
+ await tableHelper.waitForEntityCount(1, 10000)
```
🤖 Prompt for AI Agents
In apps/audit/test/database-verification.test.ts around line 141, replace the
fixed 2-second delay (await new Promise(resolve => setTimeout(resolve, 2000)))
with a call to tableHelper.waitForEntityCount() so the test waits
deterministically for persistence: call
tableHelper.waitForEntityCount(expectedCount[, options]) using the expected
number of entities you expect to be persisted (and optionally a timeout/retry
options), then proceed to verification; remove the setTimeout and ensure
tableHelper is available/imported in the test scope.
```typescript
expect(response.status).toBe(200)

// Wait for persistence and get the entity
await new Promise((resolve) => setTimeout(resolve, 1000))
```
🛠️ Refactor suggestion | 🟠 Major
Use waitForEntityCount instead of fixed delays.
Multiple tests use fixed 1-2 second delays. Replace these with tableHelper.waitForEntityCount() for more reliable persistence verification.
For example, at line 169:

```diff
- await new Promise((resolve) => setTimeout(resolve, 1000))
+ await tableHelper.waitForEntityCount(1, 10000)
```

Apply similar changes at lines 213 and 249 with appropriate expected counts.
Also applies to: 213-213, 249-249
🤖 Prompt for AI Agents
In apps/audit/test/database-verification.test.ts around lines 169, 213 and 249,
replace the fixed 1s setTimeout delays with calls to
tableHelper.waitForEntityCount(...) so the test waits for the expected number of
persisted rows rather than a fixed sleep; for each location determine the
expectedCount based on the number of entities inserted earlier in that test
section and call await tableHelper.waitForEntityCount('<tableName>',
expectedCount) (include an optional timeout argument if available) to make
persistence assertions robust and remove the hard-coded setTimeout lines.
```typescript
}

// Verify row keys
await new Promise((resolve) => setTimeout(resolve, 1000))
```
🛠️ Refactor suggestion | 🟠 Major
Use waitForEntityCount instead of fixed delays.
Replace fixed delays with tableHelper.waitForEntityCount() for robust persistence verification.
At line 287:

```diff
- await new Promise((resolve) => setTimeout(resolve, 1000))
+ await tableHelper.waitForEntityCount(2, 10000)
```

At line 319:

```diff
- await new Promise((resolve) => setTimeout(resolve, 1000))
+ await tableHelper.waitForEntityCount(1, 10000)
```

Also applies to: 319-319
🤖 Prompt for AI Agents
In apps/audit/test/database-verification.test.ts around lines 287 and 319, the
test uses fixed delays (await new Promise(resolve => setTimeout(resolve, 1000)))
which is brittle; replace each fixed delay with
tableHelper.waitForEntityCount(tableName, expectedCount[, timeout]) where
tableName and expectedCount match the table and assertion that follows those
waits (use the same expected count used later in the test), remove the
setTimeout calls, and await the tableHelper.waitForEntityCount promise so the
test reliably waits for persistence instead of a fixed delay.
```typescript
expect(response.status).toBe(200)

// Verify serialization integrity
await new Promise((resolve) => setTimeout(resolve, 1000))
```
🛠️ Refactor suggestion | 🟠 Major
Use waitForEntityCount instead of fixed delays.
Replace fixed 1-second delays with tableHelper.waitForEntityCount(1, 10000) for more reliable persistence verification.
```diff
- await new Promise((resolve) => setTimeout(resolve, 1000))
+ await tableHelper.waitForEntityCount(1, 10000)
```

Also applies to: 396-396
🤖 Prompt for AI Agents
In apps/audit/test/database-verification.test.ts around lines 359 and 396,
replace the fixed 1-second Promise delay (await new Promise((resolve) =>
setTimeout(resolve, 1000))) with a call to tableHelper.waitForEntityCount(1,
10000) to wait deterministically for the entity to persist; ensure you await the
promise returned by waitForEntityCount and remove the setTimeout lines so the
test waits up to 10 seconds for one entity instead of a fixed delay.
```typescript
}

// Verify timestamp storage
await new Promise((resolve) => setTimeout(resolve, 1000))
```
🛠️ Refactor suggestion | 🟠 Major
Use waitForEntityCount instead of fixed delays.
Replace fixed 1-second delays with appropriate tableHelper.waitForEntityCount() calls.
At line 442:

```diff
- await new Promise((resolve) => setTimeout(resolve, 1000))
+ await tableHelper.waitForEntityCount(2, 10000)
```

At line 479:

```diff
- await new Promise((resolve) => setTimeout(resolve, 1000))
+ await tableHelper.waitForEntityCount(1, 10000)
```

Also applies to: 479-479
🤖 Prompt for AI Agents
In apps/audit/test/database-verification.test.ts around lines 442 and 479,
replace the fixed 1-second setTimeout pauses with calls to
tableHelper.waitForEntityCount() to wait deterministically for the expected
number of entities; for each location, call
tableHelper.waitForEntityCount(tableName, expectedCount, { timeoutMs:
<reasonableTimeout> }) (or the project’s standard timeout) so the test waits
until the DB contains the expected rows instead of using a hardcoded delay.
```typescript
// Wait for persistence
await new Promise((resolve) => setTimeout(resolve, 1000))
```
🛠️ Refactor suggestion | 🟠 Major
Use robust wait mechanism instead of fixed delay.
The fixed 1-second delay may lead to flaky tests if persistence takes longer than expected. Consider using tableHelper.waitForEntityCount(2, 10000) instead to wait for a specific entity count with a timeout.
Apply this diff:

```diff
- // Wait for persistence
- await new Promise((resolve) => setTimeout(resolve, 1000))
+ // Wait for persistence
+ await tableHelper.waitForEntityCount(2, 10000)
```
🤖 Prompt for AI Agents
In apps/audit/test/integration.test.ts around lines 254 to 256, replace the
fixed 1-second sleep used to wait for persistence with a robust wait call that
polls until the expected entity count is reached or a timeout occurs;
specifically, remove the await new Promise(resolve => setTimeout(resolve, 1000))
and call tableHelper.waitForEntityCount(2, 10000) (or similar helper) so the
test waits up to 10s for two entities instead of a brittle fixed delay, ensuring
the appropriate helper is imported/available in the test file.
```typescript
// Wait for all events to be persisted
await new Promise((resolve) => setTimeout(resolve, 2000))
```
🛠️ Refactor suggestion | 🟠 Major
Replace fixed delays with robust wait mechanisms.
Multiple tests use fixed 2-second delays (lines 300, 342, 382) which may cause flaky tests. Use tableHelper.waitForEntityCount() with appropriate expected counts and timeouts instead.
For example, at line 300:

```diff
- // Wait for all events to be persisted
- await new Promise((resolve) => setTimeout(resolve, 2000))
+ // Wait for all events to be persisted
+ await tableHelper.waitForEntityCount(3, 10000)
```

Apply similar changes at lines 342 and 382.
Also applies to: 341-342, 381-382
🤖 Prompt for AI Agents
In apps/audit/test/integration.test.ts around lines 299-300, 341-342 and
381-382, replace the fixed 2-second setTimeout waits used to "Wait for all
events to be persisted" with calls to tableHelper.waitForEntityCount() using the
appropriate expected row counts for the test case and a sensible timeout (e.g.,
5-10s). Specifically, remove the new Promise(setTimeout(…,2000)) and instead
call await tableHelper.waitForEntityCount(tableName, expectedCount, timeoutMs)
(or the test suite's equivalent), choosing expectedCount based on how many
events the test should have persisted and a timeout long enough to avoid flakes;
apply the same replacement at the other two locations.
```typescript
// Wait for persistence
await new Promise((resolve) => setTimeout(resolve, 3000))
```
🛠️ Refactor suggestion | 🟠 Major
Use robust wait mechanism instead of fixed delay.
Replace the fixed 3-second delay with tableHelper.waitForEntityCount(5, 10000) to ensure all events are persisted before verifying partition distribution.
```diff
- // Wait for persistence
- await new Promise((resolve) => setTimeout(resolve, 3000))
+ // Wait for persistence
+ await tableHelper.waitForEntityCount(5, 10000)
```
🤖 Prompt for AI Agents
In apps/audit/test/integration.test.ts around lines 428 to 429, replace the
brittle fixed 3-second sleep used to wait for persistence with a robust waiter;
call tableHelper.waitForEntityCount(5, 10000) instead of awaiting new
Promise(resolve => setTimeout(resolve, 3000)) so the test waits up to 10s for 5
persisted entities before proceeding to verify partition distribution.
Merge commit (…-log). Conflicts: `.github/workflows/test-graphql.yml`
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/test-graphql.yml (1)
163-191: Include Azure Blob storage credentials in the CI workflow.

The `WORKER_ENV_FILE` in `.github/workflows/test-graphql.yml` must define the `BLOB_STORAGE_ACCOUNT_NAME` and `BLOB_STORAGE_ACCESS_KEY` variables required by `StorageSharedKeyCredential` in `packages/graphql/src/services/elements.ts`. For local Azurite tests, add for example:

```shell
BLOB_STORAGE_ACCOUNT_NAME=devstoreaccount1
BLOB_STORAGE_ACCESS_KEY=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- .github/workflows/test-graphql.yml (3 hunks)
- packages/graphql/test/run-tests-local.sh (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/graphql/test/run-tests-local.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: check
- GitHub Check: build
- GitHub Check: Analyze (javascript)
- GitHub Check: cypress-run-cloud
- GitHub Check: build
- GitHub Check: test
- GitHub Check: test
```diff
 services:
   postgres:
-    image: postgres:15
+    image: postgres:14.18
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Verify Postgres version downgrade doesn't break existing functionality.
The Postgres service image has been downgraded from postgres:15 to postgres:14.18. Ensure this version change is intentional and that all GraphQL package tests remain compatible with Postgres 14.18, particularly regarding any SQL features or syntax that might differ between versions.
🏁 Script executed:
```shell
#!/bin/bash
# Description: Search for Postgres version requirements or version-specific features in the codebase

# Check for any Postgres version specifications in package files
fd -e json -e yml -e yaml . | xargs rg -l "postgres.*version|postgresql.*version" -i

# Look for any migration files or SQL that might use Postgres 15-specific features
rg -g '*.sql' -g '*.prisma' "GENERATED|STORED|GENERATED ALWAYS" -C 2

# Check if there are comments or documentation about Postgres version requirements
rg "postgres.*1[45]|postgresql.*1[45]" -i -C 2
```

Length of output: 2956
Ensure consistent Postgres version across all configs.
.github/workflows/test-graphql.yml and docker-compose.yml pin postgres:14.18, but packages/graphql/test/docker/docker-compose.test.yml and apps/olat-api/test/docker-compose.test.yml use postgres:14.8 (and the Hatchet workflow still uses postgres:15). Align every Postgres image to the same patch version (e.g., 14.18) to avoid environment drift and test inconsistencies.
🤖 Prompt for AI Agents
In .github/workflows/test-graphql.yml around line 28, the workflow pins image:
postgres:14.18 while other configs
(packages/graphql/test/docker/docker-compose.test.yml and
apps/olat-api/test/docker-compose.test.yml) pin postgres:14.8 and the Hatchet
workflow uses postgres:15; update those files to use the same patch version as
this workflow (change all postgres image references to postgres:14.18) so every
CI/workspace/docker-compose/test config uses a consistent Postgres image, and
run a quick CI test to confirm no environment drift.
```yaml
azurite:
  image: mcr.microsoft.com/azure-storage/azurite:latest
  ports:
    - 10000:10000
    - 10001:10001
    - 10002:10002
```
Add health checks and pin Azurite version.
The Azurite service is missing health checks, which could cause tests to start before the service is ready. Additionally, using the :latest tag reduces CI reproducibility.
Apply this diff to add health checks:
```diff
   azurite:
     image: mcr.microsoft.com/azure-storage/azurite:latest
+    options: >-
+      --health-cmd "curl -f http://localhost:10000"
+      --health-interval 10s
+      --health-timeout 5s
+      --health-retries 5
     ports:
       - 10000:10000
       - 10001:10001
       - 10002:10002
```

Additionally, consider pinning to a specific Azurite version for reproducibility:

```diff
- image: mcr.microsoft.com/azure-storage/azurite:latest
+ image: mcr.microsoft.com/azure-storage/azurite:3.32.0
```
🤖 Prompt for AI Agents
.github/workflows/test-graphql.yml lines 111-116: the Azurite service uses the
:latest tag and has no healthcheck, which can make CI flaky and
non-reproducible; change the image to a pinned version (e.g.,
mcr.microsoft.com/azure-storage/azurite:3.29.0) and add a healthcheck block that
polls the blob endpoint (http://localhost:10000/devstoreaccount1/) with an HTTP
or curl command, setting reasonable interval (e.g., 10s), timeout (e.g., 5s),
and retries (e.g., 5) so tests wait until Azurite is ready; also update any
depends_on or service wait logic to use the service health status
(service_healthy) so test steps don’t start before Azurite is up.