Commit e3d3c13

KryptosAI and claude authored
fix: audit fixes — 9 issues resolved (#82)
MCP server tools, telemetry wiring, matrix comment, README docs, CLI tests, security tests, magic number comments. 302/302 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 4ac7735 commit e3d3c13

13 files changed

Lines changed: 319 additions & 21 deletions

File tree

README.md

Lines changed: 39 additions & 0 deletions
@@ -81,6 +81,12 @@ Or add it manually to your config:
 | `watch <config>` | Watch a server for changes, alert on regressions |
 | `suggest` | Detect your stack and recommend MCP servers from the registry |
 | `serve` | Start as an MCP server for AI agents |
+| `lock` | Snapshot MCP server schemas into a lock file |
+| `lock verify` | Verify live servers match the lock file |
+| `history` | Show health score trends for your MCP servers |
+| `ci-report` | Generate CI report for GitHub issue creation |
+| `score <cmd>` | Score an MCP server's health (0-100) |
+| `badge <cmd>` | Generate an SVG health score badge for README |

 Run with no arguments for an interactive menu:

@@ -162,8 +168,41 @@ jobs:
 security: true
 ```

+Action inputs:
+
+| Input | Description | Default |
+|-------|-------------|---------|
+| `command` | Server command to test | (required if no `target`) |
+| `target` | Path to target config JSON | |
+| `targets` | Path to MCP config file for multi-server matrix scan | |
+| `deep` | Also invoke safe tools | `false` |
+| `security` | Run security analysis | `false` |
+| `fail-on-regression` | Fail the action on issues | `true` |
+| `comment-on-pr` | Post report as PR comment | `true` |
+| `set-status` | Set a commit status check (green/red) on the HEAD SHA | `true` |
+| `github-token` | Token for PR comments and commit statuses | `${{ github.token }}` |
+
 The action runs checks on every PR, comments a markdown report, and blocks merge on regressions. See [`action/README.md`](./action/README.md) for all options.

+### Lock Files
+
+```bash
+$ npx @kryptosai/mcp-observatory lock # Snapshot all server schemas
+$ npx @kryptosai/mcp-observatory lock verify # Verify no drift since last lock
+```
+
+### Trend Tracking
+
+```bash
+$ npx @kryptosai/mcp-observatory history # Show health trends over time
+```
+
+### Nightly Scans
+
+```bash
+$ npx @kryptosai/mcp-observatory ci-report # Generate regression report for CI
+```
+
 ## MCP Server Mode

 **No other testing tool is itself an MCP server.** Add Observatory as a server and your AI agent can autonomously test, diagnose, and monitor your other MCP servers.

action/README.md

Lines changed: 3 additions & 1 deletion
@@ -29,7 +29,9 @@ jobs:
 | `security` | Run security analysis | `false` |
 | `fail-on-regression` | Fail the action on issues | `true` |
 | `comment-on-pr` | Post report as PR comment | `true` |
-| `github-token` | Token for PR comments | `${{ github.token }}` |
+| `set-status` | Set a commit status check (green/red) on the HEAD SHA | `true` |
+| `targets` | Path to MCP config file for multi-server matrix scan | |
+| `github-token` | Token for PR comments and commit statuses | `${{ github.token }}` |
 | `node-version` | Node.js version | `22` |

 ## Outputs

api/src/worker.ts

Lines changed: 10 additions & 8 deletions
@@ -134,9 +134,9 @@ interface RunArtifact {
   fatalError?: string;
 }

-// ---------------------------------------------------------------------------
-// Score computation (ported from src/score.ts)
-// ---------------------------------------------------------------------------
+// ── Score computation (duplicated from src/score.ts) ────────────────────────
+// IMPORTANT: This logic is duplicated from src/score.ts because the Worker
+// can't import from the main package. Keep both files in sync when making changes.

 const STATUS_SCORES: Record<string, number> = {
   pass: 100,
@@ -214,6 +214,8 @@ function scorePerformance(
   );
   const p95 = sorted[p95Index] ?? 0;

+  // p95 latency thresholds for performance scoring
+  // <500ms = excellent (100), <1s = good (80), <2s = acceptable (60), <5s = slow (40), >5s = poor (20)
   let score: number;
   if (p95 < 500) score = 100;
   else if (p95 < 1000) score = 80;
@@ -237,11 +239,11 @@ function computeHealthScore(
   performanceMetrics?: PerformanceMetrics,
 ): HealthScore {
   const w = {
-    protocolCompliance: 0.3,
-    schemaQuality: 0.2,
-    security: 0.2,
-    reliability: 0.2,
-    performance: 0.1,
+    protocolCompliance: 0.3, // Highest — spec compliance is foundational for interop
+    schemaQuality: 0.2, // Good schemas enable AI agents to use tools correctly
+    security: 0.2, // Parity with quality — both critical for production use
+    reliability: 0.2, // Tools/prompts/resources actually responding as expected
+    performance: 0.1, // Lowest — latency matters less than correctness
   };

   const dimensions: ScoreDimension[] = [
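
Reviewer note: the five weights in this hunk sum to 1.0, so a weighted sum of per-dimension scores stays on the same 0-100 scale. The sketch below illustrates that arithmetic only. The `weightedScore` helper, the `Dimension` type, and the rounding step are assumptions made for illustration; the diff does not show how `computeHealthScore` actually combines the dimensions.

```typescript
// Weights copied from the diff; everything else in this block is hypothetical.
const WEIGHTS = {
  protocolCompliance: 0.3,
  schemaQuality: 0.2,
  security: 0.2,
  reliability: 0.2,
  performance: 0.1,
} as const;

type Dimension = keyof typeof WEIGHTS;

// Hypothetical helper: combine per-dimension scores (each 0-100) into a
// single weighted score. Rounding to an integer is an assumption.
function weightedScore(scores: Record<Dimension, number>): number {
  let total = 0;
  for (const key of Object.keys(WEIGHTS) as Dimension[]) {
    total += WEIGHTS[key] * scores[key];
  }
  return Math.round(total);
}

// 0.3*100 + 0.2*80 + 0.2*60 + 0.2*100 + 0.1*40 = 82
const exampleScore = weightedScore({
  protocolCompliance: 100,
  schemaQuality: 80,
  security: 60,
  reliability: 100,
  performance: 40,
});
console.log(exampleScore);
```

Because the same table is duplicated in `src/score.ts`, a shared fixture like the one above could be run against both copies to catch drift between them.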

github-app/README.md

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 # MCP Observatory GitHub App

+> **Status**: Planned feature — not yet deployed. This is the future hosted Observatory GitHub App.
+
 A GitHub App that automatically analyzes MCP server configurations in pull requests and posts health score reports as PR comments.

 ## Setup

src/commands/ci-report.ts

Lines changed: 8 additions & 0 deletions
@@ -2,6 +2,7 @@ import { readdir, readFile } from "node:fs/promises";
 import path from "node:path";
 import type { Command } from "commander";
 import type { RunArtifact } from "../types.js";
+import { buildEvent, recordEvent } from "../telemetry.js";
 import { validateRunArtifact } from "../validate.js";
 import { defaultRunsDirectory } from "../storage.js";

@@ -96,6 +97,13 @@ export function registerCiReportCommands(program: Command): void {
   process.stdout.write(JSON.stringify(report, null, 2) + "\n");
 }

+recordEvent(buildEvent("command_complete", "ci-report", "cli", {
+  nightlyScan: true,
+  issueCreated: report.hasRegressions,
+  matrixServerCount: report.serverCount,
+  matrixFailCount: report.failCount,
+}));
+
 if (report.hasRegressions) {
   process.exitCode = 1;
 }

src/commands/history.ts

Lines changed: 5 additions & 0 deletions
@@ -1,5 +1,6 @@
 import type { Command } from "commander";
 import { readHistory, getTrend, renderTrendLabel } from "../history.js";
+import { buildEvent, recordEvent } from "../telemetry.js";
 import { ANSI, c } from "./helpers.js";

 export function registerHistoryCommands(program: Command): void {
@@ -58,5 +59,9 @@ export function registerHistoryCommands(program: Command): void {
         ` ${paddedId} ${c(gradeColor, current.grade)} (${current.healthScore}) ${label}\n`,
       );
     }
+
+    recordEvent(buildEvent("command_complete", "history", "cli", {
+      historyEntryCount: history.entries.length,
+    }));
   });
 }

src/commands/lock.ts

Lines changed: 15 additions & 0 deletions
@@ -1,6 +1,7 @@
 import type { Command } from "commander";

 import { scanForTargets } from "../discovery.js";
+import { buildEvent, recordEvent } from "../telemetry.js";
 import {
   readLockFile,
   writeLockFile,
@@ -82,6 +83,11 @@ export function registerLockCommands(program: Command): void {
     process.stdout.write(
       `\n ${c(ANSI.green, "✓")} Locked ${entries.length} server${entries.length === 1 ? "" : "s"} to ${lockPath}\n\n`,
     );
+
+    recordEvent(buildEvent("command_complete", "lock", "cli", {
+      lockFileExists: true,
+      lockServerCount: entries.length,
+    }));
   });

 lockCmd
@@ -109,6 +115,7 @@ export function registerLockCommands(program: Command): void {
 );

 let anyFailed = false;
+let totalDriftCount = 0;

 for (const t of targets) {
   const lockEntry = lockMap.get(t.config.targetId);
@@ -129,6 +136,7 @@ export function registerLockCommands(program: Command): void {
   process.stdout.write(` ${c(ANSI.green, "✓")} ${t.config.targetId}\n`);
 } else {
   anyFailed = true;
+  totalDriftCount += result.drift.length;
   process.stdout.write(` ${c(ANSI.red, "✗")} ${t.config.targetId}\n`);
   for (const d of result.drift) {
     process.stdout.write(
@@ -145,6 +153,13 @@ export function registerLockCommands(program: Command): void {

 process.stdout.write("\n");

+recordEvent(buildEvent("command_complete", "lock-verify", "cli", {
+  lockFileExists: true,
+  lockServerCount: lock.servers.length,
+  lockDriftDetected: anyFailed,
+  lockDriftCount: totalDriftCount,
+}));
+
 if (anyFailed) {
   process.exitCode = 1;
 }
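
Reviewer note: the `lock verify` hunks accumulate `result.drift.length` across targets and report it as `lockDriftCount`. A minimal sketch of that aggregation as a pure function is below. The `VerifyResult` shape and the `summarizeDrift` name are hypothetical, and the sketch assumes a target counts as failed exactly when its drift list is non-empty, which the diff does not confirm.

```typescript
// Hypothetical shape of one verification result; only `drift` appears in the diff.
interface VerifyResult {
  targetId: string;
  drift: string[];
}

interface DriftSummary {
  lockDriftDetected: boolean;
  lockDriftCount: number;
}

// Sum drift entries across all targets, mirroring the
// `totalDriftCount += result.drift.length` line in the diff.
function summarizeDrift(results: VerifyResult[]): DriftSummary {
  let lockDriftCount = 0;
  for (const result of results) {
    lockDriftCount += result.drift.length;
  }
  // Assumption: any drift entry means the verification failed.
  return { lockDriftDetected: lockDriftCount > 0, lockDriftCount };
}

const driftSummary = summarizeDrift([
  { targetId: "server-a", drift: [] },
  { targetId: "server-b", drift: ["tool removed", "schema changed"] },
]);
console.log(driftSummary);
```

Pulling the counting into a function like this would make the telemetry fields testable without spawning real servers.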

src/commands/scan.ts

Lines changed: 20 additions & 5 deletions
@@ -7,12 +7,13 @@ import {
 } from "../index.js";
 import { appendHistory, buildHistoryEntry } from "../history.js";
 import { buildEvent, recordEvent } from "../telemetry.js";
+import type { RunArtifact } from "../types.js";
 import { TOOL_VERSION } from "../version.js";
 import { ANSI, LOGO, c, useColor } from "./helpers.js";

 // ── Scan implementation ─────────────────────────────────────────────────────

-async function runScan(bin: string, configPath: string | undefined, invokeTools: boolean, securityCheck?: boolean): Promise<void> {
+async function runScan(bin: string, configPath: string | undefined, invokeTools: boolean, securityCheck?: boolean, format?: string): Promise<void> {
   const t0 = Date.now();
   process.stdout.write(useColor() ? c(ANSI.cyan, LOGO) + ` ${c(ANSI.dim, `v${TOOL_VERSION}`)}\n\n` : LOGO + ` v${TOOL_VERSION}\n\n`);

@@ -53,6 +54,7 @@ async function runScan(bin: string, configPath: string | undefined, invokeTools:
 }

 const results: ScanRow[] = [];
+const artifacts: RunArtifact[] = [];
 const checkStatusMap: Record<string, string> = {};
 let passCount = 0;
 let failCount = 0;
@@ -64,6 +66,7 @@ async function runScan(bin: string, configPath: string | undefined, invokeTools:
 process.stdout.write(` ${c(ANSI.dim, "⟳")} Checking ${c(ANSI.bold, t.config.targetId)}...`);
 try {
   const artifact = await runTarget(t.config, { invokeTools, securityCheck });
+  artifacts.push(artifact);
   const toolsCheck = artifact.checks.find((ch) => ch.id === "tools");
   const promptsCheck = artifact.checks.find((ch) => ch.id === "prompts");
   const resourcesCheck = artifact.checks.find((ch) => ch.id === "resources");
@@ -164,6 +167,12 @@ async function runScan(bin: string, configPath: string | undefined, invokeTools:
 }
 process.stdout.write("\n");

+if (format === "pr-comment-matrix" && artifacts.length > 0) {
+  const { renderMatrixComment } = await import("../reporters/pr-comment-matrix.js");
+  const rows = artifacts.map(a => ({ artifact: a }));
+  process.stdout.write(renderMatrixComment(rows) + "\n");
+}
+
 recordEvent(buildEvent("command_complete", "scan", "cli", {
   serversScanned: results.length,
   toolsFound: totalTools,
@@ -178,6 +187,9 @@ async function runScan(bin: string, configPath: string | undefined, invokeTools:
     t.config.adapter === "http" ? (t.config as { url: string }).url : `${(t.config as { command: string }).command} ${t.config.args.join(" ")}`,
   ),
   checkStatuses: checkStatusMap,
+  matrixServerCount: results.length,
+  matrixPassCount: passCount,
+  matrixFailCount: failCount,
 }));

 if (failCount > 0) {
@@ -193,11 +205,12 @@ export function registerScanCommands(program: Command, bin: string): void {
   .description("Check all MCP servers in your Claude configs.")
   .option("--config <path>", "Path to a specific MCP config file.")
   .option("--security", "Run deep security scan (credential patterns, response analysis). Lightweight security is always included.")
+  .option("--format <format>", "Output format: terminal or pr-comment-matrix.", "terminal")
   .option("--no-color", "Disable colored output.");

 // `scan` with no subcommand — basic scan
-scanCmd.action(async (options: { config?: string; security?: boolean }) => {
-  await runScan(bin, options.config, false, options.security);
+scanCmd.action(async (options: { config?: string; security?: boolean; format: string }) => {
+  await runScan(bin, options.config, false, options.security, options.format);
 });

 // `scan deep` — scan + invoke tools
@@ -206,10 +219,12 @@ export function registerScanCommands(program: Command, bin: string): void {
   .description("Scan and also invoke safe tools to verify they execute.")
   .option("--config <path>", "Path to a specific MCP config file.")
   .option("--security", "Run deep security scan (credential patterns, response analysis). Lightweight security is always included.")
-  .action(async (options: { config?: string; security?: boolean }) => {
+  .option("--format <format>", "Output format: terminal or pr-comment-matrix.", "terminal")
+  .action(async (options: { config?: string; security?: boolean; format: string }) => {
     // Inherit parent config option if set
     const parentConfig = scanCmd.opts().config as string | undefined;
     const parentSecurity = scanCmd.opts().security as boolean | undefined;
-    await runScan(bin, options.config ?? parentConfig, true, options.security ?? parentSecurity ?? true);
+    const parentFormat = scanCmd.opts().format as string;
+    await runScan(bin, options.config ?? parentConfig, true, options.security ?? parentSecurity ?? true, options.format ?? parentFormat);
   });
 }
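
Reviewer note: in the `scan deep` handler, `options.format ?? parentFormat` only falls back to the parent value when `options.format` is `null` or `undefined`. The subcommand declares `"terminal"` as the default for `--format`, so if that default is always populated on `options` (an assumption about how the option parser behaves here, not something the diff shows), the fallback would never be taken. The sketch below isolates the `??` behavior with plain functions; `pickWithNullish` and `pickPreferExplicit` are hypothetical names, not part of the codebase.

```typescript
type Format = "terminal" | "pr-comment-matrix";

// Mirrors the expression in the diff: fall back only on null/undefined.
function pickWithNullish(sub: Format | undefined, parent: Format): Format {
  return sub ?? parent;
}

// Hypothetical alternative: treat the declared default as "not set" so a
// non-default parent value can win. Limitation: an explicit
// `--format terminal` on the subcommand cannot be told apart from the default.
function pickPreferExplicit(
  sub: Format | undefined,
  parent: Format,
  defaultFormat: Format = "terminal",
): Format {
  return sub !== undefined && sub !== defaultFormat ? sub : parent;
}

// With a defaulted subcommand value, the parent's choice is ignored.
const nullishResult = pickWithNullish("terminal", "pr-comment-matrix");
// Treating the default as unset lets the parent's choice through.
const explicitResult = pickPreferExplicit("terminal", "pr-comment-matrix");
console.log(nullishResult, explicitResult);
```

If the parent flag is meant to be inherited, it may be worth adding a CLI test that passes `--format pr-comment-matrix` before the `deep` subcommand and checks which renderer runs.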

src/score.ts

Lines changed: 10 additions & 5 deletions
@@ -1,3 +1,6 @@
+// IMPORTANT: Scoring logic is duplicated in api/src/worker.ts for the Cloudflare Worker
+// deployment (which can't import from src/). Keep both files in sync when making changes.
+
 import type { CheckResult, HealthGrade, HealthScore, PerformanceMetrics, ScoreDimension } from "./types.js";

 export interface ScoreWeights {
@@ -9,11 +12,11 @@ export interface ScoreWeights {
 }

 export const DEFAULT_WEIGHTS: ScoreWeights = {
-  protocolCompliance: 0.30,
-  schemaQuality: 0.20,
-  security: 0.20,
-  reliability: 0.20,
-  performance: 0.10,
+  protocolCompliance: 0.30, // Highest — spec compliance is foundational for interop
+  schemaQuality: 0.20, // Good schemas enable AI agents to use tools correctly
+  security: 0.20, // Parity with quality — both critical for production use
+  reliability: 0.20, // Tools/prompts/resources actually responding as expected
+  performance: 0.10, // Lowest — latency matters less than correctness
 };

 const STATUS_SCORES: Record<string, number> = {
@@ -80,6 +83,8 @@ function scorePerformance(
 const p95Index = Math.min(Math.ceil(sorted.length * 0.95) - 1, sorted.length - 1);
 const p95 = sorted[p95Index] ?? 0;

+// p95 latency thresholds for performance scoring
+// <500ms = excellent (100), <1s = good (80), <2s = acceptable (60), <5s = slow (40), >5s = poor (20)
 let score: number;
 if (p95 < 500) score = 100;
 else if (p95 < 1000) score = 80;
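
Reviewer note: the hunk above shows the p95 index formula and the first two threshold branches; the remaining tiers appear only in the new comment. A self-contained sketch of the full ladder is below. The `p95` and `latencyScore` function names are hypothetical, and the last three branches are reconstructed from the comment rather than from code visible in the diff. The comment says ">5s" for the lowest tier, so the sketch assumes exactly 5000 ms also lands there.

```typescript
// Nearest-rank p95, using the same index formula as the diff.
function p95(latenciesMs: number[]): number {
  if (latenciesMs.length === 0) return 0;
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const index = Math.min(Math.ceil(sorted.length * 0.95) - 1, sorted.length - 1);
  return sorted[index] ?? 0;
}

// Threshold ladder from the comment in the diff. Only the first two
// branches are visible as code; the rest are an assumption.
function latencyScore(p95Ms: number): number {
  if (p95Ms < 500) return 100; // excellent
  if (p95Ms < 1000) return 80; // good
  if (p95Ms < 2000) return 60; // acceptable
  if (p95Ms < 5000) return 40; // slow
  return 20; // poor (assumed to include exactly 5000 ms)
}

// Ten samples: ceil(10 * 0.95) - 1 = 9, so p95 is the largest sample.
const samples = [120, 90, 300, 450, 80, 100, 110, 95, 105, 1500];
const sampleP95 = p95(samples);
const sampleScore = latencyScore(sampleP95);
console.log(sampleP95, sampleScore);
```

With small sample counts the nearest-rank p95 is simply the slowest call, so a single outlier can move a server down a tier.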
