Conversation
Add `mulmo-slide url <url>` command that fetches web articles via Puppeteer, checks content quality with LLM, and generates narrated MulmoScript with configurable expression styles (author/news/overview). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdds a URL-to-MulmoScript feature: CLI command and action to fetch web articles, validate content, generate MulmoScript with retries and style application, new expression-style and slide-schema utilities, README/AGENTS docs, npm script, and tests for styles and basename generation. Changes
sequenceDiagram
participant CLI as CLI User
participant URLCmd as URL Command
participant URLScript as URL-to-Script Action
participant Puppeteer as Article Fetcher
participant OpenAI as LLM
participant Validator as Zod Validator
participant FS as File System
CLI->>URLCmd: mulmo-slide url <URL> -e author -l ja
URLCmd->>URLScript: runUrlToScript(url, options)
URLScript->>Puppeteer: fetchArticle(url)
Puppeteer-->>URLScript: {title, textContent, dateStr}
URLScript->>FS: check existing / save article.txt
URLScript->>OpenAI: checkArticleQuality(text)
OpenAI-->>URLScript: quality verdict
URLScript->>OpenAI: generateBeatsWithRetry(prompt)
OpenAI-->>URLScript: MulmoScript JSON
URLScript->>Validator: validate JSON (Zod)
Validator-->>URLScript: valid / errors
URLScript->>URLScript: applySlideStyle(beats, style)
URLScript->>FS: save MulmoScript
URLScript-->>URLCmd: return scriptPath
URLCmd->>FS: optionally run movie/bundle generators
URLCmd-->>CLI: completion message
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/actions/url-to-script.ts (2)
64-96: Quality check result is not validated beyond JSON parsing.Line 89 parses the LLM response as
{ sufficient: boolean; reason: string }but doesn't verify the shape. If the LLM returns valid JSON without asufficientfield,result.sufficientwould beundefined(falsy), causing a false rejection with anundefinedreason. Consider adding a guard.Proposed fix
const content = extractResponseContent(response); const result = JSON.parse(content) as { sufficient: boolean; reason: string }; - if (!result.sufficient) { + if (typeof result.sufficient !== "boolean") { + console.warn(" ⚠ Quality check returned unexpected format, proceeding..."); + return; + } + if (!result.sufficient) { throw new Error(`Article content insufficient: ${result.reason}`); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/actions/url-to-script.ts` around lines 64 - 96, In checkArticleQuality, after calling extractResponseContent and JSON.parse, validate the parsed object shape (from the parsed result returned by extractResponseContent/response) by ensuring result has a boolean property "sufficient" and a string "reason" before using them; if "sufficient" is missing or not a boolean or "reason" is missing/not a string, throw a clear Error that includes the raw content (or the parsed object) so we don't reject articles due to malformed LLM output, and use strict check (result.sufficient === true) when deciding pass/fail.
44-62: Fragile type assertions on the agent function call and result.The
as unknown as AgentFunctionContextcast (Line 54) and the result destructuring(result as { data?: ArticleData }).data(Line 56) bypass type safety. IfpuppeteerCrawlerAgent's API changes, these will silently break at runtime.This is acceptable for now given the agent function's complex signature, but consider adding a runtime shape check on the result (e.g., verify
datahasurl,title,textContentfields) to fail fast with a clear error.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/actions/url-to-script.ts` around lines 44 - 62, fetchArticle currently uses fragile casts for the agent call and result; add a runtime shape validation after receiving result from puppeteerCrawlerAgent.agent to ensure the returned object has a data property with url, title and textContent (non-empty strings) and throw a clear Error if validation fails; keep using agentFn = puppeteerCrawlerAgent.agent but replace the unchecked cast and the simple (result as { data?: ArticleData }).data access with a small inline type-guard/validation that asserts result && typeof result === 'object' && result.data && typeof result.data.textContent === 'string' (and validate url/title similarly) before returning data, so fetchArticle fails fast with a clear message if the agent shape changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/actions/url-to-script.ts`:
- Around line 98-122: buildBeatsPrompt currently appends the entire articleText
into the LLM prompt (risking context-window overrun); change buildBeatsPrompt to
limit the article content included by either truncating articleText to a safe
max length (e.g., a configurable maxChars ~30000 or reusing the existing
3000-char quality truncation) or by implementing deterministic
chunking/summarization before beat generation and then generating/merging beats
per chunk; update any calling code that expects full-text output to handle
merged beat results and expose the maxChars as a configurable constant or
UrlToScriptOptions field so tests can control it.
- Around line 28-31: generateBasename can produce a trailing hyphen when title
is non-null but sanitizes to an empty string (e.g., "!!!"); update
generateBasename to call sanitizeBasename(title) and if the result is empty
(after slicing with TITLE_MAX_LENGTH) fall back to "article" before constructing
the `${dateStr}-${sanitizedTitle}` return value; reference function
generateBasename and sanitizeBasename and ensure TITLE_MAX_LENGTH slicing still
applies when choosing the fallback.
---
Nitpick comments:
In `@src/actions/url-to-script.ts`:
- Around line 64-96: In checkArticleQuality, after calling
extractResponseContent and JSON.parse, validate the parsed object shape (from
the parsed result returned by extractResponseContent/response) by ensuring
result has a boolean property "sufficient" and a string "reason" before using
them; if "sufficient" is missing or not a boolean or "reason" is missing/not a
string, throw a clear Error that includes the raw content (or the parsed object)
so we don't reject articles due to malformed LLM output, and use strict check
(result.sufficient === true) when deciding pass/fail.
- Around line 44-62: fetchArticle currently uses fragile casts for the agent
call and result; add a runtime shape validation after receiving result from
puppeteerCrawlerAgent.agent to ensure the returned object has a data property
with url, title and textContent (non-empty strings) and throw a clear Error if
validation fails; keep using agentFn = puppeteerCrawlerAgent.agent but replace
the unchecked cast and the simple (result as { data?: ArticleData }).data access
with a small inline type-guard/validation that asserts result && typeof result
=== 'object' && result.data && typeof result.data.textContent === 'string' (and
validate url/title similarly) before returning data, so fetchArticle fails fast
with a clear message if the agent shape changes.
| export function generateBasename(title: string | null, dateStr: string): string { | ||
| const sanitizedTitle = title ? sanitizeBasename(title).slice(0, TITLE_MAX_LENGTH) : "article"; | ||
| return `${dateStr}-${sanitizedTitle}`; | ||
| } |
There was a problem hiding this comment.
Edge case: titles with only special characters produce a trailing-hyphen basename.
If title is non-null but contains only special characters (e.g., "!!!"), sanitizeBasename returns "", resulting in a basename like "20260217-". Consider falling back to "article" when the sanitized title is empty.
Proposed fix
export function generateBasename(title: string | null, dateStr: string): string {
- const sanitizedTitle = title ? sanitizeBasename(title).slice(0, TITLE_MAX_LENGTH) : "article";
+ const sanitizedTitle = (title ? sanitizeBasename(title).slice(0, TITLE_MAX_LENGTH) : "") || "article";
return `${dateStr}-${sanitizedTitle}`;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| export function generateBasename(title: string | null, dateStr: string): string { | |
| const sanitizedTitle = title ? sanitizeBasename(title).slice(0, TITLE_MAX_LENGTH) : "article"; | |
| return `${dateStr}-${sanitizedTitle}`; | |
| } | |
| export function generateBasename(title: string | null, dateStr: string): string { | |
| const sanitizedTitle = (title ? sanitizeBasename(title).slice(0, TITLE_MAX_LENGTH) : "") || "article"; | |
| return `${dateStr}-${sanitizedTitle}`; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/actions/url-to-script.ts` around lines 28 - 31, generateBasename can
produce a trailing hyphen when title is non-null but sanitizes to an empty
string (e.g., "!!!"); update generateBasename to call sanitizeBasename(title)
and if the result is empty (after slicing with TITLE_MAX_LENGTH) fall back to
"article" before constructing the `${dateStr}-${sanitizedTitle}` return value;
reference function generateBasename and sanitizeBasename and ensure
TITLE_MAX_LENGTH slicing still applies when choosing the fallback.
| function buildBeatsPrompt(articleText: string, options: UrlToScriptOptions): string { | ||
| const styleName = options.expression ?? "author"; | ||
| const style = expressionStyles[styleName]; | ||
| if (!style) { | ||
| throw new Error( | ||
| `Unknown expression style: ${styleName}. Available: ${EXPRESSION_NAMES.join(", ")}` | ||
| ); | ||
| } | ||
|
|
||
| const lang = options.lang ?? "en"; | ||
| const languageName = getLanguageName(lang); | ||
|
|
||
| const beatsHint = options.beats ? `\nTarget approximately ${options.beats} beats.` : ""; | ||
|
|
||
| return `${style.systemPrompt} | ||
|
|
||
| ## Language | ||
|
|
||
| Write all narration text in ${languageName} (lang code: "${lang}"). | ||
| ${beatsHint} | ||
|
|
||
| ## Article Content | ||
|
|
||
| ${articleText}`; | ||
| } |
There was a problem hiding this comment.
No truncation of article text before sending to the LLM for beat generation.
buildBeatsPrompt appends the full articleText to the prompt (Line 121). For long articles, this could exceed the model's context window and cause API errors or silent truncation. The quality check (Line 83) truncates to 3000 chars, but the beat generation does not.
Consider truncating or chunking the article text to stay within safe token limits (e.g., ~30,000 characters for gpt-4o's context).
Proposed fix
+const ARTICLE_MAX_CHARS = 30000;
+
function buildBeatsPrompt(articleText: string, options: UrlToScriptOptions): string {
const styleName = options.expression ?? "author";
const style = expressionStyles[styleName];
if (!style) {
throw new Error(
`Unknown expression style: ${styleName}. Available: ${EXPRESSION_NAMES.join(", ")}`
);
}
const lang = options.lang ?? "en";
const languageName = getLanguageName(lang);
const beatsHint = options.beats ? `\nTarget approximately ${options.beats} beats.` : "";
+ const truncatedArticle = articleText.slice(0, ARTICLE_MAX_CHARS);
+
return `${style.systemPrompt}
## Language
Write all narration text in ${languageName} (lang code: "${lang}").
${beatsHint}
## Article Content
-${articleText}`;
+${truncatedArticle}`;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function buildBeatsPrompt(articleText: string, options: UrlToScriptOptions): string { | |
| const styleName = options.expression ?? "author"; | |
| const style = expressionStyles[styleName]; | |
| if (!style) { | |
| throw new Error( | |
| `Unknown expression style: ${styleName}. Available: ${EXPRESSION_NAMES.join(", ")}` | |
| ); | |
| } | |
| const lang = options.lang ?? "en"; | |
| const languageName = getLanguageName(lang); | |
| const beatsHint = options.beats ? `\nTarget approximately ${options.beats} beats.` : ""; | |
| return `${style.systemPrompt} | |
| ## Language | |
| Write all narration text in ${languageName} (lang code: "${lang}"). | |
| ${beatsHint} | |
| ## Article Content | |
| ${articleText}`; | |
| } | |
| const ARTICLE_MAX_CHARS = 30000; | |
| function buildBeatsPrompt(articleText: string, options: UrlToScriptOptions): string { | |
| const styleName = options.expression ?? "author"; | |
| const style = expressionStyles[styleName]; | |
| if (!style) { | |
| throw new Error( | |
| `Unknown expression style: ${styleName}. Available: ${EXPRESSION_NAMES.join(", ")}` | |
| ); | |
| } | |
| const lang = options.lang ?? "en"; | |
| const languageName = getLanguageName(lang); | |
| const beatsHint = options.beats ? `\nTarget approximately ${options.beats} beats.` : ""; | |
| const truncatedArticle = articleText.slice(0, ARTICLE_MAX_CHARS); | |
| return `${style.systemPrompt} | |
| ## Language | |
| Write all narration text in ${languageName} (lang code: "${lang}"). | |
| ${beatsHint} | |
| ## Article Content | |
| ${truncatedArticle}`; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/actions/url-to-script.ts` around lines 98 - 122, buildBeatsPrompt
currently appends the entire articleText into the LLM prompt (risking
context-window overrun); change buildBeatsPrompt to limit the article content
included by either truncating articleText to a safe max length (e.g., a
configurable maxChars ~30000 or reusing the existing 3000-char quality
truncation) or by implementing deterministic chunking/summarization before beat
generation and then generating/merging beats per chunk; update any calling code
that expects full-text output to handle merged beat results and expose the
maxChars as a configurable constant or UrlToScriptOptions field so tests can
control it.
- Remove "meta" from beat instructions (MulmoScript doesn't have it) - Move imagePrompt to beat-level field instead of inside image object - Prefer markdown slides, use imagePrompt sparingly - Update test assertion to match corrected prompt Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/utils/expression-styles.ts (1)
12-23: Example JSON only demonstrates theimagevariant; consider adding animagePromptbeat example.LLMs tend to mimic the structure of provided examples more faithfully than prose instructions. Since
imagePromptis a top-level beat field (not nested insideimage), showing it once in the example would reduce the chance of the LLM placing it incorrectly or ignoring it entirely.💡 Suggested addition to the example JSON
"beats": [ { "text": "<narration text for this beat>", "image": { "type": "markdown", "markdown": ["# Slide Title", "- point 1", "- point 2"] } - } + }, + { + "text": "<narration text for this beat>", + "imagePrompt": "descriptive prompt for image generation" + } ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/expression-styles.ts` around lines 12 - 23, The example JSON in expression-styles.ts only shows the "image" variant inside the beats array, which encourages LLMs to omit the top-level beat field "imagePrompt"; add a second example beat inside "beats" demonstrating the top-level "imagePrompt" field (e.g., a beat object with "text" and "imagePrompt") so examples include both "image" and "imagePrompt" variants and clarify their placement relative to the "beats" array and "image" object.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/utils/expression-styles.ts`:
- Around line 12-23: The example JSON in expression-styles.ts only shows the
"image" variant inside the beats array, which encourages LLMs to omit the
top-level beat field "imagePrompt"; add a second example beat inside "beats"
demonstrating the top-level "imagePrompt" field (e.g., a beat object with "text"
and "imagePrompt") so examples include both "image" and "imagePrompt" variants
and clarify their placement relative to the "beats" array and "image" object.
Replace image.type="markdown" with image.type="slide" in the URL-to-MulmoScript pipeline. The LLM now generates structured slide layouts (title, columns, comparison, grid, stats, timeline, etc.) instead of raw markdown arrays. - Add slide-schema.ts: converts Zod slide schemas to JSON Schema for LLM prompts - Update expression-styles.ts: slide DSL output format with layout selection guide - Update url-to-script.ts: inject schema into prompt, use slideStyles + mergeScripts - Update --style to accept slide presets (slide_dark, slide_pop, etc.) - Bump mulmocast to ^2.2.1 for slideStyles/mergeScripts exports Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without a slide style, the generated MulmoScript has no slideParams.theme, causing mulmocast to fail with "Slide theme is required". Now defaults to slide_dark and also patches existing cached scripts missing slideParams. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/test_slide_schema.ts (1)
38-42: Caching test verifies value equality, not actual cache hit.
assert.strictEqualon JavaScript string primitives uses===, which compares by value — a rebuilt string with identical content would also pass. The comment "same cached reference" overstates what's being verified. The test is functionally fine (confirms deterministic output), but won't catch ifcachedSchemais bypassed and the schema is rebuilt on each call.To genuinely test cache behavior, consider spying on the internal build functions or counting call invocations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_slide_schema.ts` around lines 38 - 42, The test currently uses assert.strictEqual on string primitives from getSlideSchemaForPrompt which only verifies value equality, not that cachedSchema was actually reused; update the test to assert a real cache hit by either (a) spying/stubbing the internal builder that constructs the schema (e.g., the function used by getSlideSchemaForPrompt) and asserting it is called only once across two calls, or (b) expose cachedSchema or change getSlideSchemaForPrompt to return a non-primitive (object) so you can assert reference identity (strictEqual) reliably; locate references to getSlideSchemaForPrompt and the internal builder/cachedSchema in the module to implement the spy or exposure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_expression_styles.ts`:
- Around line 85-93: Add a test for the edge case where the title sanitizes to
an empty string: call generateBasename with a title made only of special
characters (e.g., "!!!" or "()()!") and the date "20260217" and assert that the
returned basename does not end with a trailing dash but instead falls back to
the default slug (e.g., "20260217-article"); locate tests near the existing
generateBasename and sanitizeBasename usages in tests/test_expression_styles.ts
and add an assertion like assert.strictEqual(result, "20260217-article") to
ensure generateBasename handles empty sanitized titles.
---
Nitpick comments:
In `@tests/test_slide_schema.ts`:
- Around line 38-42: The test currently uses assert.strictEqual on string
primitives from getSlideSchemaForPrompt which only verifies value equality, not
that cachedSchema was actually reused; update the test to assert a real cache
hit by either (a) spying/stubbing the internal builder that constructs the
schema (e.g., the function used by getSlideSchemaForPrompt) and asserting it is
called only once across two calls, or (b) expose cachedSchema or change
getSlideSchemaForPrompt to return a non-primitive (object) so you can assert
reference identity (strictEqual) reliably; locate references to
getSlideSchemaForPrompt and the internal builder/cachedSchema in the module to
implement the spy or exposure.
| test("generateBasename: handles empty string title", () => { | ||
| const result = generateBasename("", "20260217"); | ||
| assert.strictEqual(result, "20260217-article"); | ||
| }); | ||
|
|
||
| test("generateBasename: handles Japanese title", () => { | ||
| const result = generateBasename("日本語のタイトル", "20260217"); | ||
| assert.ok(result.startsWith("20260217-")); | ||
| }); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Find the implementation files
find . -type f -name "*.ts" -o -name "*.js" | grep -E "(url-to-script|test_expression)" | head -20Repository: receptron/MulmoCast-Slides
Length of output: 134
🏁 Script executed:
# Look for generateBasename and sanitizeBasename implementations
rg -n "generateBasename|sanitizeBasename" --type ts --type js -B 2 -A 10Repository: receptron/MulmoCast-Slides
Length of output: 11863
🏁 Script executed:
# Examine the test file directly
cat -n tests/test_expression_styles.tsRepository: receptron/MulmoCast-Slides
Length of output: 4448
Missing edge-case test for titles that sanitize to empty strings.
The Japanese title test at lines 90–93 uses a weak assertion that only checks result.startsWith("20260217-"), leaving it unable to catch if the title part becomes empty after sanitization. While Japanese characters are actually preserved by sanitizeBasename (so the test works correctly), a more serious gap exists: titles containing only special characters (e.g., "!!!", "()()!") would sanitize to empty strings, producing basenames like "20260217-" with trailing dashes—which are invalid filenames. No test currently covers this case.
💡 Suggested test addition
+test("generateBasename: handles title with only special characters", () => {
+ const result = generateBasename("!!!", "20260217");
+ // Should not produce trailing dash
+ assert.ok(!result.endsWith("-"), "result should not end with a trailing dash");
+ assert.notStrictEqual(result, "20260217-", "should not be date-dash only");
+});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| test("generateBasename: handles empty string title", () => { | |
| const result = generateBasename("", "20260217"); | |
| assert.strictEqual(result, "20260217-article"); | |
| }); | |
| test("generateBasename: handles Japanese title", () => { | |
| const result = generateBasename("日本語のタイトル", "20260217"); | |
| assert.ok(result.startsWith("20260217-")); | |
| }); | |
| test("generateBasename: handles empty string title", () => { | |
| const result = generateBasename("", "20260217"); | |
| assert.strictEqual(result, "20260217-article"); | |
| }); | |
| test("generateBasename: handles Japanese title", () => { | |
| const result = generateBasename("日本語のタイトル", "20260217"); | |
| assert.ok(result.startsWith("20260217-")); | |
| }); | |
| test("generateBasename: handles title with only special characters", () => { | |
| const result = generateBasename("!!!", "20260217"); | |
| // Should not produce trailing dash | |
| assert.ok(!result.endsWith("-"), "result should not end with a trailing dash"); | |
| assert.notStrictEqual(result, "20260217-", "should not be date-dash only"); | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_expression_styles.ts` around lines 85 - 93, Add a test for the
edge case where the title sanitizes to an empty string: call generateBasename
with a title made only of special characters (e.g., "!!!" or "()()!") and the
date "20260217" and assert that the returned basename does not end with a
trailing dash but instead falls back to the default slug (e.g.,
"20260217-article"); locate tests near the existing generateBasename and
sanitizeBasename usages in tests/test_expression_styles.ts and add an assertion
like assert.strictEqual(result, "20260217-article") to ensure generateBasename
handles empty sanitized titles.
Summary
mulmo-slide url <url>command that generates MulmoScript from web article URLspuppeteerCrawlerAgentfrom mulmocastauthor(first-person),news(objective anchor),overview(concise summary)--movieand--bundleflags to continue pipeline after MulmoScript generationNew Files
src/utils/expression-styles.tssrc/actions/url-to-script.tssrc/cli/url-commands.tstests/test_expression_styles.tsModified Files
src/cli/index.tsregisterUrlCommandsin reducer chainpackage.json"url"script shortcutAGENTS.mdREADME.mdUsage
Test plan
yarn buildpassesyarn lintpasses (0 errors)yarn test— 291/291 tests pass (13 new)yarn url <real-url> -e author -l jagenerates valid MulmoScriptyarn url <404-url>fails with quality check error--movieand--bundleflags trigger downstream generationUser Prompt
Web 記事から MulmoScript を自動生成し、movie/bundle まで一気通貫で生成するコマンドを追加する。
mulmocast-cliからpuppeteerCrawlerAgentを import して URL のテキスト抽出を行い、独自の「表現スタイル」(system prompt) で LLM にビートを生成させる。🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests
Chores