From ea1c6c202af8ecefab118595911c5ed0b108f96d Mon Sep 17 00:00:00 2001
From: Sidney Swift <158200036+sidneyswift@users.noreply.github.com>
Date: Thu, 2 Apr 2026 02:23:07 -0400
Subject: [PATCH 01/13] feat: add content creation skill for composing
 primitives

Teaches LLM agents how to use the modular content creation primitives
(image, video, text, audio, render, upscale) via CLI. Includes
step-by-step and quick full-pipeline workflows, template selection
guidance, and iteration strategies.

Made-with: Cursor
---
 content-creation/SKILL.md | 102 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)
 create mode 100644 content-creation/SKILL.md

diff --git a/content-creation/SKILL.md b/content-creation/SKILL.md
new file mode 100644
index 0000000..095aa93
--- /dev/null
+++ b/content-creation/SKILL.md
@@ -0,0 +1,102 @@
+---
+name: content-creation
+description: Compose content creation primitives to produce social videos for artists. Use when asked to create content, make a video, generate an image, or produce social media posts for an artist.
+---
+
+# Content Creation Skill
+
+Create social-ready videos by composing independent primitives via CLI. Each primitive does one thing and can be run alone or chained together.
+
+## Primitives
+
+| Command | What it does | Returns |
+|---------|-------------|---------|
+| `recoup content audio --artist <artist>` | Pick a song, transcribe, find the best 15s clip | `runId` → poll for `{ songTitle, songUrl, startSeconds, clipLyrics, clipMood }` |
+| `recoup content image --artist <artist> --template <template>` | Generate an AI image from face guide + template scene | `runId` → poll for `{ imageUrl }` |
+| `recoup content video --image <imageUrl>` | Animate the image into a video | `runId` → poll for `{ videoUrl }` |
+| `recoup content text --artist <artist> --song <songTitle>` | Generate on-screen text based on the song | `{ content, font, color }` (synchronous) |
+| `recoup content render --video <videoUrl> --audio <songUrl> --text <text>` | Combine video + audio + text into final mp4 | `runId` → poll for `{ videoUrl, sizeBytes }` |
+| `recoup content upscale --url <url> --type image` | Upscale an image or video | `runId` → poll for `{ url }` |
+| `recoup content create --artist <artist>` | Run the full pipeline (all steps in one shot) | `runId` → poll for final video |
+
+## Workflow: Step-by-Step Content Creation
+
+Use this when you want creative control over each step.
+
+```
+1. Select audio
+   recoup content audio --artist <artist> --song <songTitle> --json
+   → Wait for completion, get songTitle, songUrl, startSeconds, clipLyrics
+
+2. Generate image
+   recoup content image --artist <artist> --template artist-caption-bedroom --json
+   → Wait for completion, get imageUrl
+
+3. (Optional) Upscale image
+   recoup content upscale --url <imageUrl> --type image --json
+   → Wait for completion, get upscaled URL
+
+4. Generate video
+   recoup content video --image <imageUrl> --json
+   → Wait for completion, get videoUrl
+
+5. Generate text
+   recoup content text --artist <artist> --song "<songTitle>" --length short --json
+   → Synchronous, get { content }
+
+6. Render final video
+   recoup content render --video <videoUrl> --audio <songUrl> --start <startSeconds> --duration 15 --text "<text>" --json
+   → Wait for completion, get final videoUrl
+```
+
+## Workflow: Quick Full Pipeline
+
+Use this when you just need a video without creative decisions.
+
+```
+recoup content create --artist <artist> --template artist-caption-bedroom --json
+```
+
+## Templates
+
+| Template | Scene | Best for |
+|----------|-------|----------|
+| `artist-caption-bedroom` | Moody purple bedroom selfie | Introspective, lo-fi vibe |
+| `artist-caption-outside` | Night street scene | Urban, cinematic feel |
+| `artist-caption-stage` | Small venue concert | Performance, energy |
+| `album-record-store` | Vinyl on display in record store | Album/single promotion |
+
+## How to Choose a Template
+
+- Look at the artist's `context/artist.md` for their aesthetic direction
+- Match the template to the song's mood (use `clipMood` from the audio step)
+- Bedroom = introspective/emotional, Outside = cinematic/urban, Stage = energetic
+
+## Iteration
+
+The power of primitives is iteration. If a step produces bad results:
+
+- **Bad image?** Run `recoup content image` again — it picks a different reference image each time
+- **Wrong song clip?** Run `recoup content audio` again — the LLM may pick a different moment
+- **Text doesn't fit?** Run `recoup content text` with a different `--length`
+- **Want higher quality?** Run `recoup content upscale` on the image or video before rendering
+
+## Lipsync Mode
+
+For lip-synced videos where the artist's mouth moves to the song:
+
+```
+recoup content video --image <imageUrl> --lipsync --song-url <songUrl> --start <startSeconds> --duration 15
+```
+
+The video will have audio baked in. When rendering, pass `--has-audio` so ffmpeg doesn't overlay a second audio track.
+
+## Polling for Results
+
+Async primitives return a `runId`. Check progress with:
+
+```
+recoup tasks status --run <runId> --json
+```
+
+Poll until `status` is `COMPLETED`, then read the `output` field.
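The patch above documents polling async runs but leaves the loop itself implicit. A minimal sketch of that loop is below; `fetch_status` and `poll_run` are hypothetical helper names, and `fetch_status` is a stub standing in for `recoup tasks status --run <runId> --json` so the sketch runs without the real CLI:

```shell
# Sketch of the polling loop the skill describes. fetch_status stubs
# `recoup tasks status --run <runId> --json`; here it reports RUNNING
# twice and COMPLETED on the third poll.
fetch_status() {
  if [ "$1" -ge 3 ]; then
    printf '{"status":"COMPLETED","output":{"videoUrl":"https://example.com/final.mp4"}}\n'
  else
    printf '{"status":"RUNNING"}\n'
  fi
}

poll_run() {
  attempt=0
  while [ "$attempt" -lt 30 ]; do      # cap polls so we never hang forever
    attempt=$((attempt + 1))
    status_json=$(fetch_status "$attempt")
    case "$status_json" in
      *'"status":"COMPLETED"'*) printf '%s\n' "$status_json"; return 0 ;;
      *'"status":"FAILED"'*)    printf '%s\n' "$status_json" >&2; return 1 ;;
    esac
    # sleep 5                          # enable when polling the real CLI
  done
  return 1
}

poll_run
```

Against the real CLI, replace `fetch_status` with the actual `recoup tasks status` call and re-enable the sleep between polls.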
From d699466b2dd42fde77939336e2990a2e162b422c Mon Sep 17 00:00:00 2001
From: Sidney Swift <158200036+sidneyswift@users.noreply.github.com>
Date: Thu, 2 Apr 2026 02:25:42 -0400
Subject: [PATCH 02/13] improve: rewrite content-creation skill with creative
 decision-making

- Pushy description with comprehensive trigger phrases
- Creative decision-making guidance (template selection, song choice,
  quality evaluation) instead of just procedural steps
- Explains WHY (e.g. start with audio because mood informs template)
- Edge case handling (instrumentals, missing face guide, song URLs)
- Iteration strategies showing cost savings of modular approach

Made-with: Cursor
---
 content-creation/SKILL.md | 159 ++++++++++++++++++++++----------------
 1 file changed, 92 insertions(+), 67 deletions(-)

diff --git a/content-creation/SKILL.md b/content-creation/SKILL.md
index 095aa93..18d2018 100644
--- a/content-creation/SKILL.md
+++ b/content-creation/SKILL.md
@@ -1,102 +1,127 @@
 ---
 name: content-creation
-description: Compose content creation primitives to produce social videos for artists. Use when asked to create content, make a video, generate an image, or produce social media posts for an artist.
+description: Create social videos, TikToks, Reels, and visual content for music artists using AI-generated images, video, on-screen text, and song audio. Use this skill whenever the user asks to create content, make a video, generate an image, produce a TikTok or Reel, create a promotional clip, add on-screen text or captions to a video, do a face swap, make visual content for an artist, post to social media, or create short-form video from a song. Also use when the user wants to iterate on content quality — regenerate images, try different songs, adjust text, or upscale for higher quality.
 ---
 
-# Content Creation Skill
+# Content Creation
 
-Create social-ready videos by composing independent primitives via CLI. Each primitive does one thing and can be run alone or chained together.
+Create social-ready short-form videos for music artists by composing independent primitives. Each primitive does one thing. You orchestrate them based on the artist's identity and the song's energy.
 
-## Primitives
+## How It Works
 
-| Command | What it does | Returns |
-|---------|-------------|---------|
-| `recoup content audio --artist <artist>` | Pick a song, transcribe, find the best 15s clip | `runId` → poll for `{ songTitle, songUrl, startSeconds, clipLyrics, clipMood }` |
-| `recoup content image --artist <artist> --template <template>` | Generate an AI image from face guide + template scene | `runId` → poll for `{ imageUrl }` |
-| `recoup content video --image <imageUrl>` | Animate the image into a video | `runId` → poll for `{ videoUrl }` |
-| `recoup content text --artist <artist> --song <songTitle>` | Generate on-screen text based on the song | `{ content, font, color }` (synchronous) |
-| `recoup content render --video <videoUrl> --audio <songUrl> --text <text>` | Combine video + audio + text into final mp4 | `runId` → poll for `{ videoUrl, sizeBytes }` |
-| `recoup content upscale --url <url> --type image` | Upscale an image or video | `runId` → poll for `{ url }` |
-| `recoup content create --artist <artist>` | Run the full pipeline (all steps in one shot) | `runId` → poll for final video |
+Six primitives, each callable independently:
 
-## Workflow: Step-by-Step Content Creation
+| Primitive | Command | Async? |
+|-----------|---------|--------|
+| Select audio | `recoup content audio` | Yes (30-60s) |
+| Generate image | `recoup content image` | Yes (10-30s) |
+| Generate video | `recoup content video` | Yes (60-180s) |
+| Generate text | `recoup content text` | No (2-5s) |
+| Render final | `recoup content render` | Yes (10-30s) |
+| Upscale | `recoup content upscale` | Yes (30-60s) |
 
-Use this when you want creative control over each step.
+Async primitives return a `runId`. Poll with `recoup tasks status --run <runId> --json` until `status` is `COMPLETED`, then read `output`.
 
-```
-1. Select audio
-   recoup content audio --artist <artist> --song <songTitle> --json
-   → Wait for completion, get songTitle, songUrl, startSeconds, clipLyrics
+There is also `recoup content create` which runs all steps in one shot — use it when the user just wants a video without creative control.
 
-2. Generate image
-   recoup content image --artist <artist> --template artist-caption-bedroom --json
-   → Wait for completion, get imageUrl
+## Creative Decision-Making
 
-3. (Optional) Upscale image
-   recoup content upscale --url <imageUrl> --type image --json
-   → Wait for completion, get upscaled URL
+The skill's value is not in running commands — it is in making creative decisions at each step. Read the artist's `context/artist.md` before starting. It contains their personality, aesthetic direction, mood, colors, and sacred rules.
 
-4. Generate video
-   recoup content video --image <imageUrl> --json
-   → Wait for completion, get videoUrl
+### Choosing a template
 
-5. Generate text
-   recoup content text --artist <artist> --song "<songTitle>" --length short --json
-   → Synchronous, get { content }
+Templates control the visual scene (lighting, camera, environment). Match the template to the artist's aesthetic and the song's mood:
 
-6. Render final video
-   recoup content render --video <videoUrl> --audio <songUrl> --start <startSeconds> --duration 15 --text "<text>" --json
-   → Wait for completion, get final videoUrl
-```
+| Template | Scene | When to use |
+|----------|-------|-------------|
+| `artist-caption-bedroom` | Moody purple bedroom, deadpan selfie | Introspective songs, vulnerable lyrics, lo-fi vibe |
+| `artist-caption-outside` | Night street, phone on ground | Urban feel, cinematic energy, confident tracks |
+| `artist-caption-stage` | Small venue, fan cam angle | Performance energy, hype songs, live feel |
+| `album-record-store` | Vinyl on display in NYC record store | Album/single promotion, release day content |
 
-## Workflow: Quick Full Pipeline
+If `artist.md` says "dark, moody, introspective" — use bedroom. If it says "cinematic, urban" — use outside. When in doubt, bedroom is the safest default for most emerging artists.
 
-Use this when you just need a video without creative decisions.
+### Choosing a song clip
 
-```
-recoup content create --artist <artist> --template artist-caption-bedroom --json
-```
+Start with audio before image because the song's mood should influence your template choice. If the artist has multiple songs, pick one that matches the content goal:
+- Promoting a new release? Use `--song <songTitle>` to target it
+- General content? Let the pipeline pick randomly
+- Have a specific audio file? Pass the URL: `--song https://...`
 
-## Templates
+After audio selection completes, check `clipMood` and `clipLyrics` in the output. If the mood doesn't match the template you planned, switch templates before generating the image.
 
-| Template | Scene | Best for |
-|----------|-------|----------|
-| `artist-caption-bedroom` | Moody purple bedroom selfie | Introspective, lo-fi vibe |
-| `artist-caption-outside` | Night street scene | Urban, cinematic feel |
-| `artist-caption-stage` | Small venue concert | Performance, energy |
-| `album-record-store` | Vinyl on display in record store | Album/single promotion |
+### Evaluating intermediate results
 
-## How to Choose a Template
+After each step, assess quality before moving on:
 
-- Look at the artist's `context/artist.md` for their aesthetic direction
-- Match the template to the song's mood (use `clipMood` from the audio step)
-- Bedroom = introspective/emotional, Outside = cinematic/urban, Stage = energetic
+- **Image**: Does it match the template's aesthetic? Is the face recognizable? If not, run `recoup content image` again — it picks a different reference composition each time.
+- **Video**: Is the motion natural? Lipsync clips should show mouth movement matching the lyrics. If the video is too static or glitchy, regenerate.
+- **Text**: Does the text connect to the song's lyrics/theme? Is the length right for the platform? Try a different `--length` if it feels off.
+- **Upscale**: Only upscale if you need higher quality (adds 30-60s per step). Skip for quick drafts.
 
-## Iteration
+### Handling edge cases
 
-The power of primitives is iteration. If a step produces bad results:
+- **Instrumental songs (no lyrics)**: Audio selection still works — Whisper returns empty lyrics. Text generation will have less to work with. Consider writing text manually via `--text "your text here"` on the render command.
+- **Artist has no face-guide.png**: Image generation will fail. Ask the user to provide a face image URL with `--face-guide <url>`.
+- **Song URL instead of repo slug**: Pass URLs directly in `--song`. The pipeline downloads, transcribes, and analyzes them the same way.
 
-- **Bad image?** Run `recoup content image` again — it picks a different reference image each time
-- **Wrong song clip?** Run `recoup content audio` again — the LLM may pick a different moment
-- **Text doesn't fit?** Run `recoup content text` with a different `--length`
-- **Want higher quality?** Run `recoup content upscale` on the image or video before rendering
+## Workflow
 
-## Lipsync Mode
+### Step-by-step (creative control)
 
-For lip-synced videos where the artist's mouth moves to the song:
+```bash
+# 1. Select a song clip
+recoup content audio --artist <artist> --json
+# Check output: songTitle, clipLyrics, clipMood
+# Use clipMood to confirm your template choice
 
-```
-recoup content video --image <imageUrl> --lipsync --song-url <songUrl> --start <startSeconds> --duration 15
-```
+# 2. Generate image
+recoup content image --artist <artist> --template <template> --json
+# Check output: imageUrl — does it match the aesthetic?
 
-The video will have audio baked in. When rendering, pass `--has-audio` so ffmpeg doesn't overlay a second audio track.
+# 3. (Optional) Upscale image
+recoup content upscale --url <imageUrl> --type image --json
 
-## Polling for Results
+# 4. Generate video
+recoup content video --image <imageUrl> --json
+# Check output: videoUrl
 
-Async primitives return a `runId`. Check progress with:
+# 5. Generate on-screen text
+recoup content text --artist <artist> --song "<songTitle>" --length short --json
+# Synchronous — returns { content } immediately
 
-```
-recoup tasks status --run <runId> --json
+# 6. Render final video
+recoup content render --video <videoUrl> --audio <songUrl> \
+  --start <startSeconds> --duration 15 --text "<text>" --json
+# Check output: final videoUrl
 ```
 
-Poll until `status` is `COMPLETED`, then read the `output` field.
+### Quick (no creative decisions)
+
+```bash
+recoup content create --artist <artist> --template artist-caption-bedroom --json
+```
+
+### Lipsync (mouth moves to lyrics)
+
+```bash
+# Audio first, then video with --lipsync
+recoup content video --image <imageUrl> --lipsync \
+  --song-url <songUrl> --start <startSeconds> --duration 15
+
+# When rendering, pass --has-audio so ffmpeg doesn't double the audio
+recoup content render --video <videoUrl> --audio <songUrl> \
+  --start <startSeconds> --duration 15 --text "<text>" --has-audio
+```
+
+## Iteration
+
+The point of primitives is that you can redo any step without rerunning everything:
+
+- Bad image? Run `image` again (different reference each time)
+- Wrong song moment? Run `audio` again
+- Text too long? Run `text` with `--length short`
+- Low quality? Run `upscale` on the image or video
+- Everything good but text is wrong? Just rerun `render` with new text
+
+Each retry costs only that step's time, not the full 5-10 minute pipeline.

From 36de75fe544239c8dd9047baecc9a0763430d0b3 Mon Sep 17 00:00:00 2001
From: Sidney Swift <158200036+sidneyswift@users.noreply.github.com>
Date: Thu, 2 Apr 2026 04:38:43 -0400
Subject: [PATCH 03/13] refactor: update content-creation skill with renamed
 CLI commands

Commands renamed to verb-qualifier pattern: generate-image,
generate-video, generate-caption, transcribe-audio.
Made-with: Cursor
---
 content-creation/SKILL.md | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/content-creation/SKILL.md b/content-creation/SKILL.md
index 18d2018..98db460 100644
--- a/content-creation/SKILL.md
+++ b/content-creation/SKILL.md
@@ -13,10 +13,10 @@ Six primitives, each callable independently:
 
 | Primitive | Command | Async? |
 |-----------|---------|--------|
-| Select audio | `recoup content audio` | Yes (30-60s) |
-| Generate image | `recoup content image` | Yes (10-30s) |
-| Generate video | `recoup content video` | Yes (60-180s) |
-| Generate text | `recoup content text` | No (2-5s) |
+| Transcribe audio | `recoup content transcribe-audio` | Yes (30-60s) |
+| Generate image | `recoup content generate-image` | Yes (10-30s) |
+| Generate video | `recoup content generate-video` | Yes (60-180s) |
+| Generate caption | `recoup content generate-caption` | No (2-5s) |
 | Render final | `recoup content render` | Yes (10-30s) |
 | Upscale | `recoup content upscale` | Yes (30-60s) |
 
@@ -48,13 +48,13 @@ Start with audio before image because the song's mood should influence your temp
 - General content? Let the pipeline pick randomly
 - Have a specific audio file? Pass the URL: `--song https://...`
 
-After audio selection completes, check `clipMood` and `clipLyrics` in the output. If the mood doesn't match the template you planned, switch templates before generating the image.
+After transcription completes, check the lyrics and mood in the output. If the mood doesn't match the template you planned, switch templates before generating the image.
 
 ### Evaluating intermediate results
 
 After each step, assess quality before moving on:
 
-- **Image**: Does it match the template's aesthetic? Is the face recognizable? If not, run `recoup content image` again — it picks a different reference composition each time.
+- **Image**: Does it match the template's aesthetic? Is the face recognizable? If not, run `recoup content generate-image` again — it picks a different reference composition each time.
 - **Video**: Is the motion natural? Lipsync clips should show mouth movement matching the lyrics. If the video is too static or glitchy, regenerate.
 - **Text**: Does the text connect to the song's lyrics/theme? Is the length right for the platform? Try a different `--length` if it feels off.
 - **Upscale**: Only upscale if you need higher quality (adds 30-60s per step). Skip for quick drafts.
@@ -70,24 +70,23 @@ After each step, assess quality before moving on:
 
 ### Step-by-step (creative control)
 
 ```bash
-# 1. Select a song clip
-recoup content audio --artist <artist> --json
-# Check output: songTitle, clipLyrics, clipMood
-# Use clipMood to confirm your template choice
+# 1. Transcribe a song
+recoup content transcribe-audio --artist <artist> --json
+# Check output: songUrl, fullLyrics, segments
 
 # 2. Generate image
-recoup content image --artist <artist> --template <template> --json
+recoup content generate-image --artist <artist> --template <template> --json
 # Check output: imageUrl — does it match the aesthetic?
 
 # 3. (Optional) Upscale image
 recoup content upscale --url <imageUrl> --type image --json
 
 # 4. Generate video
-recoup content video --image <imageUrl> --json
+recoup content generate-video --image <imageUrl> --json
 # Check output: videoUrl
 
-# 5. Generate on-screen text
-recoup content text --artist <artist> --song "<songTitle>" --length short --json
+# 5. Generate on-screen caption
+recoup content generate-caption --artist <artist> --song "<songTitle>" --length short --json
 # Synchronous — returns { content } immediately
 
 # 6. Render final video
@@ -106,7 +105,7 @@ recoup content render --video <videoUrl> --audio <songUrl> \
 
 ```bash
 # Audio first, then video with --lipsync
-recoup content video --image <imageUrl> --lipsync \
+recoup content generate-video --image <imageUrl> --lipsync \
   --song-url <songUrl> --start <startSeconds> --duration 15
 
 # When rendering, pass --has-audio so ffmpeg doesn't double the audio
@@ -118,9 +117,9 @@ recoup content render --video <videoUrl> --audio <songUrl> \
 
 The point of primitives is that you can redo any step without rerunning everything:
 
-- Bad image? Run `image` again (different reference each time)
-- Wrong song moment? Run `audio` again
-- Text too long? Run `text` with `--length short`
+- Bad image? Run `generate-image` again (different reference each time)
+- Wrong song moment? Run `transcribe-audio` again
+- Text too long? Run `generate-caption` with `--length short`
 - Low quality? Run `upscale` on the image or video
 - Everything good but text is wrong? Just rerun `render` with new text

From bd9e5080998d68ef8590477b03808c762c2dcff9 Mon Sep 17 00:00:00 2001
From: Sidney Swift <158200036+sidneyswift@users.noreply.github.com>
Date: Thu, 2 Apr 2026 05:21:30 -0400
Subject: [PATCH 04/13] refactor: rewrite content-creation skill for generic
 primitives

- Removed music-specific language and param names
- Added model selection docs and edit operations reference
- Updated all workflow examples with new CLI flags
- Added analyze-video to primitives table

Made-with: Cursor
---
 content-creation/SKILL.md | 145 +++++++++++++++++++------------------
 1 file changed, 74 insertions(+), 71 deletions(-)

diff --git a/content-creation/SKILL.md b/content-creation/SKILL.md
index 98db460..7b02cc0 100644
--- a/content-creation/SKILL.md
+++ b/content-creation/SKILL.md
@@ -1,15 +1,15 @@
 ---
 name: content-creation
-description: Create social videos, TikToks, Reels, and visual content for music artists using AI-generated images, video, on-screen text, and song audio. Use this skill whenever the user asks to create content, make a video, generate an image, produce a TikTok or Reel, create a promotional clip, add on-screen text or captions to a video, do a face swap, make visual content for an artist, post to social media, or create short-form video from a song. Also use when the user wants to iterate on content quality — regenerate images, try different songs, adjust text, or upscale for higher quality.
+description: Create social videos, TikToks, Reels, and visual content using AI-generated images, video, captions, and audio. Use this skill whenever the user asks to create content, make a video, generate an image, produce a TikTok or Reel, create a promotional clip, add captions to a video, make visual content, or create short-form video. Also use when the user wants to iterate on content quality — regenerate images, try different audio, adjust text, or upscale for higher quality.
 ---
 
 # Content Creation
 
-Create social-ready short-form videos for music artists by composing independent primitives. Each primitive does one thing. You orchestrate them based on the artist's identity and the song's energy.
+Create social-ready short-form videos by composing independent primitives. Each primitive does one thing. You orchestrate them.
 
-## How It Works
+## Primitives
 
-Six primitives, each callable independently:
+Seven primitives, each callable independently:
 
 | Primitive | Command | Async? |
 |-----------|---------|--------|
 | Transcribe audio | `recoup content transcribe-audio` | Yes (30-60s) |
 | Generate image | `recoup content generate-image` | Yes (10-30s) |
 | Generate video | `recoup content generate-video` | Yes (60-180s) |
 | Generate caption | `recoup content generate-caption` | No (2-5s) |
-| Render final | `recoup content render` | Yes (10-30s) |
+| Edit media | `recoup content edit` | Yes (10-30s) |
 | Upscale | `recoup content upscale` | Yes (30-60s) |
+| Analyze video | `recoup content analyze-video` | No (10-30s) |
 
 Async primitives return a `runId`. Poll with `recoup tasks status --run <runId> --json` until `status` is `COMPLETED`, then read `output`.
 
-There is also `recoup content create` which runs all steps in one shot — use it when the user just wants a video without creative control.
+There is also `recoup content create` which runs the full pipeline in one shot — use it when the user just wants a video without creative control.
 
-## Creative Decision-Making
-
-The skill's value is not in running commands — it is in making creative decisions at each step. Read the artist's `context/artist.md` before starting. It contains their personality, aesthetic direction, mood, colors, and sacred rules.
-
-### Choosing a template
-
-Templates control the visual scene (lighting, camera, environment). Match the template to the artist's aesthetic and the song's mood:
-
-| Template | Scene | When to use |
-|----------|-------|-------------|
-| `artist-caption-bedroom` | Moody purple bedroom, deadpan selfie | Introspective songs, vulnerable lyrics, lo-fi vibe |
-| `artist-caption-outside` | Night street, phone on ground | Urban feel, cinematic energy, confident tracks |
-| `artist-caption-stage` | Small venue, fan cam angle | Performance energy, hype songs, live feel |
-| `album-record-store` | Vinyl on display in NYC record store | Album/single promotion, release day content |
-
-If `artist.md` says "dark, moody, introspective" — use bedroom. If it says "cinematic, urban" — use outside. When in doubt, bedroom is the safest default for most emerging artists.
-
-### Choosing a song clip
-
-Start with audio before image because the song's mood should influence your template choice. If the artist has multiple songs, pick one that matches the content goal:
-- Promoting a new release? Use `--song <songTitle>` to target it
-- General content? Let the pipeline pick randomly
-- Have a specific audio file? Pass the URL: `--song https://...`
-
-After transcription completes, check the lyrics and mood in the output. If the mood doesn't match the template you planned, switch templates before generating the image.
-
-### Evaluating intermediate results
-
-After each step, assess quality before moving on:
-
-- **Image**: Does it match the template's aesthetic? Is the face recognizable? If not, run `recoup content generate-image` again — it picks a different reference composition each time.
-- **Video**: Is the motion natural? Lipsync clips should show mouth movement matching the lyrics. If the video is too static or glitchy, regenerate.
-- **Text**: Does the text connect to the song's lyrics/theme? Is the length right for the platform? Try a different `--length` if it feels off.
-- **Upscale**: Only upscale if you need higher quality (adds 30-60s per step). Skip for quick drafts.
-
-### Handling edge cases
-
-- **Instrumental songs (no lyrics)**: Audio selection still works — Whisper returns empty lyrics. Text generation will have less to work with. Consider writing text manually via `--text "your text here"` on the render command.
-- **Artist has no face-guide.png**: Image generation will fail. Ask the user to provide a face image URL with `--face-guide <url>`.
-- **Song URL instead of repo slug**: Pass URLs directly in `--song`. The pipeline downloads, transcribes, and analyzes them the same way.
 
 ## Workflow
 
 ### Step-by-step (creative control)
 
 ```bash
-# 1. Transcribe a song
-recoup content transcribe-audio --artist <artist> --json
-# Check output: songUrl, fullLyrics, segments
+# 1. Transcribe audio
+recoup content transcribe-audio --url <audioUrl> --json
+# Returns: audioUrl, fullLyrics, segments (timestamped words)
 
 # 2. Generate image
-recoup content generate-image --artist <artist> --template <template> --json
-# Check output: imageUrl — does it match the aesthetic?
+recoup content generate-image --prompt "<prompt>" \
+  --reference-image <imageUrl> --json
+# Returns: imageUrl
 
 # 3. (Optional) Upscale image
 recoup content upscale --url <imageUrl> --type image --json
 
 # 4. Generate video
 recoup content generate-video --image <imageUrl> --json
-# Check output: videoUrl
+# Returns: videoUrl
 
 # 5. Generate caption
-recoup content generate-caption --artist <artist> --song "<songTitle>" --length short --json
+recoup content generate-caption --topic "<topic>" --length short --json
 # Synchronous — returns { content } immediately
 
-# 6. Render final video
-recoup content render --video <videoUrl> --audio <songUrl> \
-  --start <startSeconds> --duration 15 --text "<text>" --json
-# Check output: final videoUrl
+# 6. Edit final video (template mode)
+recoup content edit --video <videoUrl> --audio <audioUrl> \
+  --template artist-caption-bedroom \
+  --overlay-text "<text>" --json
+
+# 6. Edit final video (manual mode)
+recoup content edit --video <videoUrl> \
+  --trim-start 30 --trim-duration 15 \
+  --crop-aspect 9:16 \
+  --overlay-text "<text>" \
+  --mux-audio <audioUrl> --json
 ```
 
 ### Quick (no creative decisions)
 
 ```bash
 recoup content create --artist <artist> --template artist-caption-bedroom --json
 ```
 
-### Lipsync (mouth moves to lyrics)
+### Lipsync (mouth moves to audio)
 
 ```bash
-# Audio first, then video with --lipsync
-recoup content generate-video --image <imageUrl> --lipsync \
-  --song-url <songUrl> --start <startSeconds> --duration 15
+# Generate video with audio-driven animation
+recoup content generate-video --image <imageUrl> --lipsync \
+  --audio <audioUrl> --json
 
-# When rendering, pass --has-audio so ffmpeg doesn't double the audio
-recoup content render --video <videoUrl> --audio <songUrl> \
-  --start <startSeconds> --duration 15 --text "<text>" --has-audio
+# Edit — no need to mux audio separately (already baked in)
+recoup content edit --video <videoUrl> \
+  --crop-aspect 9:16 \
+  --overlay-text "<text>" --json
 ```
 
+## Model Selection
+
+Each generative primitive accepts an optional `--model` flag to override the default:
+
+| Primitive | Default Model | Flag |
+|-----------|---------------|------|
+| generate-image | `fal-ai/nano-banana-pro/edit` | `--model <model>` |
+| generate-video | `fal-ai/veo3.1/fast/image-to-video` | `--model <model>` |
+| generate-video (lipsync) | `fal-ai/ltx-2-19b/audio-to-video` | `--model <model>` |
+| transcribe-audio | `fal-ai/whisper` | `--model <model>` |
+
+## Edit Operations
+
+The `edit` command accepts either a template (deterministic config) or manual flags:
+
+| Operation | Flags | What it does |
+|-----------|-------|-------------|
+| Trim | `--trim-start <seconds> --trim-duration <seconds>` | Cut a time window |
+| Crop | `--crop-aspect <ratio>` | Crop to aspect ratio (e.g. 9:16) |
+| Overlay text | `--overlay-text <text> --text-color <color> --text-position <position>` | Add on-screen text |
+| Mux audio | `--mux-audio <audioUrl>` | Add audio track to video |
+| Template | `--template <template>` | Use template's preset operations |
+
+All operations run in a single processing pass.
+
+## Evaluating Results
+
+After each step, assess quality before moving on:
+
+- **Image**: Does it match the desired aesthetic? If not, run `generate-image` again with a different prompt or reference image.
+- **Video**: Is the motion natural? Lipsync should show movement matching the audio. If glitchy, regenerate.
+- **Caption**: Does the text connect to the topic? Try a different `--length` if it feels off.
+- **Upscale**: Only upscale if you need higher quality (adds 30-60s). Skip for quick drafts.
+
 ## Iteration
 
-The point of primitives is that you can redo any step without rerunning everything:
+Redo any step without rerunning everything:
 
-- Bad image? Run `generate-image` again (different reference each time)
-- Wrong song moment? Run `transcribe-audio` again
-- Text too long? Run `generate-caption` with `--length short`
+- Bad image? Run `generate-image` again with a different prompt
+- Wrong audio moment? Transcribe a different file or URL
+- Caption too long? Run `generate-caption` with `--length short`
 - Low quality? Run `upscale` on the image or video
-- Everything good but text is wrong? Just rerun `render` with new text
+- Everything good but caption is wrong? Just rerun `edit` with new `--overlay-text`
 
-Each retry costs only that step's time, not the full 5-10 minute pipeline.
+Each retry costs only that step's time, not the full pipeline.

From 152d286c00b2844317ce5a9aac004a5e47135481 Mon Sep 17 00:00:00 2001
From: Sidney Swift <158200036+sidneyswift@users.noreply.github.com>
Date: Thu, 2 Apr 2026 05:23:36 -0400
Subject: [PATCH 05/13] feat: add analyze-video feedback loop to content
 creation skill

Teaches the agent to watch its own generated videos using analyze-video
after every generation step, evaluate quality, and iterate based on the
feedback.
Made-with: Cursor --- content-creation/SKILL.md | 31 +++++++++++++++++++++++++------ 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/content-creation/SKILL.md b/content-creation/SKILL.md index 7b02cc0..77a5429 100644 --- a/content-creation/SKILL.md +++ b/content-creation/SKILL.md @@ -107,14 +107,32 @@ The `edit` command accepts either a template (deterministic config) or manual fl All operations run in a single processing pass. -## Evaluating Results +## Watching Your Work -After each step, assess quality before moving on: +After generating a video, use `analyze-video` to watch it before moving on. This is how you evaluate your own creative output and decide what to improve. -- **Image**: Does it match the desired aesthetic? If not, run `generate-image` again with a different prompt or reference image. -- **Video**: Is the motion natural? Lipsync should show movement matching the audio. If glitchy, regenerate. -- **Caption**: Does the text connect to the topic? Try a different `--length` if it feels off. -- **Upscale**: Only upscale if you need higher quality (adds 30-60s). Skip for quick drafts. +```bash +# After generate-video returns a videoUrl: +recoup content analyze-video --url \ + --prompt "Evaluate this video for social media. Is the motion natural? Is the subject recognizable? Does it feel polished or glitchy? Rate 1-10 and explain what could improve." \ + --json +``` + +Read the analysis. If it flags problems, fix them: +- **Glitchy motion or artifacts** → regenerate video with a different `--model` or `--motion` prompt +- **Subject not recognizable** → regenerate image with a better `--reference-image` or more specific `--prompt` +- **Too static / boring** → try a more dynamic `--motion` prompt or switch to lipsync mode +- **Good quality but wrong mood** → the video is fine, adjust the caption or audio choice instead + +Do this after every video generation, not just when something looks wrong. 
The analysis catches things you might miss and builds a feedback loop that improves each iteration. + +For final edited videos, analyze again after the edit pass: + +```bash +recoup content analyze-video --url \ + --prompt "This is the final social video with caption and audio. Is the text readable? Does the crop look right? Is the audio synced? Any issues that would hurt engagement?" \ + --json +``` ## Iteration @@ -124,6 +142,7 @@ Redo any step without rerunning everything: - Wrong audio moment? Transcribe a different file or URL - Caption too long? Run `generate-caption` with `--length short` - Low quality? Run `upscale` on the image or video +- Video glitchy? Check `analyze-video` feedback, then regenerate with adjusted params - Everything good but caption is wrong? Just rerun `edit` with new `--overlay-text` Each retry costs only that step's time, not the full pipeline. From e603003f19a8fcf7ffaf32e6df26a5eb717b8cfe Mon Sep 17 00:00:00 2001 From: Sidney Swift <158200036+sidneyswift@users.noreply.github.com> Date: Thu, 2 Apr 2026 05:24:56 -0400 Subject: [PATCH 06/13] feat: add taste and QA evaluation to analyze-video feedback loop Two distinct evaluation passes: QA (artifacts, glitches, motion) after generation, and taste (hook, aesthetic, platform readiness) after the final edit. Scoring rubric guides iteration decisions. Made-with: Cursor --- content-creation/SKILL.md | 36 +++++++++++++++++++++++++----------- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/content-creation/SKILL.md b/content-creation/SKILL.md index 77a5429..ed8f6c0 100644 --- a/content-creation/SKILL.md +++ b/content-creation/SKILL.md @@ -109,31 +109,45 @@ All operations run in a single processing pass. ## Watching Your Work -After generating a video, use `analyze-video` to watch it before moving on. This is how you evaluate your own creative output and decide what to improve. +After generating a video, use `analyze-video` to watch it before moving on. This is your eyes. 
You can't see pixels — but this primitive can. Use it to evaluate three things: **quality**, **taste**, and **readiness**. + +### After generating a video (QA check) ```bash -# After generate-video returns a videoUrl: recoup content analyze-video --url \ - --prompt "Evaluate this video for social media. Is the motion natural? Is the subject recognizable? Does it feel polished or glitchy? Rate 1-10 and explain what could improve." \ + --prompt "QA this video. Check for: 1) Visual artifacts, glitches, or distortion. 2) Whether the subject is recognizable and consistent. 3) Whether motion looks natural or robotic. 4) Any frames that look broken or repeated. Rate quality 1-10. List specific issues." \ --json ``` -Read the analysis. If it flags problems, fix them: -- **Glitchy motion or artifacts** → regenerate video with a different `--model` or `--motion` prompt +Fix what it finds: +- **Artifacts or glitches** → regenerate with a different `--model` or `--motion` prompt - **Subject not recognizable** → regenerate image with a better `--reference-image` or more specific `--prompt` -- **Too static / boring** → try a more dynamic `--motion` prompt or switch to lipsync mode -- **Good quality but wrong mood** → the video is fine, adjust the caption or audio choice instead - -Do this after every video generation, not just when something looks wrong. The analysis catches things you might miss and builds a feedback loop that improves each iteration. +- **Robotic motion** → try a more natural `--motion` prompt or switch to lipsync -For final edited videos, analyze again after the edit pass: +### After editing the final video (taste + platform readiness) ```bash recoup content analyze-video --url \ - --prompt "This is the final social video with caption and audio. Is the text readable? Does the crop look right? Is the audio synced? Any issues that would hurt engagement?" \ + --prompt "Evaluate this as a social media video for TikTok/Reels. 
Score each 1-10: +1) HOOK: Would someone stop scrolling in the first 2 seconds? +2) VISUAL TASTE: Does it feel intentional and aesthetic, or generic and AI-generated? +3) TEXT: Is the caption readable, well-positioned, and not blocking the subject? +4) AUDIO SYNC: Does the audio match the visual energy and pacing? +5) CROP: Is the framing good for vertical (9:16)? +6) OVERALL: Would you post this? What one change would make it better?" \ --json ``` +This is the creative gut-check. A 6/10 on hook means the opening is boring — try a more dramatic image or motion prompt. A 4/10 on taste means it looks like AI slop — use a different model or reference image. A low text score means reposition or shorten the caption. + +### When to analyze + +- **Always** after `generate-video` — catch quality issues before wasting time on editing +- **Always** after the final `edit` — catch taste and platform issues before delivering +- **Optionally** after `generate-image` if you want to evaluate the still before animating it (use a prompt like "Is this image aesthetic and recognizable? Would it work as a social media thumbnail?") + +The goal is not perfection — it's iteration. Generate, watch, fix, watch again. Two rounds usually gets you from mediocre to good. + ## Iteration Redo any step without rerunning everything: From 0b56977a9cdf3009d122f0da145fe8eccd8a914b Mon Sep 17 00:00:00 2001 From: Sidney Swift <158200036+sidneyswift@users.noreply.github.com> Date: Thu, 2 Apr 2026 05:26:12 -0400 Subject: [PATCH 07/13] fix: remove hardcoded prompts from analyze feedback loop Gives guidance on what to look for (QA, taste, platform readiness) and how to act on feedback, but lets the agent write its own prompts. 
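Since prompt-writing is now left to the agent, it can help to assemble analyze prompts from a checklist rather than improvising each time. A tiny composer, as a sketch only (the helper name and phrasing are mine, not part of the skill or CLI):

```shell
# build_analyze_prompt: join the checks you care about into one analyze prompt.
# Illustrative only; not a real CLI helper.
build_analyze_prompt() {
  checks=""
  for c in "$@"; do
    # join checks with "; " so the prompt reads as one list
    if [ -z "$checks" ]; then checks="$c"; else checks="$checks; $c"; fi
  done
  printf 'Evaluate this video. Check: %s. Rate each 1-10 and explain.' "$checks"
}
```

Paired with the primitive, that becomes e.g. `recoup content analyze --url {videoUrl} --prompt "$(build_analyze_prompt 'motion naturalness' 'caption readability')" --json`.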
Made-with: Cursor --- content-creation/SKILL.md | 58 +++++++++++++++++++-------------------- 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/content-creation/SKILL.md b/content-creation/SKILL.md index ed8f6c0..61ee38b 100644 --- a/content-creation/SKILL.md +++ b/content-creation/SKILL.md @@ -109,44 +109,44 @@ All operations run in a single processing pass. ## Watching Your Work -After generating a video, use `analyze-video` to watch it before moving on. This is your eyes. You can't see pixels — but this primitive can. Use it to evaluate three things: **quality**, **taste**, and **readiness**. +You can't see pixels — but `analyze-video` can. Use it freely throughout the workflow to evaluate your output. Write your own prompts based on what you need to know. -### After generating a video (QA check) +### What to look for -```bash -recoup content analyze-video --url \ - --prompt "QA this video. Check for: 1) Visual artifacts, glitches, or distortion. 2) Whether the subject is recognizable and consistent. 3) Whether motion looks natural or robotic. 4) Any frames that look broken or repeated. Rate quality 1-10. List specific issues." \ - --json -``` +**QA** — technical quality problems: +- Visual artifacts, glitches, distortion +- Subject consistency and recognizability +- Motion naturalness vs robotic movement +- Broken or repeated frames -Fix what it finds: -- **Artifacts or glitches** → regenerate with a different `--model` or `--motion` prompt -- **Subject not recognizable** → regenerate image with a better `--reference-image` or more specific `--prompt` -- **Robotic motion** → try a more natural `--motion` prompt or switch to lipsync +**Taste** — creative quality: +- Does the opening hook attention in the first 2 seconds? +- Does it feel intentional and aesthetic, or generic? +- Does the visual energy match the audio energy? +- Would a real person post this? 
-### After editing the final video (taste + platform readiness) +**Platform readiness** — practical details: +- Text readability and positioning (not blocking the subject, not cut off by UI) +- Vertical crop framing +- Audio-visual sync and pacing -```bash -recoup content analyze-video --url \ - --prompt "Evaluate this as a social media video for TikTok/Reels. Score each 1-10: -1) HOOK: Would someone stop scrolling in the first 2 seconds? -2) VISUAL TASTE: Does it feel intentional and aesthetic, or generic and AI-generated? -3) TEXT: Is the caption readable, well-positioned, and not blocking the subject? -4) AUDIO SYNC: Does the audio match the visual energy and pacing? -5) CROP: Is the framing good for vertical (9:16)? -6) OVERALL: Would you post this? What one change would make it better?" \ - --json -``` +### When to analyze -This is the creative gut-check. A 6/10 on hook means the opening is boring — try a more dramatic image or motion prompt. A 4/10 on taste means it looks like AI slop — use a different model or reference image. A low text score means reposition or shorten the caption. +- **After `generate-video`** — catch quality issues before spending time editing +- **After `edit`** — evaluate the final product before delivering +- **After `generate-image`** — optionally check the still before animating it +- **Anytime** — use it to compare two versions, check if an upscale improved things, or validate a creative direction -### When to analyze +### Acting on feedback -- **Always** after `generate-video` — catch quality issues before wasting time on editing -- **Always** after the final `edit` — catch taste and platform issues before delivering -- **Optionally** after `generate-image` if you want to evaluate the still before animating it (use a prompt like "Is this image aesthetic and recognizable? 
Would it work as a social media thumbnail?") +- Artifacts or glitches → regenerate with a different `--model` or `--motion` prompt +- Subject not recognizable → regenerate image with a better `--reference-image` or prompt +- Robotic motion → try a more natural motion prompt or switch to lipsync +- Boring opening → more dramatic image or motion +- Looks like AI slop → different model, better reference image, or more specific prompt +- Bad text placement → adjust position or shorten caption in `edit` -The goal is not perfection — it's iteration. Generate, watch, fix, watch again. Two rounds usually gets you from mediocre to good. +Generate, watch, fix, watch again. Two rounds usually gets you from mediocre to good. ## Iteration From cc223f96f26bd957144347cddec05f70ad9faf80 Mon Sep 17 00:00:00 2001 From: Sidney Swift <158200036+sidneyswift@users.noreply.github.com> Date: Thu, 2 Apr 2026 05:28:38 -0400 Subject: [PATCH 08/13] feat: expand analyze usage for multi-clip editing and iterative workflows Analyze at every creative checkpoint: after each clip, each edit pass, multi-clip assembly, audio sync, text overlay, and version comparison. Fix issues early before building on top of them. Made-with: Cursor --- content-creation/SKILL.md | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/content-creation/SKILL.md b/content-creation/SKILL.md index 61ee38b..a8808ee 100644 --- a/content-creation/SKILL.md +++ b/content-creation/SKILL.md @@ -132,10 +132,17 @@ You can't see pixels — but `analyze-video` can. 
Use it freely throughout the w ### When to analyze -- **After `generate-video`** — catch quality issues before spending time editing -- **After `edit`** — evaluate the final product before delivering -- **After `generate-image`** — optionally check the still before animating it -- **Anytime** — use it to compare two versions, check if an upscale improved things, or validate a creative direction +Use it at every creative checkpoint, not just once at the end: + +- **After generating a clip** — catch quality issues before spending time editing +- **After each edit pass** — did the trim land on the right beat? Does the cut feel natural? +- **After combining multiple clips** — do the cuts flow? Does the pacing hold attention? +- **After adding audio** — is the lipsync convincing? Does the music energy match the visuals? +- **After adding text** — is it readable, well-timed, not blocking anything important? +- **When comparing versions** — which of two outputs is better and why? +- **When building a storyline** — does the sequence of shots tell a coherent story? + +For longer edits with multiple cuts, analyze after each major assembly — not just the final export. If cut 3 of 5 looks wrong, fix it before adding cuts 4 and 5. ### Acting on feedback From 54f9f438344006c8a259eb3d1c06667419cbfce5 Mon Sep 17 00:00:00 2001 From: Sidney Swift <158200036+sidneyswift@users.noreply.github.com> Date: Thu, 2 Apr 2026 06:04:57 -0400 Subject: [PATCH 09/13] fix: replace angle-bracket placeholders with curly braces Address CodeRabbit review: skills repo guidelines prohibit XML-style angle brackets in SKILL.md files. All placeholders now use {curly} brace syntax. 
Made-with: Cursor --- content-creation/SKILL.md | 64 +++++++++++++++++++-------------------- 1 file changed, 32 insertions(+), 32 deletions(-) diff --git a/content-creation/SKILL.md b/content-creation/SKILL.md index a8808ee..0a89425 100644 --- a/content-creation/SKILL.md +++ b/content-creation/SKILL.md @@ -21,7 +21,7 @@ Seven primitives, each callable independently: | Upscale | `recoup content upscale` | Yes (30-60s) | | Analyze video | `recoup content analyze-video` | No (10-30s) | -Async primitives return a `runId`. Poll with `recoup tasks status --run --json` until `status` is `COMPLETED`, then read `output`. +Async primitives return a `runId`. Poll with `recoup tasks status --run {runId} --json` until `status` is `COMPLETED`, then read `output`. There is also `recoup content create` which runs the full pipeline in one shot — use it when the user just wants a video without creative control. @@ -31,55 +31,55 @@ There is also `recoup content create` which runs the full pipeline in one shot ```bash # 1. Transcribe audio -recoup content transcribe-audio --url --json +recoup content transcribe-audio --url {audioUrl} --json # Returns: audioUrl, fullLyrics, segments (timestamped words) # 2. Generate image -recoup content generate-image --prompt "" \ - --reference-image --json +recoup content generate-image --prompt "{scene description}" \ + --reference-image {referenceImageUrl} --json # Returns: imageUrl # 3. (Optional) Upscale image -recoup content upscale --url --type image --json +recoup content upscale --url {imageUrl} --type image --json # 4. Generate video -recoup content generate-video --image --json +recoup content generate-video --image {imageUrl} --json # Returns: videoUrl # 5. Generate caption -recoup content generate-caption --topic "" --length short --json +recoup content generate-caption --topic "{topic}" --length short --json # Synchronous — returns { content } immediately # 6. 
Edit final video (template mode) -recoup content edit --video --audio \ +recoup content edit --video {videoUrl} --audio {audioUrl} \ --template artist-caption-bedroom \ - --overlay-text "" --json + --overlay-text "{caption}" --json # 6. Edit final video (manual mode) -recoup content edit --video \ +recoup content edit --video {videoUrl} \ --trim-start 30 --trim-duration 15 \ --crop-aspect 9:16 \ - --overlay-text "" \ - --mux-audio --json + --overlay-text "{caption}" \ + --mux-audio {audioUrl} --json ``` ### Quick (no creative decisions) ```bash -recoup content create --artist --template artist-caption-bedroom --json +recoup content create --artist {artistId} --template artist-caption-bedroom --json ``` ### Lipsync (mouth moves to audio) ```bash # Generate video with audio-driven animation -recoup content generate-video --image --lipsync \ - --audio --json +recoup content generate-video --image {imageUrl} --lipsync \ + --audio {audioUrl} --json # Edit — no need to mux audio separately (already baked in) -recoup content edit --video \ +recoup content edit --video {videoUrl} \ --crop-aspect 9:16 \ - --overlay-text "" --json + --overlay-text "{caption}" --json ``` ## Model Selection @@ -88,10 +88,10 @@ Each generative primitive accepts an optional `--model` flag to override the def | Primitive | Default Model | Flag | |-----------|---------------|------| -| generate-image | `fal-ai/nano-banana-pro/edit` | `--model ` | -| generate-video | `fal-ai/veo3.1/fast/image-to-video` | `--model ` | -| generate-video (lipsync) | `fal-ai/ltx-2-19b/audio-to-video` | `--model ` | -| transcribe-audio | `fal-ai/whisper` | `--model ` | +| generate-image | `fal-ai/nano-banana-pro/edit` | `--model {modelId}` | +| generate-video | `fal-ai/veo3.1/fast/image-to-video` | `--model {modelId}` | +| generate-video (lipsync) | `fal-ai/ltx-2-19b/audio-to-video` | `--model {modelId}` | +| transcribe-audio | `fal-ai/whisper` | `--model {modelId}` | ## Edit Operations @@ -99,11 +99,11 @@ The `edit` 
command accepts either a template (deterministic config) or manual fl | Operation | Flags | What it does | |-----------|-------|-------------| -| Trim | `--trim-start --trim-duration ` | Cut a time window | -| Crop | `--crop-aspect ` | Crop to aspect ratio (e.g. 9:16) | -| Overlay text | `--overlay-text --text-color --text-position ` | Add on-screen text | -| Mux audio | `--mux-audio ` | Add audio track to video | -| Template | `--template ` | Use template's preset operations | +| Trim | `--trim-start {s} --trim-duration {s}` | Cut a time window | +| Crop | `--crop-aspect {ratio}` | Crop to aspect ratio (e.g. 9:16) | +| Overlay text | `--overlay-text {text} --text-color {color} --text-position {pos}` | Add on-screen text | +| Mux audio | `--mux-audio {url}` | Add audio track to video | +| Template | `--template {name}` | Use template's preset operations | All operations run in a single processing pass. @@ -146,12 +146,12 @@ For longer edits with multiple cuts, analyze after each major assembly — not j ### Acting on feedback -- Artifacts or glitches → regenerate with a different `--model` or `--motion` prompt -- Subject not recognizable → regenerate image with a better `--reference-image` or prompt -- Robotic motion → try a more natural motion prompt or switch to lipsync -- Boring opening → more dramatic image or motion -- Looks like AI slop → different model, better reference image, or more specific prompt -- Bad text placement → adjust position or shorten caption in `edit` +- Artifacts or glitches — regenerate with a different `--model` or `--motion` prompt +- Subject not recognizable — regenerate image with a better `--reference-image` or prompt +- Robotic motion — try a more natural motion prompt or switch to lipsync +- Boring opening — more dramatic image or motion +- Looks like AI slop — different model, better reference image, or more specific prompt +- Bad text placement — adjust position or shorten caption in `edit` Generate, watch, fix, watch again. 
Two rounds usually gets you from mediocre to good. From 6450fa0561bc22949aae459db23b3863e251ed44 Mon Sep 17 00:00:00 2001 From: Sidney Swift <158200036+sidneyswift@users.noreply.github.com> Date: Thu, 2 Apr 2026 11:17:33 -0400 Subject: [PATCH 10/13] fix: update skill to show generate-video accepts prompt without image MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Image is optional — video can be generated from just a prompt. Made-with: Cursor --- content-creation/SKILL.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/content-creation/SKILL.md b/content-creation/SKILL.md index 0a89425..77eb091 100644 --- a/content-creation/SKILL.md +++ b/content-creation/SKILL.md @@ -42,8 +42,10 @@ recoup content generate-image --prompt "{scene description}" \ # 3. (Optional) Upscale image recoup content upscale --url {imageUrl} --type image --json -# 4. Generate video +# 4. Generate video (from image, or prompt-only, or both) recoup content generate-video --image {imageUrl} --json +# Or without an image: +recoup content generate-video --prompt "{video description}" --json # Returns: videoUrl # 5. 
Generate caption From 8ce7004ef689475c0edafb471ef55e2264786d8b Mon Sep 17 00:00:00 2001 From: Sidney Swift <158200036+sidneyswift@users.noreply.github.com> Date: Thu, 2 Apr 2026 14:07:51 -0400 Subject: [PATCH 11/13] feat: update skill with 6 video modes, model table, and workflow examples - New Video Generation Modes section with mode table and when-to-use guide - Updated model selection table with all auto-selected models - Added extend and first-last workflow examples - Updated lipsync to use --mode lipsync instead of --lipsync flag - Added extend and first-last to iteration and acting-on-feedback sections Made-with: Cursor --- content-creation/SKILL.md | 89 ++++++++++++++++++++++++++++----------- 1 file changed, 64 insertions(+), 25 deletions(-) diff --git a/content-creation/SKILL.md b/content-creation/SKILL.md index 77eb091..70aae5c 100644 --- a/content-creation/SKILL.md +++ b/content-creation/SKILL.md @@ -17,7 +17,7 @@ Seven primitives, each callable independently: | Generate image | `recoup content generate-image` | Yes (10-30s) | | Generate video | `recoup content generate-video` | Yes (60-180s) | | Generate caption | `recoup content generate-caption` | No (2-5s) | -| Edit media | `recoup content edit` | Yes (10-30s) | +| Edit | `recoup content edit` | Yes (10-30s) | | Upscale | `recoup content upscale` | Yes (30-60s) | | Analyze video | `recoup content analyze-video` | No (10-30s) | @@ -25,6 +25,30 @@ Async primitives return a `runId`. Poll with `recoup tasks status --run {runId} There is also `recoup content create` which runs the full pipeline in one shot — use it when the user just wants a video without creative control. +## Video Generation Modes + +`generate-video` supports 6 modes. Set `--mode` to be explicit, or omit it and the mode is inferred from the inputs you provide. 
+ +| Mode | What it does | Required inputs | +|------|-------------|-----------------| +| `prompt` | Create a video from a text description | `--prompt` | +| `animate` | Animate a still image | `--image`, `--prompt` | +| `reference` | Use an image as a style/subject reference (not the first frame) | `--image`, `--prompt` | +| `extend` | Continue an existing video (max 8s input) | `--video`, `--prompt` | +| `first-last` | Generate a video that transitions between two images | `--image`, `--end-image`, `--prompt` | +| `lipsync` | Sync face movement to an audio clip | `--image`, `--audio` | + +All modes support `--aspect-ratio` (auto, 16:9, 9:16), `--duration` (4s-8s), and `--resolution` (720p, 1080p, 4k). + +### When to use which mode + +- "I need a cinematic establishing shot" → `prompt` +- "I have a photo of the artist, make it come alive" → `animate` +- "I have a photo of the artist, put them in a new scene" → `reference` +- "This clip is great but too short" → `extend` +- "I want a smooth transition from the studio to the stage" → `first-last` +- "I need the face to sing along to this audio" → `lipsync` + ## Workflow ### Step-by-step (creative control) @@ -37,32 +61,26 @@ recoup content transcribe-audio --url {audioUrl} --json # 2. Generate image recoup content generate-image --prompt "{scene description}" \ --reference-image {referenceImageUrl} --json -# Returns: imageUrl +# Returns: imageUrl, images (array if num_images > 1) # 3. (Optional) Upscale image recoup content upscale --url {imageUrl} --type image --json -# 4. Generate video (from image, or prompt-only, or both) -recoup content generate-video --image {imageUrl} --json -# Or without an image: -recoup content generate-video --prompt "{video description}" --json -# Returns: videoUrl +# 4. 
Generate video +recoup content generate-video --mode animate --image {imageUrl} \ + --prompt "{how to animate}" --json +# Or from scratch: +recoup content generate-video --mode prompt --prompt "{scene}" --json +# Returns: videoUrl, mode # 5. Generate caption recoup content generate-caption --topic "{topic}" --length short --json # Synchronous — returns { content } immediately -# 6. Edit final video (template mode) +# 6. Edit final video recoup content edit --video {videoUrl} --audio {audioUrl} \ --template artist-caption-bedroom \ --overlay-text "{caption}" --json - -# 6. Edit final video (manual mode) -recoup content edit --video {videoUrl} \ - --trim-start 30 --trim-duration 15 \ - --crop-aspect 9:16 \ - --overlay-text "{caption}" \ - --mux-audio {audioUrl} --json ``` ### Quick (no creative decisions) @@ -71,27 +89,46 @@ recoup content edit --video {videoUrl} \ recoup content create --artist {artistId} --template artist-caption-bedroom --json ``` -### Lipsync (mouth moves to audio) +### Lipsync ```bash -# Generate video with audio-driven animation -recoup content generate-video --image {imageUrl} --lipsync \ - --audio {audioUrl} --json +recoup content generate-video --mode lipsync \ + --image {faceUrl} --audio {audioUrl} --json -# Edit — no need to mux audio separately (already baked in) +# Edit — no need to mux audio (already baked into the video) recoup content edit --video {videoUrl} \ --crop-aspect 9:16 \ --overlay-text "{caption}" --json ``` +### Extend a clip + +```bash +recoup content generate-video --mode extend \ + --video {shortClipUrl} --prompt "continue the scene naturally" --json +``` + +### Transition between two shots + +```bash +recoup content generate-video --mode first-last \ + --image {startFrameUrl} --end-image {endFrameUrl} \ + --prompt "smooth cinematic transition" --json +``` + ## Model Selection Each generative primitive accepts an optional `--model` flag to override the default: | Primitive | Default Model | Flag | 
|-----------|---------------|------| -| generate-image | `fal-ai/nano-banana-pro/edit` | `--model {modelId}` | -| generate-video | `fal-ai/veo3.1/fast/image-to-video` | `--model {modelId}` | +| generate-image (prompt only) | `fal-ai/nano-banana-2` | `--model {modelId}` | +| generate-image (with reference) | `fal-ai/nano-banana-2/edit` | `--model {modelId}` | +| generate-video (prompt) | `fal-ai/veo3.1/text-to-video` | `--model {modelId}` | +| generate-video (animate) | `fal-ai/veo3.1/image-to-video` | `--model {modelId}` | +| generate-video (reference) | `fal-ai/veo3.1/reference-to-video` | `--model {modelId}` | +| generate-video (extend) | `fal-ai/veo3.1/extend-video` | `--model {modelId}` | +| generate-video (first-last) | `fal-ai/veo3.1/first-last-frame-to-video` | `--model {modelId}` | | generate-video (lipsync) | `fal-ai/ltx-2-19b/audio-to-video` | `--model {modelId}` | | transcribe-audio | `fal-ai/whisper` | `--model {modelId}` | @@ -148,11 +185,12 @@ For longer edits with multiple cuts, analyze after each major assembly — not j ### Acting on feedback -- Artifacts or glitches — regenerate with a different `--model` or `--motion` prompt +- Artifacts or glitches — regenerate with a different `--model` or prompt - Subject not recognizable — regenerate image with a better `--reference-image` or prompt -- Robotic motion — try a more natural motion prompt or switch to lipsync -- Boring opening — more dramatic image or motion +- Robotic motion — try a more natural prompt, switch mode, or use lipsync +- Boring opening — more dramatic image or use `first-last` for a striking transition - Looks like AI slop — different model, better reference image, or more specific prompt +- Clip too short — use `extend` mode to continue it - Bad text placement — adjust position or shorten caption in `edit` Generate, watch, fix, watch again. Two rounds usually gets you from mediocre to good. @@ -166,6 +204,7 @@ Redo any step without rerunning everything: - Caption too long? 
Run `generate-caption` with `--length short`
- Low quality? Run `upscale` on the image or video
- Video glitchy? Check `analyze-video` feedback, then regenerate with adjusted params
+- Clip too short? Run `generate-video --mode extend` to continue it
- Everything good but caption is wrong? Just rerun `edit` with new `--overlay-text`

Each retry costs only that step's time, not the full pipeline.

From 24f2bea5e16623762d435ff0306f3f1e1ebe4b6d Mon Sep 17 00:00:00 2001
From: Sidney Swift <158200036+sidneyswift@users.noreply.github.com>
Date: Thu, 2 Apr 2026 18:44:32 -0400
Subject: [PATCH 12/13] refactor: update skill command names to match simplified endpoints

All commands now use short nouns: image, video, caption, transcribe, analyze, edit, upscale.

Made-with: Cursor
---
 content-creation/SKILL.md | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/content-creation/SKILL.md b/content-creation/SKILL.md
index 70aae5c..11b43e0 100644
--- a/content-creation/SKILL.md
+++ b/content-creation/SKILL.md
@@ -13,13 +13,13 @@ Seven primitives, each callable independently:
 
 | Primitive | Command | Async? |
 |-----------|---------|--------|
-| Transcribe audio | `recoup content transcribe-audio` | Yes (30-60s) |
-| Generate image | `recoup content generate-image` | Yes (10-30s) |
-| Generate video | `recoup content generate-video` | Yes (60-180s) |
-| Generate caption | `recoup content generate-caption` | No (2-5s) |
+| Transcribe audio | `recoup content transcribe` | Yes (30-60s) |
+| Generate image | `recoup content image` | Yes (10-30s) |
+| Generate video | `recoup content video` | Yes (60-180s) |
+| Generate caption | `recoup content caption` | No (2-5s) |
 | Edit | `recoup content edit` | Yes (10-30s) |
 | Upscale | `recoup content upscale` | Yes (30-60s) |
-| Analyze video | `recoup content analyze-video` | No (10-30s) |
+| Analyze video | `recoup content analyze` | No (10-30s) |
 
 Async primitives return a `runId`. Poll with `recoup tasks status --run {runId} --json` until `status` is `COMPLETED`, then read `output`.
@@ -55,11 +55,11 @@ All modes support `--aspect-ratio` (auto, 16:9, 9:16), `--duration` (4s-8s), and
 
 ```bash
 # 1. Transcribe audio
-recoup content transcribe-audio --url {audioUrl} --json
+recoup content transcribe --url {audioUrl} --json
 # Returns: audioUrl, fullLyrics, segments (timestamped words)
 
 # 2. Generate image
-recoup content generate-image --prompt "{scene description}" \
+recoup content image --prompt "{scene description}" \
   --reference-image {referenceImageUrl} --json
 # Returns: imageUrl, images (array if num_images > 1)
@@ -67,14 +67,14 @@ recoup content generate-image --prompt "{scene description}" \
 recoup content upscale --url {imageUrl} --type image --json
 
 # 4. Generate video
-recoup content generate-video --mode animate --image {imageUrl} \
+recoup content video --mode animate --image {imageUrl} \
   --prompt "{how to animate}" --json
 # Or from scratch:
-recoup content generate-video --mode prompt --prompt "{scene}" --json
+recoup content video --mode prompt --prompt "{scene}" --json
 # Returns: videoUrl, mode
 
 # 5. Generate caption
-recoup content generate-caption --topic "{topic}" --length short --json
+recoup content caption --topic "{topic}" --length short --json
 # Synchronous — returns { content } immediately
 
 # 6. Edit final video
@@ -92,7 +92,7 @@ recoup content create --artist {artistId} --template artist-caption-bedroom --js
 ### Lipsync
 
 ```bash
-recoup content generate-video --mode lipsync \
+recoup content video --mode lipsync \
   --image {faceUrl} --audio {audioUrl} --json
 
 # Edit — no need to mux audio (already baked into the video)
@@ -104,14 +104,14 @@ recoup content edit --video {videoUrl} \
 
 ### Extend a clip
 
 ```bash
-recoup content generate-video --mode extend \
+recoup content video --mode extend \
   --video {shortClipUrl} --prompt "continue the scene naturally" --json
 ```
 
 ### Transition between two shots
 
 ```bash
-recoup content generate-video --mode first-last \
+recoup content video --mode first-last \
   --image {startFrameUrl} --end-image {endFrameUrl} \
   --prompt "smooth cinematic transition" --json
 ```

From b5b3b70427ce03c5f44ff11def9481500f5bfabb Mon Sep 17 00:00:00 2001
From: Sidney Swift <158200036+sidneyswift@users.noreply.github.com>
Date: Fri, 3 Apr 2026 11:45:55 -0400
Subject: =?UTF-8?q?feat:=20content-creation=20skill=20V2=20?=
 =?UTF-8?q?=E2=80=94=20malleable-first,=20analyze,=20override=20priority?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Lead with malleable-first philosophy (templates optional)
- Add analyze primitive to table and workflow
- Rename workflows: "Without a template" / "With a template"
- Update edit command to use --url (supports images too)
- Add template override priority section
- Update frontmatter description

Made-with: Cursor
---
 content-creation/SKILL.md | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/content-creation/SKILL.md b/content-creation/SKILL.md
index 11b43e0..cc9c779 100644
--- a/content-creation/SKILL.md
+++ b/content-creation/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: content-creation
-description: Create social videos, TikToks, Reels, and visual content using AI-generated images, video, captions, and audio. Use this skill whenever the user asks to create content, make a video, generate an image, produce a TikTok or Reel, create a promotional clip, add captions to a video, make visual content, or create short-form video. Also use when the user wants to iterate on content quality — regenerate images, try different audio, adjust text, or upscale for higher quality.
+description: Create social videos, TikToks, Reels, and visual content using AI-generated images, video, captions, and audio. Use this skill whenever the user asks to create content, make a video, generate an image, produce a TikTok or Reel, create a promotional clip, add captions to a video, make visual content, or create short-form video. Also use when the user wants to iterate on content quality — regenerate images, try different audio, adjust text, or upscale for higher quality. Templates are optional — pass your own prompts for full creative control, or use templates as curated shortcuts.
 ---
 
 # Content Creation
@@ -9,6 +9,8 @@ Create social-ready short-form videos by composing independent primitives. Each
 
 ## Primitives
 
+Every primitive works without a template. Pass your own prompt, reference images, and style rules directly. Templates are optional shortcuts — curated creative recipes that pre-fill parameters for a specific look.
+
 Seven primitives, each callable independently:
 
 | Primitive | Command | Async? |
@@ -51,7 +53,7 @@ All modes support `--aspect-ratio` (auto, 16:9, 9:16), `--duration` (4s-8s), and
 
 ## Workflow
 
-### Step-by-step (creative control)
+### Without a template (full control)
 
 ```bash
 # 1. Transcribe audio
@@ -78,12 +80,17 @@ recoup content caption --topic "{topic}" --length short --json
 # Synchronous — returns { content } immediately
 
 # 6. Edit final video
-recoup content edit --video {videoUrl} --audio {audioUrl} \
+recoup content edit --url {videoUrl} --audio {audioUrl} \
   --template artist-caption-bedroom \
   --overlay-text "{caption}" --json
+
+# 7. (Optional) Analyze the result
+recoup content analyze --video {videoUrl} --prompt "Rate this content 1-10 for social media engagement. Note any quality issues." --json
 ```
 
-### Quick (no creative decisions)
+### With a template (shortcuts)
+
+Templates pre-fill prompts, reference images, and style rules. You can override any parameter.
 
 ```bash
 recoup content create --artist {artistId} --template artist-caption-bedroom --json
@@ -96,7 +103,7 @@ recoup content video --mode lipsync \
   --image {faceUrl} --audio {audioUrl} --json
 
 # Edit — no need to mux audio (already baked into the video)
-recoup content edit --video {videoUrl} \
+recoup content edit --url {videoUrl} \
   --crop-aspect 9:16 \
   --overlay-text "{caption}" --json
 ```
@@ -134,7 +141,7 @@ Each generative primitive accepts an optional `--model` flag to override the def
 
 ## Edit Operations
 
-The `edit` command accepts either a template (deterministic config) or manual flags:
+The `edit` command accepts either a template (deterministic config) or manual flags. Point `--url` at image or video — both are supported.
 
 | Operation | Flags | What it does |
 |-----------|-------|-------------|
@@ -146,6 +153,23 @@ The `edit` command accepts either a template (deterministic config) or manual fl
 
 All operations run in a single processing pass.
 
+## Template Override Priority
+
+When using a template, your explicit params always win:
+
+1. **Your flags** — highest priority. `--prompt`, `--reference-image`, etc. override everything.
+2. **Artist context** — if the artist has a style guide, it personalizes the template.
+3. **Template defaults** — lowest priority. The recipe's built-in values.
+
+Example: Use the bedroom template but override the prompt:
+
+```bash
+recoup content image --template artist-caption-bedroom \
+  --prompt "A candid selfie in a sunlit kitchen, warm tones" --json
+```
+
+The template provides reference images and style rules, but your prompt replaces the template's default prompt.
+
 ## Watching Your Work
 
 You can't see pixels — but `analyze` can. Use it freely throughout the workflow to evaluate your output. Write your own prompts based on what you need to know.
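
The async polling loop that the skill keeps describing ("Poll with `recoup tasks status --run {runId} --json` until `status` is `COMPLETED`, then read `output`") can be sketched as a small shell helper. This is a sketch, not part of the patch series: `poll_json_status` is a hypothetical name, and it assumes only that the wrapped command prints JSON with a top-level `"status"` field, as the docs above say `recoup tasks status` does.

```bash
# Hypothetical helper: run a status command repeatedly until its JSON
# output reports COMPLETED (prints the payload) or FAILED (non-zero exit).
# "$@" is the full status command, e.g.:
#   poll_json_status recoup tasks status --run "$RUN_ID" --json
poll_json_status() {
  while true; do
    result="$("$@")" || return 1
    case "$result" in
      *'"status": "COMPLETED"'* | *'"status":"COMPLETED"'*)
        # Done: print the final JSON so callers can read `output` from it.
        printf '%s\n' "$result"
        return 0
        ;;
      *'"status": "FAILED"'* | *'"status":"FAILED"'*)
        printf 'run failed\n' >&2
        return 1
        ;;
    esac
    sleep 5  # generation runs take 10s-180s per the table above; poll gently
  done
}
```

After kicking off any async primitive, a caller would then do `poll_json_status recoup tasks status --run "$runId" --json` and parse the printed JSON for the primitive's output fields.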