
Add media file support for LLM prompter#580

Merged
sal-uva merged 13 commits into master from copilot/add-media-support-to-llm-prompter
Mar 12, 2026

Conversation

Contributor

Copilot AI commented Mar 9, 2026

Extends llm-prompter to work with parent datasets that are media archives (zip files from image downloaders or media imports), not just text-based CSV/NDJSON datasets.

common/lib/llm.py

  • create_multimodal_content() now accepts media_files (local paths, base64-encoded) alongside existing media_urls
  • _format_media_block() — new helper for provider-specific content blocks:
    • Anthropic: image blocks for images, document blocks for video/audio
    • OpenAI: data URIs for images/video, input_audio format for audio
    • Google/others: data URI with image_url wrapper
  • generate_text() gains media_files parameter to pass local file paths
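The provider-specific content blocks above can be sketched as follows. This is a hypothetical, simplified stand-in for the PR's `_format_media_block()` helper: the function name, signature, and exact dict shapes are illustrative and assume the common base64 content-block conventions of each provider, not the merged 4CAT code.

```python
import base64
from pathlib import Path

def format_media_block(provider: str, path: str, mime_type: str) -> dict:
    """Build a provider-specific content block for one local media file.

    Illustrative sketch of the PR's _format_media_block() helper;
    block shapes follow common provider conventions, not 4CAT's code.
    """
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    kind = mime_type.split("/", 1)[0]  # "image", "video", or "audio"

    if provider == "anthropic":
        source = {"type": "base64", "media_type": mime_type, "data": encoded}
        if kind == "image":
            return {"type": "image", "source": source}
        # video/audio are passed as document blocks
        return {"type": "document", "source": source}

    if provider == "openai":
        if kind == "audio":
            return {
                "type": "input_audio",
                "input_audio": {
                    "data": encoded,
                    "format": mime_type.split("/", 1)[1],
                },
            }
        # images and video travel as data URIs
        return {
            "type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
        }

    # Google and other providers: data URI inside an image_url wrapper
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }
```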

processors/machine_learning/llm_prompter.py

  • is_compatible_with() — accepts zip datasets with media_type in (image, video, audio)
  • get_options() — when parent is a media archive:
    • Shows media info panel instead of column bracket instructions
    • Hides text-only options (column selection, batching, truncation, media URL toggle)
  • process() — new media archive code path: iterates zip contents, skips metadata files, base64-encodes each media file, sends to LLM via media_files param. Catches model incompatibility errors (e.g. non-vision model receiving images) with clear user-facing messages.
  • validate_query() — relaxes column bracket requirement for media archives; allows empty user prompt when system prompt is provided
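The media-archive code path in `process()` can be sketched roughly as below. This is a minimal, hypothetical illustration of the iterate-skip-encode loop described above; the function name, the metadata-file skip-list, and the lack of 4CAT's actual dataset iteration helpers are all assumptions made for the example.

```python
import base64
import zipfile
from pathlib import Path

# Assumed skip-list; 4CAT's real metadata naming may differ
METADATA_FILES = {".metadata.json", "metadata.json"}

def iter_media_payloads(archive_path: str):
    """Yield (filename, base64 payload) for each media file in a ZIP
    archive, skipping metadata entries and directories. Sketch of the
    media-archive loop in process(); not the merged implementation."""
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if name.endswith("/") or Path(name).name in METADATA_FILES:
                continue
            with archive.open(name) as media_file:
                yield name, base64.b64encode(media_file.read()).decode("ascii")
```

Each yielded payload would then be handed to the LLM via the new `media_files` parameter, with model-incompatibility errors (e.g. a non-vision model receiving images) caught around the generation call rather than checked upfront.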

All existing text-based processing behavior is preserved in the else branch. All models and custom model IDs remain available — incompatibility is caught at generation time rather than upfront.

Original prompt

This section details the original issue to resolve.

<issue_title>Add media support for LLM prompter</issue_title>
<issue_description>Update llm-prompter.py so that it can work with parent datasets that are media files, e.g. as part of zip datasets generated through the download-images.py processor or as a media import.

  • This mostly works by checking the type of the parent dataset and changing the LLM prompter options and functionality accordingly.
  • If the parent dataset is a media file (e.g. png) or a zip with media items, llm-prompter.py is still available but option fields for text-based datasets (e.g. column selection) should not be shown. For instance, information on the column brackets is redundant here, since no 'column' data is inserted. Instead, an info UserInput field clearly indicates that media files are attached to the prompt. Also show other crucial information if needed.
  • llm.py should be updated accordingly to support video, image, and audio prompts.
  • Clear evaluation and alerts, both in the back-end and front-end (validate_query), should tell the user if there's any issues with the dataset media files.
  • In principle, the flexible setup of llm-prompter.py should be retained. All current models as well as custom model IDs should be allowed, and it should be caught whether the model can't process this data type on generating results after the fact (e.g. some don't support vision/audio) (unless LangChain has an easy way of checking this, then it can be called in validate_query).</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits March 9, 2026 17:23
…ideo/audio

Co-authored-by: sal-uva <10960315+sal-uva@users.noreply.github.com>
…defaults, and error detection comments

Co-authored-by: sal-uva <10960315+sal-uva@users.noreply.github.com>
Copilot AI changed the title [WIP] Add media support for LLM prompter Add media file support for LLM prompter Mar 9, 2026
@sal-uva
Member

sal-uva commented Mar 9, 2026

This should be mergeable. Not all vendors support video and audio, but that's acceptable in my opinion.

LLM prompter is now quite bulky and should be refactored, but maybe that's for another day.

@sal-uva sal-uva marked this pull request as ready for review March 11, 2026 10:29
Copilot AI review requested due to automatic review settings March 11, 2026 10:29

Copilot AI left a comment


Pull request overview

This PR extends the llm-prompter processor to support media-archive parent datasets (ZIPs containing image/video/audio), enabling multimodal prompting using locally extracted media files in addition to existing URL-based media inputs.

Changes:

  • Add ZIP media-archive compatibility and a dedicated processing path in LLMPrompter (options/UI + iteration + annotation mapping).
  • Extend LLMAdapter multimodal support to accept local media file paths (base64-encoded) alongside media URLs, with provider-specific formatting.
  • Minor UI/UX tweaks for annotation rendering and link wrapping, plus marking AudioExtractor ZIP outputs as media_type="audio".

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
common/lib/llm.py Adds local media_files support for multimodal prompts and provider-specific content block formatting.
processors/machine_learning/llm_prompter.py Adds media-archive dataset compatibility, media-specific options, and a ZIP iteration + LLM prompting path.
processors/audio/audio_extractor.py Marks resulting ZIP datasets as audio media type.
common/assets/llms.json Updates/renames several predefined model IDs and model card links.
webtool/templates/explorer/item-annotations.html Uses item_id consistently in DOM ids/classes and avoids variable shadowing.
webtool/static/css/explorer-annotation-generic.css Improves wrapping behavior for long annotation label/link text.


@sal-uva sal-uva merged commit 388687b into master Mar 12, 2026
2 checks passed

Successfully merging this pull request may close these issues.

Add media support for LLM prompter