
llama-stack-0.2.23 and adjust to new APIs #148

Closed
mkristian wants to merge 2 commits into containers:main from mkristian:upgrade-ramalama-and-llamastack

Conversation


@mkristian mkristian commented Feb 15, 2026

update to

  • llama-stack==0.2.23
  • ramalama==0.17.1

Along the way, I adjusted the code to use the new APIs from llama-stack, and fixed ramalama-run.yaml so it starts up the container that is built here: https://github.com/containers/ramalama/blob/main/container-images/llama-stack/Containerfile

It turns out these 'downloads' hacks are no longer needed once the CMD is adjusted.

Summary by Sourcery

Update ramalama-stack to use newer llama-stack and ramalama releases and align configuration and provider wiring with the updated APIs.

New Features:

  • Add a files API provider backed by local filesystem storage and sqlite metadata in the ramalama stack distribution.

Bug Fixes:

  • Fix the ramalama-run configuration so the remote ramalama inference provider correctly references its Python module and uses proper storage paths for Milvus and registry databases.

Enhancements:

  • Simplify the exposed model registry to only include explicit fp16/instruct variants and the new llama3.3:70b model entry.
  • Refactor the ramalama provider specification to use the new RemoteProviderSpec schema and provider_type field from llama-stack.
  • Remove legacy external provider directory handling now that provider discovery is managed by the new llama-stack APIs.
  • Update dependency versions and add required packages (including llama-stack-api, newer FastAPI/Starlette/OpenAI/mcp, and database/OCI-related libraries) to match the new llama-stack and ramalama versions.
  • Constrain setuptools in pyproject.toml to maintain compatibility with milvus-lite and declare milvus-lite as an explicit dependency.

Signed-off-by: Christian Meier <m.kristian@web.de>

sourcery-ai bot commented Feb 15, 2026

Reviewer's Guide

Updates this distribution to llama-stack 0.2.23 / ramalama 0.17.1 and aligns configs and provider wiring with the new llama-stack APIs, including refreshed dependencies, updated provider spec types, reworked ramalama-run.yaml, and removal of legacy provider discovery and model aliases.

File-Level Changes

Align dependencies with newer llama-stack and ramalama versions and their transitive requirements.
  • Bump llama-stack from 0.2.14 to 0.5.0 and replace llama-stack-client with llama-stack-api 0.5.0.
  • Upgrade ramalama from 0.10.1 to 0.17.1 and update related packages such as fastapi, starlette, openai, numpy, mcp, typing-extensions, and urllib3.
  • Add new runtime dependencies required by updated llama-stack (e.g., oci, oracledb, pyjwt, cryptography stack, tornado, psycopg2-binary) and adjust markers/usages in comments.
  • Remove obsolete auth-related deps (python-jose, rsa, ecdsa, pyasn1) and llama-stack-client-specific references.
  • Sync pyproject.toml dependency pins with the new versions and add milvus-lite plus a setuptools upper bound for compatibility.
requirements.txt
pyproject.toml
uv.lock
Adapt the provider implementation to the new llama-stack provider API.
  • Replace use of remote_provider_spec helper with direct construction of RemoteProviderSpec.
  • Switch to the new RemoteProviderSpec signature using provider_type and adapter_type fields while keeping the existing config_class and module wiring.
  • Drop AdapterSpec import/usage in favor of the simplified provider config model.
src/ramalama_stack/provider.py
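
For orientation, a minimal sketch of what the new-style spec construction could look like. This is an illustration under assumptions: the import path and the config_class value are guesses, while the field names (provider_type, adapter_type, module, config_class) come from the change description above.

# hypothetical sketch; import path and config_class value are assumptions
from llama_stack.providers.datatypes import Api, RemoteProviderSpec

def get_provider_spec() -> RemoteProviderSpec:
    return RemoteProviderSpec(
        api=Api.inference,
        provider_type="remote::ramalama",  # explicit field, replaces the remote_provider_spec helper
        adapter_type="ramalama",           # AdapterSpec fields now sit directly on the spec
        module="ramalama_stack",
        config_class="ramalama_stack.config.RamalamaImplConfig",  # illustrative path
    )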
Update ramalama-run.yaml to match new llama-stack distribution configuration expectations.
  • Add the files API and configure a localfs-based files provider backed by sqlite metadata storage.
  • Adjust the remote inference provider to include the module field and explicitly use provider_type remote::ramalama.
  • Switch post_training provider to inline::huggingface-gpu, add DPO output dir env override, and tweak db_path env interpolations for milvus paths.
  • Remove external_providers_dir from the server config now that remote providers.d registration is no longer used.
src/ramalama_stack/ramalama-run.yaml
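
To make these YAML changes concrete, here is a hypothetical excerpt of the reworked run config; env-var names and default paths are illustrative rather than copied from the diff.

# hypothetical excerpt of ramalama-run.yaml; names and paths are illustrative
apis:
- files
- inference
providers:
  files:
  - provider_id: meta-reference-files
    provider_type: inline::localfs
    config:
      storage_dir: ${env.FILES_STORAGE_DIR:=~/.llama/distributions/ramalama/files}
      metadata_store:
        type: sqlite
        db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/ramalama}/files_metadata.db
  inference:
  - provider_id: ramalama
    provider_type: remote::ramalama
    module: ramalama_stack
    config: {}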
Simplify model registry entries and remove non-instruct/alias variants that are no longer needed.
  • Remove build_model_entry-based aliases for non-fp16 or generic llama3.x model ids, keeping only the build_hf_repo_model_entry instruct variants.
  • Drop the now-unused build_model_entry import from the model registry module.
src/ramalama_stack/models.py
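
For reference, a minimal sketch of the simplified registry, assuming llama-stack's build_hf_repo_model_entry helper; the import paths and exact model ids below are illustrative.

# minimal sketch; import paths and model ids are illustrative
from llama_stack.models.llama.datatypes import CoreModelId
from llama_stack.providers.utils.inference.model_registry import (
    build_hf_repo_model_entry,
)

MODEL_ENTRIES = [
    # only the explicit fp16/instruct variants remain; the generic
    # build_model_entry aliases were dropped
    build_hf_repo_model_entry(
        "llama3.2:3b-instruct-fp16",
        CoreModelId.llama3_2_3b_instruct.value,
    ),
    build_hf_repo_model_entry(
        "llama3.3:70b",
        CoreModelId.llama3_3_70b_instruct.value,
    ),
]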
Remove legacy providers.d installation flow in setup and inline ramalama remote provider config file.
  • Delete copying of providers.d into ~/.llama/providers.d from the custom install command in setup.py, relying instead on the distribution YAML wiring.
  • Remove the dedicated remote inference provider YAML file under providers.d, as configuration is now centralized in ramalama-run.yaml.
setup.py
src/ramalama_stack/providers.d/remote/inference/ramalama.yaml
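
For context, a reconstructed sketch of the legacy copy step that was deleted; the variable names follow the review comment further down, but since the original code is not shown on this page, treat this as an approximation.

# reconstructed sketch of the removed providers.d install hack (approximate)
import shutil
from pathlib import Path

providers_dir = Path(__file__).parent / "src" / "ramalama_stack" / "providers.d"
target_dir_1 = Path.home() / ".llama" / "providers.d"

try:
    shutil.copytree(providers_dir, target_dir_1, dirs_exist_ok=True)
except Exception as error:
    print(f"Failed to copy {providers_dir} to {target_dir_1}. Error: {error}")
    raise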

Possibly linked issues

  • #: PR updates ramalama-run.yaml’s env-var syntax and config to match newer llama-stack behavior, satisfying the issue.


@gemini-code-assist

Summary of Changes

Hello @mkristian, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on upgrading core dependencies, llama-stack and ramalama, to their latest versions. This upgrade necessitated significant adjustments to the codebase to accommodate new API structures and remove outdated installation practices. The changes ensure compatibility with the updated libraries and streamline the project's configuration and deployment processes.

Highlights

  • Dependency Updates: Updated llama-stack to version 0.2.23 and ramalama to 0.17.1. Several other dependencies in requirements.txt were also updated to newer versions, and milvus-lite was added as a new dependency.
  • API Adjustments: Adapted the code to align with new APIs introduced in llama-stack, specifically in how provider specifications and model entries are defined.
  • Simplified Installation: Removed the 'downloads hacks' from setup.py, which previously copied provider definitions and run configurations, as these are no longer needed with the updated llama-stack and ramalama versions.
  • Ramalama Configuration Updates: Modified ramalama-run.yaml to include the files API, update the huggingface provider to huggingface-gpu, and remove the external_providers_dir configuration.


Changelog
  • pyproject.toml
    • Updated llama-stack from 0.2.14 to 0.2.23.
    • Updated ramalama from 0.10.1 to 0.17.1.
    • Added milvus-lite>=2.5.1 as a new dependency.
    • Added setuptools<70 to address milvus-lite compatibility.
  • requirements.txt
    • Updated aiohttp from 3.12.7 to 3.13.3.
    • Updated aiosignal from 1.3.2 to 1.4.0.
    • Added annotated-doc==0.0.4.
    • Removed llama-stack-client references from various dependency explanations.
    • Added oci==2.167.1, cffi==2.0.0, circuitbreaker==2.1.3, cryptography==45.0.7, oracledb==3.4.2, psycopg2-binary==2.9.11, pyjwt==2.11.0, pyopenssl==25.1.0, pywin32==311, tornado==6.5.4.
    • Updated fastapi from 0.115.12 to 0.129.0.
    • Updated mcp from 1.9.2 to 1.26.0.
    • Updated numpy from 2.2.6 to 2.4.2.
    • Updated openai from 1.84.0 to 2.21.0.
    • Updated pydantic from 2.11.5 to 2.12.5.
    • Updated pydantic-core from 2.33.2 to 2.41.5.
    • Removed pyaml==25.5.0, pyasn1==0.6.1, python-jose==3.5.0, rsa==4.9.1.
    • Updated starlette from 0.46.2 to 0.52.1.
    • Updated typing-extensions from 4.14.0 to 4.15.0.
    • Updated typing-inspection from 0.4.1 to 0.4.2.
    • Updated urllib3 from 2.4.0 to 2.6.3.
  • setup.py
    • Removed the logic for copying providers.d and ramalama-run.yaml during installation.
  • src/ramalama_stack/models.py
    • Removed calls to build_model_entry, standardizing model entry creation to only use build_hf_repo_model_entry.
  • src/ramalama_stack/provider.py
    • Updated get_provider_spec to use RemoteProviderSpec instead of remote_provider_spec.
    • Added provider_type and module fields to the provider specification.
    • Removed the AdapterSpec wrapper, integrating its fields directly into RemoteProviderSpec.
  • src/ramalama_stack/providers.d/remote/inference/ramalama.yaml
    • Removed the file, as its content is now integrated into src/ramalama_stack/provider.py.
  • src/ramalama_stack/ramalama-run.yaml
    • Added the files API to the list of enabled APIs.
    • Configured a meta-reference-files provider using inline::localfs with a specified storage_dir and metadata_store.
    • Added a module: ramalama_stack entry to the ramalama inference provider configuration.
    • Changed the huggingface post-training provider type to inline::huggingface-gpu.
    • Added dpo_output_dir configuration to the huggingface-gpu provider.
    • Removed the external_providers_dir configuration from the server section.


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue and left some high-level feedback:

  • The dependency versions between requirements.txt and pyproject.toml are inconsistent for key packages (e.g., llama-stack is pinned to 0.5.0 in requirements.txt but 0.2.23 in pyproject.toml, setuptools is <70 vs 80.9.0), which can lead to hard-to-debug environment issues; consider aligning these pins or clearly separating dev/runtime constraints.
  • Since the inline remote provider definition in providers.d was removed in favor of the new RemoteProviderSpec and ramalama-run.yaml config, it would be good to double-check that no tooling still relies on ~/.llama/providers.d, or add a migration note/compat shim if needed.

Individual Comments

Comment 1: src/ramalama_stack/ramalama-run.yaml:68-74

     config: {}
   post_training:
   - provider_id: huggingface
-    provider_type: inline::huggingface
+    provider_type: inline::huggingface-gpu
     config:
       checkpoint_format: huggingface
       distributed_backend: null
       device: cpu
+      dpo_output_dir: ${env.DPO_OUTPUT_DIR:=~/.llama/distributions/ramalama/dpo_output}
   safety:

issue (bug_risk): Using inline::huggingface-gpu while explicitly setting device: cpu is confusing and may be misconfigured.

Here we're mixing a *-gpu provider with device: cpu. Please either set device to a GPU value (or rely on the provider default) if you want GPU execution, or revert to a non-GPU provider type if you intend to stay on CPU, to avoid configuration drift and debugging confusion.
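
One possible resolution, sketched here as an illustration (not taken from the PR): keep the GPU provider type but make the device an env-driven override that defaults to a GPU value.

# illustrative fix; the env-var name and default are assumptions
post_training:
- provider_id: huggingface
  provider_type: inline::huggingface-gpu
  config:
    checkpoint_format: huggingface
    device: ${env.POST_TRAINING_DEVICE:=cuda}  # or revert to a CPU-oriented provider type
    dpo_output_dir: ${env.DPO_OUTPUT_DIR:=~/.llama/distributions/ramalama/dpo_output}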



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates llama-stack and ramalama dependencies and adapts the codebase to their new APIs. The changes are generally good, including dependency updates, API adoption, and removal of legacy code. However, I've identified a few issues that need attention. There's a critical version mismatch for the llama-stack dependency between pyproject.toml and requirements.txt. Also, setup.py contains a bug in an exception handler that will cause a NameError. Finally, the ramalama-run.yaml configuration for the post_training provider seems contradictory. I've added detailed comments on these points.

I am having trouble creating individual review comments, so my feedback is included below.

pyproject.toml (24)

critical

There's a version mismatch for llama-stack. This file specifies version 0.2.23, which matches the PR title. However, the requirements.txt file lists llama-stack==0.5.0. This inconsistency can lead to unexpected dependency resolutions and should be corrected to ensure the correct version is used.

setup.py (20-22)

critical

The logic from this except block seems to have been reused for the ramalama-run.yaml copy operation, but the variables were not updated. The new except block (on line 21 of the final file) still references providers_dir and target_dir_1, which are no longer defined, causing a NameError. The error message should reference run_yaml and target_dir_2.

For example:

except Exception as error:
    print(f"Failed to copy {run_yaml} to {target_dir_2}. Error: {error}")
    raise

src/ramalama_stack/ramalama-run.yaml (69-75)

high

There's a configuration mismatch here. The provider_type is set to inline::huggingface-gpu, but the device is configured as cpu. This is contradictory. If a GPU is intended, the device should be configured accordingly (e.g., cuda). If only CPU is to be used, a non-GPU provider_type should be used.

@mkristian mkristian force-pushed the upgrade-ramalama-and-llamastack branch 2 times, most recently from 944e6b0 to b272d3f on February 15, 2026 at 15:20
Signed-off-by: Christian Meier <m.kristian@web.de>
@mkristian mkristian force-pushed the upgrade-ramalama-and-llamastack branch from b272d3f to 1222ad9 on March 3, 2026 at 17:01
@mkristian
Author

closing as superseded by #149

@mkristian mkristian closed this Mar 6, 2026
