Skip to content

alpha-hack-program/llama-stack-demo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LLAMA STACK DEMO

Welcome to this Llama Stack Demo you can run easily on Red Hat OpenShift AI (RHOAI). It's a comprehensive demo that deploys everything you need to test agents with MCP tools on RHOAI using Llama Stack:

  • Models: IBM Granite 3.3 and Llama 3.1 deployed using vLLM Serving Runtime and automatically added to Llama Stack
  • MCP Servers: deployed as normal deployments and automatically added to Llama Stack
  • Llama Stack Server: Llama Stack Server is deployed using the Llama Stack Operator included in Red Hat OpenShift AI

Trip Report

I started with an MCP Server using TypeScript generated using Cursor.ai straight from legal documents. It tried to cover all the legal elements, it was complex and the logic tricky, several iterations to fine tune the descriptions to make it kind of work in a small model (SLM) like IBM Granite and Llama 3.2 (~7B both of them). I changed my strategy and also the programing language, from TypeScript to Rust and from the full logic generated from the legal documents to a decision table generated with Cursor.ai but with much more guidance (and simplification) from my side. This time it worked pretty well with Claude Desktop but not quite with the SLMs, the latest iteration had to do with the names of some variables, is_single_parent_family and number_of_children_after...

The inner to outer loop:

  • decision table
  • code
  • unit testing
  • MCP Inspector
  • Claude Desktop
  • Llama Stack Local with models in RHOAI
  • Llama Stack on RHOAI

Detailed description

Eligibility Assessment System powered by Llama Stack and Model Context Protocol (MCP)! This system helps assess eligibility for Family Care Unpaid Leave Support based on the Republic of Lysmark's Act No. 2025/47-SA. To get started quickly, jump straight to installation.

The Eligibility Assessment MCP Llama Stack system is an intelligent solution for evaluating eligibility for Family Care Unpaid Leave Support based on the Republic of Lysmark's legislation (Act No. 2025/47-SA and related regulations). The system combines the power of Llama Stack with Model Context Protocol (MCP) servers and Retrieval Augmented Generation (RAG) to provide accurate, context-aware assessments.

This system is designed to help users understand their eligibility for unpaid leave assistance in various family care situations, including care for sick/injured family members, childcare for multiple children, adoption cases, and single-parent family scenarios.

This deployment includes a Helm chart for setting up:

  • An OpenShift AI Project with all necessary components
  • Llama Stack distribution with Llama 3.1 8B model for natural language processing
  • MCP server for the eligibility engine with specialized knowledge processing
  • Vector database (Milvus) with embedded legal documents for RAG capabilities
  • Document loader service to populate the knowledge base with Act No. 2025/47-SA and regulations
  • Llama Stack Playground interface for interactive eligibility consultations

Use this project to quickly deploy an intelligent eligibility assessment system that provides accurate, legally-informed guidance on Family Care Unpaid Leave Support eligibility. 🏛️

See it in action

Experience the Eligibility Assessment System through the Llama Stack Playground interface. After deployment, you can interact with the system to:

  • Ask questions about eligibility requirements for different family care scenarios
  • Get detailed assessments based on the legal framework of Act No. 2025/47-SA
  • Understand the documentation needed for applications
  • Learn about financial assistance amounts and duration limits

Architecture diagrams

architecture.png

References

Requirements

Recommended hardware requirements

  • GPU: 1x NVIDIA A10G or equivalent (for optimal LLM performance)
  • CPU: 8 cores
  • Memory: 24 Gi (to handle Llama Stack, MCP servers, and vector database)
  • Storage: 20Gi (for models, vector database, and document storage)

Note: The system uses quantized Llama 3.1 8B model (w4a16) for efficient GPU utilization while maintaining good performance.

Minimum hardware requirements

  • GPU: 1x NVIDIA A10G-SHARED (shared GPU allocation)
  • CPU: 6 cores
  • Memory: 16 Gi
  • Storage: 10Gi

Required software

  • Red Hat OpenShift 4.16+
  • Red Hat OpenShift AI 2.16+
  • Dependencies for Single-model server:
    • Red Hat OpenShift Service Mesh
    • Red Hat OpenShift Serverless
  • Llama Stack Components:
    • Llama Stack Distribution with vLLM runtime
    • Model Context Protocol (MCP) server support
    • Vector database capabilities (Milvus)
  • Container Images:
    • Llama Stack: quay.io/opendatahub/llama-stack:odh
    • vLLM Runtime: quay.io/modh/vllm:rhoai-2.23-cuda
    • Eligibility Engine MCP: quay.io/atarazana/eligibility-engine-mcp-rs:latest

Required permissions

  • Standard user. No elevated cluster permissions required

Install

Please note before you start

This system was tested on Red Hat OpenShift 4.16.24 & Red Hat OpenShift AI v2.16.2.
Ensure you have access to GPU resources and the required container registries.

Clone

git clone https://github.com/alpha-hack-program/eligibility-mcp-llamastack.git && \
    cd eligibility-mcp-llamastack/

Log in OpenShift

oc login ...

Create the project

PROJECT="llama-stack-demo"

oc new-project ${PROJECT}

Label project with

  • modelmesh-enabled: 'false'
  • opendatahub.io/dashboard: 'true'
oc label namespace ${PROJECT} modelmesh-enabled=false opendatahub.io/dashboard=true

Install with Helm

This default delployment deploys one model... TODO.

helm install llama-stack-demo helm/ --namespace ${PROJECT} \
  --set namespace=${PROJECT} --timeout 10m

If you have access to Intel Gaudi accelerators you could use this command which uses helm/intel.values instead:

helm install llama-stack-demo helm/ --namespace ${PROJECT} \
  --values helm/intel.yaml --set namespace=${PROJECT} --timeout 10m

If you want an NVIDIA deployment with two models run this. TODO explain which models... bla.

helm install llama-stack-demo helm/ --namespace ${PROJECT} \
  --values helm/nvidia.yaml --set namespace=${PROJECT} --timeout 10m

Wait for pods

oc -n ${PROJECT} get pods -w

Expected pods (may take 5-10 minutes to start):

(Output)
NAME                                              READY   STATUS    RESTARTS   AGE
eligibility-lsd-0                                1/1     Running   0          8m
eligibility-lsd-playground-0                     1/1     Running   0          8m
eligibility-engine-0                             1/1     Running   0          7m
loader-0                                         0/1     Completed 0          6m
llama-3-1-8b-w4a16-predictor-df76b56d6-fw8fp    2/2     Running   0          10m

Test

You can access the system in multiple ways:

Option 1: OpenShift AI Dashboard

Get the OpenShift AI Dashboard URL:

oc get routes rhods-dashboard -n redhat-ods-applications

Navigate to Data Science Projects -> llama-stack-demo. You'll see the deployed models and workbenches.

Option 2: Direct Access to Llama Stack Playground

Get the Llama Stack Playground URL:

oc get routes eligibility-lsd-playground -n ${PROJECT}

Access the playground directly to interact with the eligibility assessment system.

Option 3: API Access

For programmatic access, get the Llama Stack API endpoint:

oc get routes eligibility-lsd -n ${PROJECT}

Use this endpoint to integrate the eligibility assessment capabilities into your applications.

Business Rules

Unpaid Leave Evaluation Data

Family relationship Situation Single-parent family Number of children Potentially eligible Monthly benefit Case Description Output DESCRIPTION Rule ID
true delivery, birth true true 500 E Single-parent family with newborn The single-parent status must be documented Case E: Single-parent family with any child regla-005
true delivery, birth >=3 true 500 B Third child or more with newborn The number of children must be 3 or more, the ages of at least 2 of the minors must be less than 6, if there is disability greater than 33% then the limit is 9 years Case B: Third child or more with newborn regla-002
true delivery, birth false 0 B The number of children must be 3 or more, must consult with administration The number of children must be 3 or more, must consult with administration 9ec43eb2-484f-4fcf-9dd7-6510da30850c
true illness, accident true 725 A First-degree family care sick or accident victim The person must have been hospitalized and the care of the person must be continued Case A: First-degree family care sick/injured regla-001
true adoption, foster_care true 500 C Adoption or foster care In the foster care case the duration must be longer than one year Case C: Adoption or foster care regla-003
true multiple_birth, multiple_delivery, multiple_adoption, multiple_foster_care true 500 D Delivery, adoption or foster care multiple Case D: Delivery, adoption or foster care multiple regla-004
true false 0 NONE No case applies No case applies 515afd1f-43cc-44ed-971c-fefb273840b2
false false 0 NONE Not applicable by relationship (first degree) Only father, mother, son, daughter, spouse or partner are accepted 058dd988-90dd-46da-8478-ee458aacde6f
false 0 NONE UNKNOWN_ERROR f32bfb0f-801d-4d6c-b5bd-13a1edd0eaca

Summary

This table contains the evaluation criteria and outcomes for unpaid leave assistance eligibility. The data shows different cases (A through E) with varying monthly benefits:

  • Case A: First-degree family care sick/injured - 725€
  • Case B: Third child or more with newborn - 500€
  • Case C: Adoption or foster care - 500€
  • Case D: Multiple delivery/adoption/foster care - 500€
  • Case E: Single-parent family with any child - 500€
  • NONE: Cases where no assistance applies - 0€

The table includes input parameters (family relationship, situation, single-parent status, number of children) and corresponding outputs (eligibility, benefit amount, case classification, descriptions, and rule IDs).

Example queries

  • My mother had an accident and she's at the hospital. I have to take care of her, can I get access to the unpaid leave aid?
  • My mother had an accident and she's at the hospital. I have to take care of her, tell me if I can get access to the unpaid leave aid and the requirements I have to meet.
  • I have just adopted two children, at the same time, aged 3 and 5, am I elegible for the unpaid leave aid? How much?
  • I have just adopted two children, at the same time, aged 3 and 5, tell me if I'm elegible for the unpaid leave aid and which requirements I should meet.
  • I'm a single mom and I just had a baby, may I get access to the unpaid leave aid?
  • Enumerate the legal requirements to get the aid for unpaid leave.

Example System Prompt

You are a helpful AI assistant that uses tools to help citizens of the Republic of Lysmark. Answers should be concise and human readable. AVOID references to tools or function calling nor show any JSON. Infer parameters for function calls or instead use default values or request the needed information from the user. Call the RAG tool first if unsure. Parameter single_parent_family only is necessary if birth/adoption/foster_care otherwise use false.

Uninstall

Unistall the helm chart.

helm uninstall llama-stack-demo --namespace ${PROJECT}

Delete all remaining objects like jobs created in hooks.

oc delete jobs -l "app.kubernetes.io/part-of=eligibility-mcp-llamastack"

Finally remove the project:

oc delete project ${PROJECT}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published