
fix: added LoRA Fine-Tuning Script and moved from hot to cold swapping in llama.cpp. #10

Open
shknth wants to merge 1 commit into dev from dev-BRAIN-1

Conversation


@shknth shknth commented Mar 2, 2026

Pull Request Checklist

Type:

  • Bug fix
  • New Feature
  • Refactor
  • Documentation

For Backend (Dev R1, R2, M):

  • I have used the Contracts (Interfaces) in domain/, not direct imports.
  • I have run scripts/setup.py to ensure dependencies are correct.
  • I have tested the API locally at localhost:45500.
  • If I changed the DB, I updated infrastructure/persistence/db.py.

For Frontend (Dev S):

  • I have tested with the Mock API running.
  • I have checked the UI in both Light and Dark modes.

Validation:

  • My code runs locally without crashing.
  • I have added/updated tests if applicable.


Copilot AI left a comment


Pull request overview

This PR introduces an automated LoRA fine-tuning pipeline (trainer + orchestration script + registry + scheduler) and updates the llama-cpp inference engine to “cold-swap” model instances when switching between base and LoRA-adapted inference (mirror mode).

Changes:

  • Add a LoRA training implementation (LoraTrainer) plus a standalone scripts/train_lora.py orchestrator for syncing examples, training, GGUF conversion, validation, and promotion.
  • Add an adapter version registry (AdapterRegistry) backed by SQLite model_versions, and update inference to load the promoted GGUF adapter via cold swap.
  • Add a time-triggered scheduler (TrainingScheduler) and an internal endpoint to manually trigger training.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 15 comments.

Summary per file:

  • backend/src/idio/main.py: Starts/stops the training scheduler during FastAPI lifespan and replaces prints with logging.
  • backend/src/idio/infrastructure/scheduler/training_scheduler.py: New APScheduler-based subprocess spawner for periodic training runs.
  • backend/src/idio/infrastructure/config/settings.py: Adds training scheduler toggles/intervals and LoRA hyperparameters.
  • backend/src/idio/infrastructure/ai/lora_trainer.py: New LoRA fine-tuning + GGUF conversion + validation utilities.
  • backend/src/idio/infrastructure/ai/llm_engine.py: Switches mirror mode to cold-swap model instances with optional lora_path.
  • backend/src/idio/infrastructure/ai/adapter_registry.py: New SQLite-backed registry for registering/promoting active adapters.
  • backend/src/idio/api/routes/training.py: Adds /trigger-training internal endpoint to spawn training immediately.
  • backend/scripts/train_lora.py: New CLI training pipeline script that pulls data from internal endpoints and trains/promotes adapters.
  • backend/pyproject.toml: Adds a [training] extras group for training/scheduler dependencies.


Comment on lines +26 to +31
```python
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers.interval import IntervalTrigger
from loguru import logger

from idio.infrastructure.config.settings import settings
```
Copilot AI Mar 2, 2026

apscheduler is imported at module load time, but it’s only declared in the optional [training] extras. This will crash the backend on startup (and during tests that import idio.main) even when training_enabled is false. Consider moving the APScheduler imports into start() (or guarding them behind training_enabled) so inference-only installs don’t require training deps.

Suggested change

```diff
-from apscheduler.schedulers.background import BackgroundScheduler
-from apscheduler.triggers.interval import IntervalTrigger
 from loguru import logger

 from idio.infrastructure.config.settings import settings

+if settings.training_enabled:
+    from apscheduler.schedulers.background import BackgroundScheduler
+    from apscheduler.triggers.interval import IntervalTrigger
+else:
+    BackgroundScheduler = None  # type: ignore[assignment]
+    IntervalTrigger = None  # type: ignore[assignment]
```

Comment on lines +123 to +155
```python
    def _run_training_subprocess(self) -> None:
        """Spawn ``scripts/train_lora.py`` as a non-blocking subprocess.

        Stdout and stderr stream to the parent process's log. The process
        is run in the background so the scheduler job completes immediately
        and does not block the APScheduler thread pool.
        """
        if not _SCRIPT_PATH.exists():
            logger.error(
                "Training script not found at {}. Cannot run training.", _SCRIPT_PATH
            )
            return

        cmd = [
            sys.executable,
            str(_SCRIPT_PATH),
            "--api-url", self._api_url,
            "--api-key", self._api_key,
            "--min-examples", str(settings.training_min_examples),
            "--batch-size", str(settings.training_sync_batch_size),
        ]

        logger.info("Spawning training subprocess: {}.", " ".join(cmd))

        try:
            # Popen (non-blocking): scheduler job returns immediately.
            # Stdout/stderr inherit the parent process streams.
            subprocess.Popen(
                cmd,
                stdout=None,
                stderr=None,
                cwd=str(_SCRIPT_PATH.parent.parent),  # backend/
            )
```
Copilot AI Mar 2, 2026

Because the job uses subprocess.Popen(...) and returns immediately, APScheduler will consider the job complete while the training process is still running. That means subsequent intervals (or manual triggers) can start overlapping training runs, which is likely to exhaust CPU/GPU and corrupt output artifacts. Consider adding a lock/PID check around spawning (or track the child process handle) to ensure only one training subprocess runs at a time.
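One way to do this (a sketch, not the PR's code: the `TrainingSpawner` name and `spawn` signature are illustrative) is to keep the `Popen` handle on the scheduler and refuse to start a new run while `poll()` still returns `None`:

```python
import subprocess
from typing import Optional


class TrainingSpawner:
    """Sketch: track the child process handle so at most one training
    subprocess runs at a time (names here are illustrative)."""

    def __init__(self) -> None:
        self._child: Optional[subprocess.Popen] = None

    def spawn(self, cmd: list) -> bool:
        # poll() returns None while the previous child is still running.
        if self._child is not None and self._child.poll() is None:
            return False  # skip this interval; a run is already in progress
        self._child = subprocess.Popen(cmd)
        return True
```

A PID/lock file on disk is an alternative that also survives backend restarts; the in-memory handle is the simplest option while the scheduler lives in a single process.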

Comment on lines +127 to +143
```diff
     # Cold swap: destroy the existing instance to free RAM/VRAM before
     # loading the new one, avoiding a brief period of double occupancy.
     if _model_instance is not None:
         _memory_tracker.capture_baseline()
         logger.info(
             "Cold-swapping adapter: {} → {}.",
             _current_lora_path or "base",
             lora_path or "base",
         )
         del _model_instance
         _model_instance = None
         gc.collect()
         _memory_tracker.capture_after()
         _memory_tracker.log_spike("Cold-swap unload")

-    print(f"[llm_engine] Loading model from {model_path}...")
-    print(f"[llm_engine] Model size: {model_path.stat().st_size / (1024*1024):.1f} MB")
-
-    _model_instance = Llama(
-        model_path=str(model_path),
-        n_ctx=512,
-        n_threads=4,
-        n_gpu_layers=0,
-        verbose=False
-    )
-    print("[llm_engine] Model loaded successfully!")
+    _model_instance = _load_model(lora_path)
+    _current_lora_path = lora_path
```
Copilot AI Mar 2, 2026

MemoryTracker.capture_after() is called before the new model is loaded (it runs right after deleting the old instance). This makes the reported “spike” reflect unload-only (often negative) rather than the memory impact of the cold-swap/load, which defeats the purpose of tracking adapter swap memory usage. Consider capturing baseline before unload and capturing after after _load_model(...) completes (or tracking unload and load separately with distinct measurements).

Comment on lines +38 to +46
```python
# Scheduler is imported lazily to avoid circular imports.
_scheduler = None


def _get_scheduler():
    global _scheduler
    if _scheduler is None:
        from idio.infrastructure.scheduler.training_scheduler import TrainingScheduler
        _scheduler = TrainingScheduler()
    return _scheduler
```
Copilot AI Mar 2, 2026

_get_scheduler() will raise at runtime if training dependencies (notably apscheduler) aren’t installed, because it imports TrainingScheduler without handling ImportError. Since this route is part of the running backend, consider catching ImportError here and returning a clear 503/500 error telling the operator to install .[training] (or otherwise ensure the scheduler’s imports are gated behind training_enabled).
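A sketch of that guard (self-contained: `ServiceUnavailable` stands in for FastAPI's `HTTPException(status_code=503, ...)`, and `import_scheduler` for the real `TrainingScheduler` import):

```python
class ServiceUnavailable(Exception):
    """Stand-in for an HTTP 503 error raised to the caller."""

    def __init__(self, detail: str) -> None:
        super().__init__(detail)
        self.detail = detail


_scheduler = None


def get_scheduler(import_scheduler):
    """Lazily construct the scheduler, translating a missing optional
    dependency into a clear service error instead of a raw ImportError."""
    global _scheduler
    if _scheduler is None:
        try:
            scheduler_cls = import_scheduler()  # real code: import TrainingScheduler
        except ImportError as exc:
            raise ServiceUnavailable(
                "Training dependencies not installed; install the [training] "
                "extras to enable this endpoint."
            ) from exc
        _scheduler = scheduler_cls()
    return _scheduler
```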

```python
        api_key: Optional[str] = None,
    ) -> None:
        self._interval_hours = interval_hours or settings.training_interval_hours
        self._api_url = api_url or f"http://{settings.host}:{settings.port}"
```
Copilot AI Mar 2, 2026

Defaulting api_url to http://{settings.host}:{settings.port} can break when the server binds to 0.0.0.0 (valid bind address but not a connect target). Consider defaulting the client URL to http://127.0.0.1:{settings.port} (or introducing a dedicated settings.internal_api_url) so the scheduler always connects reliably regardless of bind host.

Suggested change

```diff
-        self._api_url = api_url or f"http://{settings.host}:{settings.port}"
+        self._api_url = api_url or f"http://127.0.0.1:{settings.port}"
```
Comment on lines +193 to +203
```python
    lora_path = _resolve_active_adapter() if use_adapter else None

    if use_adapter and lora_path is None:
        logger.warning(
            "Mirror mode requested but no trained adapter found. "
            "Falling back to base model. Run training to generate an adapter."
        )

    # get_model handles the actual cold-swap and memory tracking internally.
    get_model(lora_path=lora_path)
```
Copilot AI Mar 2, 2026

_swap_adapter() always calls _resolve_active_adapter() in mirror mode, which instantiates AdapterRegistry and hits SQLite on every mirror request, even when the correct adapter is already loaded and get_model(...) would take the fast path. This adds avoidable latency to the hot path. Consider caching the resolved active adapter path in-memory and only re-resolving when a swap is actually needed (e.g., when _current_lora_path differs, or when a reload signal indicates a new promoted version).
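A minimal TTL-cache sketch (the function name, `resolve` callback, and 30 s TTL are illustrative, not from the PR):

```python
import time

_UNSET = object()          # sentinel so "no adapter" (None) can also be cached
_cached = _UNSET
_cached_at = 0.0
_TTL_SECONDS = 30.0        # illustrative; tune to how often adapters are promoted


def resolve_active_adapter_cached(resolve, now=time.monotonic):
    """Memoize the registry lookup so mirror-mode requests skip SQLite until
    the entry expires. `resolve` stands in for the real AdapterRegistry query;
    a promotion signal could also invalidate the cache explicitly."""
    global _cached, _cached_at
    if _cached is _UNSET or now() - _cached_at > _TTL_SECONDS:
        _cached = resolve()
        _cached_at = now()
    return _cached
```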

Comment on lines +150 to +165
"""Write a signal file so the server knows to hot-swap the adapter.

The server's health-check or a background watcher can poll this file.
Using a file avoids introducing an additional HTTP endpoint just for reloading.

Args:
adapter_path: Path to the newly active GGUF adapter.
"""
_RELOAD_SIGNAL_FILE.parent.mkdir(parents=True, exist_ok=True)
_RELOAD_SIGNAL_FILE.write_text(
json.dumps({
"adapter_path": str(adapter_path),
"timestamp": time.time(),
})
)
logger.info("Reload signal written to {}.", _RELOAD_SIGNAL_FILE)
Copilot AI Mar 2, 2026

_write_reload_signal() writes storage/.adapter_reload, but there’s no corresponding reader/watcher in backend/src (no code references .adapter_reload). As-is, this step won’t cause the server to reload/cold-swap the adapter and may mislead operators. Either implement the polling/watcher mechanism (or an explicit internal reload endpoint) or remove this signal file step to avoid dead/unused behavior.

Suggested change

```diff
-    """Write a signal file so the server knows to hot-swap the adapter.
-
-    The server's health-check or a background watcher can poll this file.
-    Using a file avoids introducing an additional HTTP endpoint just for reloading.
-
-    Args:
-        adapter_path: Path to the newly active GGUF adapter.
-    """
-    _RELOAD_SIGNAL_FILE.parent.mkdir(parents=True, exist_ok=True)
-    _RELOAD_SIGNAL_FILE.write_text(
-        json.dumps({
-            "adapter_path": str(adapter_path),
-            "timestamp": time.time(),
-        })
-    )
-    logger.info("Reload signal written to {}.", _RELOAD_SIGNAL_FILE)
+    """Placeholder for signaling the server to hot-swap the adapter.
+
+    NOTE: As of now, there is no watcher/endpoint in this backend that consumes
+    a `.adapter_reload` file. This function is intentionally a no-op to avoid
+    implying a reload mechanism that does not exist yet.
+
+    Args:
+        adapter_path: Path to the newly active GGUF adapter.
+    """
+    # Intentionally do not write any reload signal file until a corresponding
+    # watcher/endpoint is implemented on the server side.
+    logger.warning(
+        "Adapter reload requested for %s, but no reload mechanism is currently "
+        "wired up in the backend; skipping reload signal emission.",
+        adapter_path,
+    )
```
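If the team keeps the signal file instead, the missing consumer side could be a small poll helper that the health check (or a background task) calls periodically. A sketch, with the JSON layout taken from `_write_reload_signal` and the function name purely illustrative:

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def poll_reload_signal(signal_file: Path, last_seen: float) -> Tuple[Optional[str], float]:
    """Check the signal file once. Returns (adapter_path, timestamp) when a
    signal newer than `last_seen` exists, otherwise (None, last_seen), so each
    promotion triggers exactly one cold swap."""
    if not signal_file.exists():
        return None, last_seen
    payload = json.loads(signal_file.read_text())
    ts = float(payload.get("timestamp", 0.0))
    if ts > last_seen:
        return payload["adapter_path"], ts
    return None, last_seen
```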

```python
# Maximum examples returned in a single sync-training batch.
training_sync_batch_size: int = 100

# Minimum unsynced examples required before the scheduler fires training.
```
Copilot AI Mar 2, 2026

training_min_examples is described as “required before the scheduler fires training”, but the scheduler currently spawns the subprocess purely on an interval; the script enforces the minimum-unsynced threshold. Consider updating the comment to reflect where the check actually happens (or add a pre-check in the scheduler before spawning).

Suggested change

```diff
-# Minimum unsynced examples required before the scheduler fires training.
+# Minimum unsynced examples the training script requires before running;
+# the scheduler still triggers based solely on ``training_interval_hours``.
```

