fix: added LoRA Fine-Tuning Script and moved from hot to cold swapping in llama.cpp (#10)
Conversation
Pull request overview
This PR introduces an automated LoRA fine-tuning pipeline (trainer + orchestration script + registry + scheduler) and updates the llama-cpp inference engine to “cold-swap” model instances when switching between base and LoRA-adapted inference (mirror mode).
Changes:
- Add a LoRA training implementation (`LoraTrainer`) plus a standalone `scripts/train_lora.py` orchestrator for syncing examples, training, GGUF conversion, validation, and promotion.
- Add an adapter version registry (`AdapterRegistry`) backed by a SQLite `model_versions` table, and update inference to load the promoted GGUF adapter via cold swap.
- Add a time-triggered scheduler (`TrainingScheduler`) and an internal endpoint to manually trigger training.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| backend/src/idio/main.py | Starts/stops the training scheduler during FastAPI lifespan and replaces prints with logging. |
| backend/src/idio/infrastructure/scheduler/training_scheduler.py | New APScheduler-based subprocess spawner for periodic training runs. |
| backend/src/idio/infrastructure/config/settings.py | Adds training scheduler toggles/intervals and LoRA hyperparameters. |
| backend/src/idio/infrastructure/ai/lora_trainer.py | New LoRA fine-tuning + GGUF conversion + validation utilities. |
| backend/src/idio/infrastructure/ai/llm_engine.py | Switches mirror mode to cold-swap model instances with optional lora_path. |
| backend/src/idio/infrastructure/ai/adapter_registry.py | New SQLite-backed registry for registering/promoting active adapters. |
| backend/src/idio/api/routes/training.py | Adds /trigger-training internal endpoint to spawn training immediately. |
| backend/scripts/train_lora.py | New CLI training pipeline script that pulls data from internal endpoints and trains/promotes adapters. |
| backend/pyproject.toml | Adds a [training] extras group for training/scheduler dependencies. |
```python
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers.interval import IntervalTrigger
from loguru import logger

from idio.infrastructure.config.settings import settings
```
apscheduler is imported at module load time, but it’s only declared in the optional [training] extras. This will crash the backend on startup (and during tests that import idio.main) even when training_enabled is false. Consider moving the APScheduler imports into start() (or guarding them behind training_enabled) so inference-only installs don’t require training deps.
Suggested change:

```diff
-from apscheduler.schedulers.background import BackgroundScheduler
-from apscheduler.triggers.interval import IntervalTrigger
 from loguru import logger

 from idio.infrastructure.config.settings import settings

+if settings.training_enabled:
+    from apscheduler.schedulers.background import BackgroundScheduler
+    from apscheduler.triggers.interval import IntervalTrigger
+else:
+    BackgroundScheduler = None  # type: ignore[assignment]
+    IntervalTrigger = None  # type: ignore[assignment]
```
```python
def _run_training_subprocess(self) -> None:
    """Spawn ``scripts/train_lora.py`` as a non-blocking subprocess.

    Stdout and stderr stream to the parent process's log. The process
    is run in the background so the scheduler job completes immediately
    and does not block the APScheduler thread pool.
    """
    if not _SCRIPT_PATH.exists():
        logger.error(
            "Training script not found at {}. Cannot run training.", _SCRIPT_PATH
        )
        return

    cmd = [
        sys.executable,
        str(_SCRIPT_PATH),
        "--api-url", self._api_url,
        "--api-key", self._api_key,
        "--min-examples", str(settings.training_min_examples),
        "--batch-size", str(settings.training_sync_batch_size),
    ]

    logger.info("Spawning training subprocess: {}.", " ".join(cmd))

    try:
        # Popen (non-blocking): scheduler job returns immediately.
        # Stdout/stderr inherit the parent process streams.
        subprocess.Popen(
            cmd,
            stdout=None,
            stderr=None,
            cwd=str(_SCRIPT_PATH.parent.parent),  # backend/
        )
```
Because the job uses subprocess.Popen(...) and returns immediately, APScheduler will consider the job complete while the training process is still running. That means subsequent intervals (or manual triggers) can start overlapping training runs, which is likely to exhaust CPU/GPU and corrupt output artifacts. Consider adding a lock/PID check around spawning (or track the child process handle) to ensure only one training subprocess runs at a time.
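One way to implement this is to keep the child's `Popen` handle and refuse to spawn while it is still alive. A minimal sketch under stated assumptions — `SingleFlightSpawner` is a hypothetical wrapper, not the PR's scheduler class:

```python
import subprocess
import sys
from typing import Optional


class SingleFlightSpawner:
    """Allow at most one training subprocess at a time."""

    def __init__(self) -> None:
        self._proc: Optional[subprocess.Popen] = None

    def is_running(self) -> bool:
        # poll() returns None while the child is still alive.
        return self._proc is not None and self._proc.poll() is None

    def spawn(self, cmd: list) -> bool:
        """Spawn `cmd` unless a previous run is still active.

        Returns True if a new process was started, False if skipped.
        """
        if self.is_running():
            return False  # previous training run still active; skip this tick
        self._proc = subprocess.Popen(cmd)
        return True
```

The scheduler job would call `spawn(...)` instead of `subprocess.Popen(...)` directly, so overlapping intervals and manual triggers become no-ops while training is in flight.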
```diff
 # Cold swap: destroy the existing instance to free RAM/VRAM before
 # loading the new one, avoiding a brief period of double occupancy.
 if _model_instance is not None:
     _memory_tracker.capture_baseline()
     logger.info(
         "Cold-swapping adapter: {} → {}.",
         _current_lora_path or "base",
         lora_path or "base",
     )
     del _model_instance
     _model_instance = None
     gc.collect()
     _memory_tracker.capture_after()
     _memory_tracker.log_spike("Cold-swap unload")

-print(f"[llm_engine] Loading model from {model_path}...")
-print(f"[llm_engine] Model size: {model_path.stat().st_size / (1024*1024):.1f} MB")
-
-_model_instance = Llama(
-    model_path=str(model_path),
-    n_ctx=512,
-    n_threads=4,
-    n_gpu_layers=0,
-    verbose=False
-)
-print("[llm_engine] Model loaded successfully!")
+_model_instance = _load_model(lora_path)
+_current_lora_path = lora_path
```
MemoryTracker.capture_after() is called before the new model is loaded (it runs right after deleting the old instance). This makes the reported “spike” reflect unload-only (often negative) rather than the memory impact of the cold-swap/load, which defeats the purpose of tracking adapter swap memory usage. Consider capturing baseline before unload and capturing after after _load_model(...) completes (or tracking unload and load separately with distinct measurements).
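The suggested ordering can be illustrated with a simplified stand-in for the PR's tracker — `SwapTracker`, `cold_swap`, and the memory readings below are illustrative, not the actual implementation:

```python
class SwapTracker:
    """Simplified stand-in for MemoryTracker: records two readings."""

    def __init__(self) -> None:
        self._baseline = 0.0
        self._after = 0.0

    def capture_baseline(self, reading_mb: float) -> None:
        self._baseline = reading_mb

    def capture_after(self, reading_mb: float) -> None:
        self._after = reading_mb

    def spike_mb(self) -> float:
        return self._after - self._baseline


def cold_swap(tracker, read_memory, unload_old, load_new):
    """Baseline before unload, 'after' once the new model is resident."""
    tracker.capture_baseline(read_memory())  # old model still loaded
    unload_old()                             # free the old instance
    model = load_new()                       # load the replacement
    tracker.capture_after(read_memory())     # new model loaded
    return model
```

With this ordering, the reported spike is the net memory cost of the whole swap rather than an unload-only (often negative) delta.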
```python
# Scheduler is imported lazily to avoid circular imports.
_scheduler = None


def _get_scheduler():
    global _scheduler
    if _scheduler is None:
        from idio.infrastructure.scheduler.training_scheduler import TrainingScheduler
        _scheduler = TrainingScheduler()
    return _scheduler
```
_get_scheduler() will raise at runtime if training dependencies (notably apscheduler) aren’t installed, because it imports TrainingScheduler without handling ImportError. Since this route is part of the running backend, consider catching ImportError here and returning a clear 503/500 error telling the operator to install .[training] (or otherwise ensure the scheduler’s imports are gated behind training_enabled).
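A hedged sketch of the guard, with the lazy import factored out as a callable so the logic is testable without `apscheduler` installed. `TrainingUnavailableError` and the install hint are illustrative; in the FastAPI route this exception would map to a 503 response:

```python
_scheduler = None


class TrainingUnavailableError(RuntimeError):
    """Raised when optional training dependencies are not installed."""


def get_scheduler(import_scheduler_cls):
    """Return the singleton scheduler, or raise a clear, actionable error.

    `import_scheduler_cls` stands in for the lazy TrainingScheduler import,
    so an ImportError from missing extras surfaces as an operator-friendly
    message instead of a bare traceback.
    """
    global _scheduler
    if _scheduler is None:
        try:
            scheduler_cls = import_scheduler_cls()
        except ImportError as exc:
            raise TrainingUnavailableError(
                "Training dependencies are missing; install them with "
                "`pip install .[training]`."
            ) from exc
        _scheduler = scheduler_cls()
    return _scheduler
```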
```python
    api_key: Optional[str] = None,
) -> None:
    self._interval_hours = interval_hours or settings.training_interval_hours
    self._api_url = api_url or f"http://{settings.host}:{settings.port}"
```
Defaulting api_url to http://{settings.host}:{settings.port} can break when the server binds to 0.0.0.0 (valid bind address but not a connect target). Consider defaulting the client URL to http://127.0.0.1:{settings.port} (or introducing a dedicated settings.internal_api_url) so the scheduler always connects reliably regardless of bind host.
Suggested change:

```diff
-self._api_url = api_url or f"http://{settings.host}:{settings.port}"
+self._api_url = api_url or f"http://127.0.0.1:{settings.port}"
```
```python
lora_path = _resolve_active_adapter() if use_adapter else None

if use_adapter and lora_path is None:
    logger.warning(
        "Mirror mode requested but no trained adapter found. "
        "Falling back to base model. Run training to generate an adapter."
    )

# get_model handles the actual cold-swap and memory tracking internally.
get_model(lora_path=lora_path)
```
_swap_adapter() always calls _resolve_active_adapter() in mirror mode, which instantiates AdapterRegistry and hits SQLite on every mirror request, even when the correct adapter is already loaded and get_model(...) would take the fast path. This adds avoidable latency to the hot path. Consider caching the resolved active adapter path in-memory and only re-resolving when a swap is actually needed (e.g., when _current_lora_path differs, or when a reload signal indicates a new promoted version).
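A possible caching scheme, sketched with stand-in names: `resolve_adapter_cached` and the `invalidated` flag are hypothetical, and the PR's `_resolve_active_adapter()` would be passed in as `resolve_fn`:

```python
_cached_adapter_path = None
_cache_valid = False


def resolve_adapter_cached(resolve_fn, invalidated: bool):
    """Return the active adapter path, hitting the registry only when needed.

    Args:
        resolve_fn: callable that queries the SQLite registry for the
            currently promoted adapter path (an expensive lookup).
        invalidated: True when a reload signal (e.g. a new promotion)
            indicates the cached path may be stale.
    """
    global _cached_adapter_path, _cache_valid
    if invalidated or not _cache_valid:
        _cached_adapter_path = resolve_fn()  # registry hit only on miss/invalidation
        _cache_valid = True
    return _cached_adapter_path
```

Repeated mirror requests then serve the adapter path from memory, and the registry is consulted only on first use or after a promotion invalidates the cache.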
| """Write a signal file so the server knows to hot-swap the adapter. | ||
|
|
||
| The server's health-check or a background watcher can poll this file. | ||
| Using a file avoids introducing an additional HTTP endpoint just for reloading. | ||
|
|
||
| Args: | ||
| adapter_path: Path to the newly active GGUF adapter. | ||
| """ | ||
| _RELOAD_SIGNAL_FILE.parent.mkdir(parents=True, exist_ok=True) | ||
| _RELOAD_SIGNAL_FILE.write_text( | ||
| json.dumps({ | ||
| "adapter_path": str(adapter_path), | ||
| "timestamp": time.time(), | ||
| }) | ||
| ) | ||
| logger.info("Reload signal written to {}.", _RELOAD_SIGNAL_FILE) |
_write_reload_signal() writes storage/.adapter_reload, but there’s no corresponding reader/watcher in backend/src (no code references .adapter_reload). As-is, this step won’t cause the server to reload/cold-swap the adapter and may mislead operators. Either implement the polling/watcher mechanism (or an explicit internal reload endpoint) or remove this signal file step to avoid dead/unused behavior.
| """Write a signal file so the server knows to hot-swap the adapter. | |
| The server's health-check or a background watcher can poll this file. | |
| Using a file avoids introducing an additional HTTP endpoint just for reloading. | |
| Args: | |
| adapter_path: Path to the newly active GGUF adapter. | |
| """ | |
| _RELOAD_SIGNAL_FILE.parent.mkdir(parents=True, exist_ok=True) | |
| _RELOAD_SIGNAL_FILE.write_text( | |
| json.dumps({ | |
| "adapter_path": str(adapter_path), | |
| "timestamp": time.time(), | |
| }) | |
| ) | |
| logger.info("Reload signal written to {}.", _RELOAD_SIGNAL_FILE) | |
| """Placeholder for signaling the server to hot-swap the adapter. | |
| NOTE: As of now, there is no watcher/endpoint in this backend that consumes | |
| a `.adapter_reload` file. This function is intentionally a no-op to avoid | |
| implying a reload mechanism that does not exist yet. | |
| Args: | |
| adapter_path: Path to the newly active GGUF adapter. | |
| """ | |
| # Intentionally do not write any reload signal file until a corresponding | |
| # watcher/endpoint is implemented on the server side. | |
| logger.warning( | |
| "Adapter reload requested for %s, but no reload mechanism is currently " | |
| "wired up in the backend; skipping reload signal emission.", | |
| adapter_path, | |
| ) |
```python
# Maximum examples returned in a single sync-training batch.
training_sync_batch_size: int = 100

# Minimum unsynced examples required before the scheduler fires training.
```
training_min_examples is described as “required before the scheduler fires training”, but the scheduler currently spawns the subprocess purely on an interval; the script enforces the minimum-unsynced threshold. Consider updating the comment to reflect where the check actually happens (or add a pre-check in the scheduler before spawning).
Suggested change:

```diff
-# Minimum unsynced examples required before the scheduler fires training.
+# Minimum unsynced examples the training script requires before running;
+# the scheduler still triggers based solely on ``training_interval_hours``.
```
Pull Request Checklist

Type:

For Backend (Dev R1, R2, M):

- … `domain/`, not direct imports.
- … `scripts/setup.py` to ensure dependencies are correct.
- … `localhost:45500`.
- … `infrastructure/persistence/db.py`.

For Frontend (Dev S):

Validation: