feat(models): implement model discovery and management system#17

Merged
ringo380 merged 3 commits into main from feat/model-discovery-management
Apr 2, 2026

Conversation

@ringo380
Owner

ringo380 commented Apr 2, 2026

Closes #5

Summary

  • Recursive discovery: list_models() now walks subdirectories (skipping .inferno_cache and other hidden dirs)
  • Real GGUF metadata: replaces filename-guessing stub with actual binary header parsing — reads general.architecture, general.parameter_count, general.file_type, and <arch>.context_length KV pairs via byteorder crate; falls back gracefully to filename heuristics
  • Metadata cache: sidecar JSON files in {models_dir}/.inferno_cache/ avoid re-parsing on every models info call (invalidated when model file is newer)
  • Model registry: persistent {models_dir}/.inferno_registry.json tracks tags, use count, and last-used timestamp per model
  • 4 new CLI commands:
    • models search <query> — queries HuggingFace API
    • models install <hf-repo-id | url> — downloads, validates, and registers GGUF models
    • models tag <model> <tags...> — adds user-defined tags
    • models stats — usage table sorted by run count
  • Enhanced existing commands: models list and models info now show tags, use count, and RAM compatibility estimate
  • Usage tracking: GGUF and ONNX backends call record_model_usage() after successful model load

Test plan

  • cargo test — 900 tests pass
  • cargo clippy -- -D warnings — no errors
  • cargo fmt --check — clean
  • cargo run -- models list with models in subdirectories
  • cargo run -- models info <gguf> shows real architecture/quantization (not filename-guessed)
  • cargo run -- models search llama returns HuggingFace results
  • cargo run -- models install <hf-repo> downloads and validates a model
  • cargo run -- models tag <model> chat then models info shows the tag
  • cargo run -- models stats shows entries after inference run

Closes #5

- Recursive directory scanning for GGUF/ONNX models (was flat-only)
- Real GGUF binary header parsing via byteorder crate; extracts
  architecture, parameter count, context length, and quantization
  from the file's KV metadata section; falls back to filename
  heuristics on parse failure
- Metadata cache: sidecar JSON files in .inferno_cache/ invalidated
  when the model file is newer, avoiding repeated header parses
- ModelRegistry: persistent JSON registry at .inferno_registry.json
  tracking tags, use_count, and last_used per model
- models search: queries HuggingFace API with result table
- models install: downloads GGUF from HF repo ID or direct URL,
  validates the file, and registers it in the local registry
- models tag: adds user-defined tags to local models
- models stats: shows usage table sorted by run count
- models list/info: now show tags, use count, and RAM compatibility
- Usage tracking: GGUF and ONNX backends call record_model_usage()
  on successful model load

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8b6c4193ca


Comment on lines +346 to +350
let filename = model_path
    .file_name()
    .and_then(|n| n.to_str())
    .unwrap_or("unknown");
self.cache_dir().join(format!("{}.json", filename))

P2: Key metadata cache by model path instead of filename

The cache filename is derived only from model_path.file_name(), so two models in different subdirectories with the same basename (now possible due to recursive discovery) share one cache entry. After one model is parsed, models info for the other can read the wrong cached metadata if mtimes allow it, producing incorrect architecture/quantization/parameter output. Use a path-based key (e.g., relative path hash) to avoid collisions.


let total = resp.content_length();
let mut downloaded: u64 = 0;

let mut file = tokio::fs::File::create(dest).await?;

P2: Create destination parent dirs before opening download file

download_to_file calls tokio::fs::File::create(dest) directly, but install paths can include subdirectories (for example Hugging Face rfilename values like subdir/model.gguf or a --name containing /). In those cases the parent directory does not exist and install fails before download starts. Ensure dest.parent() exists with create_dir_all before file creation.


Comment on lines +1073 to +1074
// Fallback: immediate parent
model_path.parent().unwrap_or(model_path).to_path_buf()

P2: Record usage in root models registry when none exists yet

When no .inferno_registry.json is found, infer_models_dir falls back to the model’s immediate parent directory. For nested models loaded before any registry file exists, usage gets written to a subdirectory-local registry, while models list/info/stats read only {config.models_dir}/.inferno_registry.json; the recorded usage is effectively invisible. Fall back to the configured models root (or derive it from known config) rather than the immediate parent.


@ringo380
Owner Author

ringo380 commented Apr 2, 2026

Code review

Found 2 issues:

  1. GGUF array parsing desynchronizes the cursor when array element count exceeds 65536 — the loop in read_gguf_value and skip_gguf_value caps at count.min(65536) and returns without advancing past the remaining elements. After truncation the cursor points mid-array, so every subsequent KV entry is parsed from the wrong offset, silently producing corrupt architecture/quantization metadata. Real GGUF files contain large arrays (e.g. RoPE frequency tables).

inferno/src/models/mod.rs

Lines 892 to 902 in 8b6c419

        Ok(String::from_utf8_lossy(&bytes).to_string())
    }
    9 => {
        let elem_type = cursor.read_u32::<LittleEndian>()?;
        let count = cursor.read_u64::<LittleEndian>()?;
        for _ in 0..count.min(65536) {
            skip_gguf_value(cursor, elem_type)?;
        }
        Ok(String::new())
    }
    10 => Ok(cursor.read_u64::<LittleEndian>()?.to_string()),

Fix: compute element byte-size for fixed-width types and cursor.set_position past the array, or remove the cap entirely (the early-exit when all four fields are populated already bounds the loop in practice).

  2. Metadata cache key collides for same-named models in different subdirectories: metadata_cache_path uses only the bare filename (model_path.file_name()), so models/a/weights.gguf and models/b/weights.gguf both write to .inferno_cache/weights.gguf.json. Since this PR also makes list_models() recursive, the second model parsed silently overwrites the first's cached architecture and quantization, causing models info to return wrong metadata for one of them indefinitely.

inferno/src/models/mod.rs

Lines 345 to 355 in 8b6c419

fn metadata_cache_path(&self, model_path: &Path) -> PathBuf {
    let filename = model_path
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("unknown");
    self.cache_dir().join(format!("{}.json", filename))
}

fn cache_dir(&self) -> PathBuf {
    self.models_dir.join(".inferno_cache")
}

Fix: derive the cache filename from a hash of the full absolute path (e.g. format!("{:x}.json", sha256(path.to_string_lossy()))), or replace slashes with underscores in the relative path.


🤖 Generated with Claude Code


ringo380 added 2 commits April 2, 2026 14:22
- models/mod.rs: infer_models_dir now also detects .inferno_cache dir as
  models root marker, preventing registry being written to wrong directory
- cli/models.rs: download_to_file cleans up partial file on any mid-stream
  error; Install command rejects filenames containing path traversal sequences
- backends/gguf.rs: compute softmax on raw logits before building candidate
  list so sampler receives valid probabilities (c.p() was always 0.0);
  add stop_sequences checking in both generate_response and generate_stream
  loops using accumulated token text
- models/mod.rs: fix get_available_ram_gb() dividing by 1_048_576 (MB)
  instead of 1_073_741_824 (GB) — compatibility check was always false
- models/mod.rs: canonicalize path before using as registry key in
  record_usage, tag_model, register_model — matches doc comment and
  metadata_cache_path; update test to use canonical key
- models/mod.rs: validate GGUF magic bytes at start of
  parse_gguf_kv_metadata instead of blindly skipping them
- backends/gguf.rs: move stop sequence check before output_tokens.push()
  so triggering token is not included in final response
- backends/gguf.rs, onnx.rs: move record_model_usage from load_model to
  infer/infer_stream so use_count reflects actual inference calls, not
  speculative loads
- cli/models.rs: warn when installing a model over plain http://
- cli/models.rs: replace hardcoded ONNX stub metadata display with a
  clear message that ONNX metadata parsing is not yet implemented
ringo380 merged commit 3aee81d into main Apr 2, 2026
14 of 28 checks passed
ringo380 deleted the feat/model-discovery-management branch April 2, 2026 20:57