feat(models): implement model discovery and management system#17
feat(models): implement model discovery and management system#17
Conversation
Closes #5 - Recursive directory scanning for GGUF/ONNX models (was flat-only) - Real GGUF binary header parsing via byteorder crate; extracts architecture, parameter count, context length, and quantization from the file's KV metadata section; falls back to filename heuristics on parse failure - Metadata cache: sidecar JSON files in .inferno_cache/ invalidated when the model file is newer, avoiding repeated header parses - ModelRegistry: persistent JSON registry at .inferno_registry.json tracking tags, use_count, and last_used per model - models search: queries HuggingFace API with result table - models install: downloads GGUF from HF repo ID or direct URL, validates the file, and registers it in the local registry - models tag: adds user-defined tags to local models - models stats: shows usage table sorted by run count - models list/info: now show tags, use count, and RAM compatibility - Usage tracking: GGUF and ONNX backends call record_model_usage() on successful model load
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8b6c4193ca
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/models/mod.rs
Outdated
| let filename = model_path | ||
| .file_name() | ||
| .and_then(|n| n.to_str()) | ||
| .unwrap_or("unknown"); | ||
| self.cache_dir().join(format!("{}.json", filename)) |
There was a problem hiding this comment.
Key metadata cache by model path instead of filename
The cache filename is derived only from model_path.file_name(), so two models in different subdirectories with the same basename (now possible due to recursive discovery) share one cache entry. After one model is parsed, models info for the other can read the wrong cached metadata if mtimes allow it, producing incorrect architecture/quantization/parameter output. Use a path-based key (e.g., relative path hash) to avoid collisions.
Useful? React with 👍 / 👎.
src/cli/models.rs
Outdated
| let total = resp.content_length(); | ||
| let mut downloaded: u64 = 0; | ||
|
|
||
| let mut file = tokio::fs::File::create(dest).await?; |
There was a problem hiding this comment.
Create destination parent dirs before opening download file
download_to_file calls tokio::fs::File::create(dest) directly, but install paths can include subdirectories (for example Hugging Face rfilename values like subdir/model.gguf or a --name containing /). In those cases the parent directory does not exist and install fails before download starts. Ensure dest.parent() exists with create_dir_all before file creation.
Useful? React with 👍 / 👎.
| // Fallback: immediate parent | ||
| model_path.parent().unwrap_or(model_path).to_path_buf() |
There was a problem hiding this comment.
Record usage in root models registry when none exists yet
When no .inferno_registry.json is found, infer_models_dir falls back to the model’s immediate parent directory. For nested models loaded before any registry file exists, usage gets written to a subdirectory-local registry, while models list/info/stats read only {config.models_dir}/.inferno_registry.json; the recorded usage is effectively invisible. Fall back to the configured models root (or derive it from known config) rather than the immediate parent.
Useful? React with 👍 / 👎.
Code reviewFound 2 issues:
Lines 892 to 902 in 8b6c419 Fix: compute element byte-size for fixed-width types and
Lines 345 to 355 in 8b6c419 Fix: derive the cache filename from a hash of the full absolute path (e.g. 🤖 Generated with Claude Code If this code review was useful, please react with 👍. Otherwise, react with 👎. |
- models/mod.rs: infer_models_dir now also detects .inferno_cache dir as models root marker, preventing registry being written to wrong directory - cli/models.rs: download_to_file cleans up partial file on any mid-stream error; Install command rejects filenames containing path traversal sequences - backends/gguf.rs: compute softmax on raw logits before building candidate list so sampler receives valid probabilities (c.p() was always 0.0); add stop_sequences checking in both generate_response and generate_stream loops using accumulated token text
- models/mod.rs: fix get_available_ram_gb() dividing by 1_048_576 (MB) instead of 1_073_741_824 (GB) — compatibility check was always false - models/mod.rs: canonicalize path before using as registry key in record_usage, tag_model, register_model — matches doc comment and metadata_cache_path; update test to use canonical key - models/mod.rs: validate GGUF magic bytes at start of parse_gguf_kv_metadata instead of blindly skipping them - backends/gguf.rs: move stop sequence check before output_tokens.push() so triggering token is not included in final response - backends/gguf.rs, onnx.rs: move record_model_usage from load_model to infer/infer_stream so use_count reflects actual inference calls, not speculative loads - cli/models.rs: warn when installing a model over plain http:// - cli/models.rs: replace hardcoded ONNX stub metadata display with a clear message that ONNX metadata parsing is not yet implemented
Closes #5
Summary
list_models()now walks subdirectories (skipping.inferno_cacheand other hidden dirs)general.architecture,general.parameter_count,general.file_type, and<arch>.context_lengthKV pairs viabyteordercrate; falls back gracefully to filename heuristics{models_dir}/.inferno_cache/avoid re-parsing on everymodels infocall (invalidated when model file is newer){models_dir}/.inferno_registry.jsontracks tags, use count, and last-used timestamp per modelmodels search <query>— queries HuggingFace APImodels install <hf-repo-id | url>— downloads, validates, and registers GGUF modelsmodels tag <model> <tags...>— adds user-defined tagsmodels stats— usage table sorted by run countmodels listandmodels infonow show tags, use count, and RAM compatibility estimaterecord_model_usage()after successful model loadTest plan
cargo test— 900 tests passcargo clippy -- -D warnings— no errorscargo fmt --check— cleancargo run -- models listwith models in subdirectoriescargo run -- models info <gguf>shows real architecture/quantization (not filename-guessed)cargo run -- models search llamareturns HuggingFace resultscargo run -- models install <hf-repo>downloads and validates a modelcargo run -- models tag <model> chatthenmodels infoshows the tagcargo run -- models statsshows entries after inference run