feat(models): implement model discovery and management system#17

Merged
ringo380 merged 3 commits into main from feat/model-discovery-management
Apr 2, 2026

Conversation

@ringo380
Owner

ringo380 commented Apr 2, 2026

Closes #5

Summary

  • Recursive discovery: list_models() now walks subdirectories (skipping .inferno_cache and other hidden dirs)
  • Real GGUF metadata: replaces filename-guessing stub with actual binary header parsing — reads general.architecture, general.parameter_count, general.file_type, and <arch>.context_length KV pairs via byteorder crate; falls back gracefully to filename heuristics
  • Metadata cache: sidecar JSON files in {models_dir}/.inferno_cache/ avoid re-parsing on every models info call (invalidated when model file is newer)
  • Model registry: persistent {models_dir}/.inferno_registry.json tracks tags, use count, and last-used timestamp per model
  • 4 new CLI commands:
    • models search <query> — queries HuggingFace API
    • models install <hf-repo-id | url> — downloads, validates, and registers GGUF models
    • models tag <model> <tags...> — adds user-defined tags
    • models stats — usage table sorted by run count
  • Enhanced existing commands: models list and models info now show tags, use count, and RAM compatibility estimate
  • Usage tracking: GGUF and ONNX backends call record_model_usage() after successful model load

Test plan

  • cargo test — 900 tests pass
  • cargo clippy -- -D warnings — no errors
  • cargo fmt --check — clean
  • cargo run -- models list with models in subdirectories
  • cargo run -- models info <gguf> shows real architecture/quantization (not filename-guessed)
  • cargo run -- models search llama returns HuggingFace results
  • cargo run -- models install <hf-repo> downloads and validates a model
  • cargo run -- models tag <model> chat then models info shows the tag
  • cargo run -- models stats shows entries after inference run

Closes #5

- Recursive directory scanning for GGUF/ONNX models (was flat-only)
- Real GGUF binary header parsing via byteorder crate; extracts
  architecture, parameter count, context length, and quantization
  from the file's KV metadata section; falls back to filename
  heuristics on parse failure
- Metadata cache: sidecar JSON files in .inferno_cache/ invalidated
  when the model file is newer, avoiding repeated header parses
- ModelRegistry: persistent JSON registry at .inferno_registry.json
  tracking tags, use_count, and last_used per model
- models search: queries HuggingFace API with result table
- models install: downloads GGUF from HF repo ID or direct URL,
  validates the file, and registers it in the local registry
- models tag: adds user-defined tags to local models
- models stats: shows usage table sorted by run count
- models list/info: now show tags, use count, and RAM compatibility
- Usage tracking: GGUF and ONNX backends call record_model_usage()
  on successful model load

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8b6c4193ca


Comment on lines +346 to +350
let filename = model_path
    .file_name()
    .and_then(|n| n.to_str())
    .unwrap_or("unknown");
self.cache_dir().join(format!("{}.json", filename))

P2: Key metadata cache by model path instead of filename

The cache filename is derived only from model_path.file_name(), so two models in different subdirectories with the same basename (now possible due to recursive discovery) share one cache entry. After one model is parsed, models info for the other can read the wrong cached metadata if mtimes allow it, producing incorrect architecture/quantization/parameter output. Use a path-based key (e.g., relative path hash) to avoid collisions.


let total = resp.content_length();
let mut downloaded: u64 = 0;

let mut file = tokio::fs::File::create(dest).await?;

P2: Create destination parent dirs before opening download file

download_to_file calls tokio::fs::File::create(dest) directly, but install paths can include subdirectories (for example Hugging Face rfilename values like subdir/model.gguf or a --name containing /). In those cases the parent directory does not exist and install fails before download starts. Ensure dest.parent() exists with create_dir_all before file creation.


Comment on lines +1073 to +1074
// Fallback: immediate parent
model_path.parent().unwrap_or(model_path).to_path_buf()

P2: Record usage in root models registry when none exists yet

When no .inferno_registry.json is found, infer_models_dir falls back to the model’s immediate parent directory. For nested models loaded before any registry file exists, usage gets written to a subdirectory-local registry, while models list/info/stats read only {config.models_dir}/.inferno_registry.json; the recorded usage is effectively invisible. Fall back to the configured models root (or derive it from known config) rather than the immediate parent.


@ringo380
Owner Author

ringo380 commented Apr 2, 2026

Code review

Found 2 issues:

  1. GGUF array parsing desynchronizes the cursor when array element count exceeds 65536 — the loop in read_gguf_value and skip_gguf_value caps at count.min(65536) and returns without advancing past the remaining elements. After truncation the cursor points mid-array, so every subsequent KV entry is parsed from the wrong offset, silently producing corrupt architecture/quantization metadata. Real GGUF files contain large arrays (e.g. RoPE frequency tables).

inferno/src/models/mod.rs

Lines 892 to 902 in 8b6c419

        Ok(String::from_utf8_lossy(&bytes).to_string())
    }
    9 => {
        let elem_type = cursor.read_u32::<LittleEndian>()?;
        let count = cursor.read_u64::<LittleEndian>()?;
        for _ in 0..count.min(65536) {
            skip_gguf_value(cursor, elem_type)?;
        }
        Ok(String::new())
    }
    10 => Ok(cursor.read_u64::<LittleEndian>()?.to_string()),

Fix: compute element byte-size for fixed-width types and cursor.set_position past the array, or remove the cap entirely (the early-exit when all four fields are populated already bounds the loop in practice).

  2. Metadata cache key collides for same-named models in different subdirectories: metadata_cache_path uses only the bare filename (model_path.file_name()), so models/a/weights.gguf and models/b/weights.gguf both write to .inferno_cache/weights.gguf.json. Since this PR also makes list_models() recursive, the second model parsed silently overwrites the first's cached architecture and quantization, causing models info to return wrong metadata for one of them indefinitely.

inferno/src/models/mod.rs

Lines 345 to 355 in 8b6c419

fn metadata_cache_path(&self, model_path: &Path) -> PathBuf {
    let filename = model_path
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("unknown");
    self.cache_dir().join(format!("{}.json", filename))
}

fn cache_dir(&self) -> PathBuf {
    self.models_dir.join(".inferno_cache")
}

Fix: derive the cache filename from a hash of the full absolute path (e.g. format!("{:x}.json", sha256(path.to_string_lossy()))), or replace slashes with underscores in the relative path.


🤖 Generated with Claude Code


ringo380 added 2 commits April 2, 2026 14:22
- models/mod.rs: infer_models_dir now also detects .inferno_cache dir as
  models root marker, preventing registry being written to wrong directory
- cli/models.rs: download_to_file cleans up partial file on any mid-stream
  error; Install command rejects filenames containing path traversal sequences
- backends/gguf.rs: compute softmax on raw logits before building candidate
  list so sampler receives valid probabilities (c.p() was always 0.0);
  add stop_sequences checking in both generate_response and generate_stream
  loops using accumulated token text
- models/mod.rs: fix get_available_ram_gb() dividing by 1_048_576 (MB)
  instead of 1_073_741_824 (GB) — compatibility check was always false
- models/mod.rs: canonicalize path before using as registry key in
  record_usage, tag_model, register_model — matches doc comment and
  metadata_cache_path; update test to use canonical key
- models/mod.rs: validate GGUF magic bytes at start of
  parse_gguf_kv_metadata instead of blindly skipping them
- backends/gguf.rs: move stop sequence check before output_tokens.push()
  so triggering token is not included in final response
- backends/gguf.rs, onnx.rs: move record_model_usage from load_model to
  infer/infer_stream so use_count reflects actual inference calls, not
  speculative loads
- cli/models.rs: warn when installing a model over plain http://
- cli/models.rs: replace hardcoded ONNX stub metadata display with a
  clear message that ONNX metadata parsing is not yet implemented
ringo380 merged commit 3aee81d into main Apr 2, 2026
14 of 28 checks passed
ringo380 deleted the feat/model-discovery-management branch April 2, 2026 20:57