Immediate focus points #8
Replies: 3 comments 1 reply
- Finetuning! The model now struggles on some cases since I stopped using langextract.
- I wrote eval scripts for two of my implementation branches. Both use the same input prompts but produce different response structures; a rough sketch of the comparison is below.
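For illustration, a minimal sketch of what such a harness could look like, assuming a llama.cpp server on `localhost:8080` and a hypothetical `normalize()` step that maps each branch's response structure onto one comparable shape. The cases, schemas, and endpoint are placeholders, not the actual scripts:

```python
import json
import urllib.request

# Placeholder eval cases: (prompt, expected values). The real sets live in the branches.
CASES = [
    ("Extract all dates: the meeting is on 2024-05-01.", ["2024-05-01"]),
]

# Two response structures for the same input prompts (illustrative shapes).
SCHEMAS = {
    "flat": {
        "type": "object",
        "properties": {"dates": {"type": "array", "items": {"type": "string"}}},
        "required": ["dates"],
    },
    "chunked": {
        "type": "object",
        "properties": {"chunks": {"type": "array", "items": {
            "type": "object",
            "properties": {"dates": {"type": "array", "items": {"type": "string"}}},
        }}},
        "required": ["chunks"],
    },
}

def complete(prompt: str, schema: dict) -> dict:
    """One call to a local llama.cpp server with schema-constrained JSON output."""
    body = json.dumps({"prompt": prompt, "json_schema": schema, "n_predict": 256}).encode()
    req = urllib.request.Request(
        "http://localhost:8080/completion",  # assumed server address
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(json.loads(resp.read())["content"])

def normalize(branch: str, obj: dict) -> list:
    """Flatten each branch's response structure so scores are comparable."""
    if branch == "flat":
        return sorted(obj.get("dates", []))
    return sorted(d for chunk in obj.get("chunks", []) for d in chunk.get("dates", []))

for branch, schema in SCHEMAS.items():
    correct = sum(normalize(branch, complete(p, schema)) == sorted(gold) for p, gold in CASES)
    print(f"{branch}: {correct}/{len(CASES)}")
```

Normalizing before scoring is what keeps the comparison fair: the two branches are judged on the same extracted values, not on how they happen to nest them.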
- Testing with the legacy non-chunking implementation (llama.cpp JSON schema): doubling the context length and adding many worked examples to the system prompt gives amazing results. The score jumped from 13/30 to 24/30, but throughput halved from 500 tokens/s to 250 tokens/s (local inference on an M4 MacBook Air). The sketch below shows the two changes.
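A minimal sketch of those two changes using the llama-cpp-python bindings; the model path, context sizes, and example prompts are assumptions, not the actual setup:

```python
from llama_cpp import Llama

# Assumed model path; n_ctx doubled (e.g. 4096 -> 8192) to make room for the
# longer few-shot system prompt. Actual sizes in the branch may differ.
llm = Llama(model_path="./model.gguf", n_ctx=8192)

# Many worked examples packed into the system prompt (abbreviated placeholder).
SYSTEM = (
    "Extract the requested fields and reply with JSON only.\n"
    "Example input: 'Invoice 42, due 2024-05-01'\n"
    'Example output: {"invoice": 42, "due": "2024-05-01"}\n'
    # ...more worked examples; the longer this gets, the more context it consumes.
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Invoice 7, due 2024-06-15"},
    ],
    # llama-cpp-python also accepts {"type": "json_object", "schema": {...}}
    # to enforce a specific schema via grammar, matching the llama.cpp setup.
    response_format={"type": "json_object"},
)
print(out["choices"][0]["message"]["content"])
```

The throughput drop is consistent with the change itself: a system prompt stuffed with examples means far more prompt tokens to process on every request.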