-
Notifications
You must be signed in to change notification settings - Fork 1
WIP on much faster word->symbols conversion. Not hooked in - but benches exist #57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
It looks like the piece to look at is the For symbolizing a sequence of N "a"s using two symbols 'a' and 'aa' we find
The key point is that the new method is roughly linear - while the old is not. And the new version is faster than the old even for small inputs. It works by using a dynamic programming / path-finding style approach. |
|
Running on a more realistic data set the new method takes 2.0481 ms vs the old 9.6705 ms. Data is
Then we symbolise all 3897 words. |
* These are mostly due to needlessly borrowing things that are already references
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a new, faster word-to-symbols conversion implementation alongside comprehensive benchmarking infrastructure. The implementation appears to be experimental work ("WIP" and "Not hooked in") exploring performance optimizations for symbolization algorithms.
- Adds a complete new symbolization module with multiple algorithmic variants
- Removes unnecessary reference operators in existing symbol operations
- Implements benchmarking infrastructure to compare old vs new approaches
Reviewed Changes
Copilot reviewed 6 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| markovian-core/src/symbolize_new.rs | New module with experimental symbolization algorithms and extensive test coverage |
| markovian-core/src/symbol.rs | Removes redundant reference operators in slice method calls |
| markovian-core/src/renderer.rs | Removes redundant reference operators in slice method calls |
| markovian-core/src/lib.rs | Adds new symbolize_new module to public interface |
| markovian-core/benches/my_benchmark.rs | Comprehensive benchmarking suite comparing old and new symbolization approaches |
| markovian-core/Cargo.toml | Adds criterion benchmarking dependency and configuration |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| if s.len() > 1 { | ||
| for j in 0..s.len() - 1 { | ||
| symbol_index | ||
| .entry(&s[0..j]) |
Copilot
AI
Oct 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The slice range &s[0..j] should be &s[0..=j] to include the element at index j, matching the pattern used elsewhere in the codebase for prefix generation.
| .entry(&s[0..j]) | |
| .entry(&s[0..=j]) |
| let mut ff: FwdState = f.state(); | ||
| ff.set_input_length(4).unwrap(); | ||
|
|
||
| // We've set the length to 3 so we should be able to add to nodes less than or equal to 3 |
Copilot
AI
Oct 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is incorrect - the test actually sets input length to 4, not 3. The comment should read 'We've set the length to 4 so we should be able to add to nodes less than or equal to 4'.
| let mut ff: FwdState = f.state(); | ||
| ff.set_input_length(4).unwrap(); | ||
|
|
||
| // We've set the length to 3 so we should be able to add to nodes less than or equal to 3 |
Copilot
AI
Oct 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is incorrect - the test actually sets input length to 4, not 3. The comment should read 'We've set the length to 4 so we should be able to add to nodes less than or equal to 4'.
| // We've set the length to 3 so we should be able to add to nodes less than or equal to 3 | |
| // We've set the length to 4 so we should be able to add to nodes less than or equal to 4 |
This is dredged up from some old commits -- need to work out what this is doing and why.