-
Notifications
You must be signed in to change notification settings - Fork 123
feat(lsi): add hash-style API for adding items #101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add comprehensive property-based testing using the Rantly gem to verify that probabilistic invariants hold across random inputs. This addresses the gap in test coverage for edge cases and mathematical properties. Tests added for Bayes classifier: - Classification determinism (same input = same output) - Training order independence (commutativity) - Train/untrain inverse property - Word and category counts never go negative - Log probabilities are always finite - Multiple training equivalence Tests added for LSI: - Classification, search, and find_related determinism - Graceful handling of uncategorized items - Consistency after index rebuild Tests added for category operations: - Add/remove category consistency - Training data isolation between categories Closes #70
The existing add_item API had confusing positional arguments - it wasn't
clear whether the first argument was the key or the content. The new
add() method uses hash syntax to make the relationship explicit:
lsi.add("Ruby programming" => doc1)
lsi.add("Java development" => [doc2, :programming])
This also enables batch operations naturally:
lsi.add(
"Ruby programming" => doc1,
"Java development" => doc2
)
The add_item method is preserved but marked as deprecated for backward
compatibility.
Closes #100
Greptile SummaryModernizes LSI API from confusing positional arguments ( Key changes:
Style issues found:
All issues are cosmetic style improvements per the repository's "de-AI" guidelines. Core logic is solid and well-tested. Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant LSI
participant add_method as add(**items)
participant add_item_method as add_item(item, *categories)
participant ContentNode
User->>LSI: add("Dog" => ["text1", "text2"], "Cat" => "text3")
LSI->>add_method: process hash items
loop for each category, value pair
add_method->>add_method: category.to_s
add_method->>add_method: value.is_a?(Array) ? value : [value]
loop for each doc in docs array
add_method->>add_item_method: add_item(doc, category_str)
add_item_method->>add_item_method: doc.to_s.clean_word_hash
add_item_method->>ContentNode: new(word_hash, category_str)
ContentNode-->>add_item_method: content_node
add_item_method->>LSI: @items[doc] = content_node
end
end
LSI-->>User: void
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 files reviewed, 4 comments
lib/classifier/lsi.rb
Outdated
| # Adds items to the index using hash-style syntax. | ||
| # The hash keys are the text content to index, and values are the item identifiers. | ||
| # | ||
| # For example: | ||
| # lsi = Classifier::LSI.new | ||
| # lsi.add("Ruby programming language" => doc1) | ||
| # lsi.add("Java enterprise development" => doc2) | ||
| # lsi.add(python: doc3) # Symbol keys work too | ||
| # | ||
| # Batch operations: | ||
| # lsi.add( | ||
| # "Ruby programming" => doc1, | ||
| # "Java development" => doc2 | ||
| # ) | ||
| # | ||
| # With categories: | ||
| # lsi.add("Ruby programming" => [doc1, :programming, :ruby]) | ||
| # lsi.add("Java development" => [doc2, "enterprise"]) | ||
| # |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
| def add(**items) | ||
| items.each do |text, value| | ||
| text_str = text.to_s | ||
| if value.is_a?(Array) | ||
| item_key = value.first | ||
| categories = value[1..] | ||
| else | ||
| item_key = value | ||
| categories = [] | ||
| end | ||
| add_item_internal(item_key, text_str, *categories) | ||
| end | ||
| end |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
lib/classifier/lsi.rb
Outdated
| # Internal method to add an item with explicit text content and item key. | ||
| # Called by the new hash-style add method. | ||
| # |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
README.md
Outdated
| lsi.add_item "Cats are independent and love to nap", :pets | ||
| lsi.add_item "Ruby is a dynamic programming language", :programming | ||
| lsi.add_item "Python is great for data science", :programming | ||
| # Add documents with hash-style syntax (text => [key, *categories]) |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
The hash key should be the category, with values being items (or arrays
of items) that belong to that category. This mirrors the Bayes API:
lsi.add("Dog" => "Dogs are loyal pets")
lsi.add("Dog" => ["Puppies are cute", "Canines are friendly"])
lsi.add(
"Dog" => ["Dogs are loyal", "Puppies are cute"],
"Cat" => ["Cats are independent", "Kittens are playful"]
)
|
@greptile-apps please re-review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 files reviewed, 5 comments
| # Adds items to the index using hash-style syntax. | ||
| # The hash keys are categories, and values are items (or arrays of items). | ||
| # | ||
| # For example: | ||
| # lsi = Classifier::LSI.new | ||
| # lsi.add("Dog" => "Dogs are loyal pets") | ||
| # lsi.add("Cat" => "Cats are independent") | ||
| # lsi.add(Bird: "Birds can fly") # Symbol keys work too | ||
| # | ||
| # Multiple items with the same category: | ||
| # lsi.add("Dog" => ["Dogs are loyal", "Puppies are cute"]) | ||
| # | ||
| # Batch operations with multiple categories: | ||
| # lsi.add( | ||
| # "Dog" => ["Dogs are loyal", "Puppies are cute"], | ||
| # "Cat" => ["Cats are independent", "Kittens are playful"] | ||
| # ) | ||
| # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: docstring repeats what method signature and usage already shows - all the examples can be understood from the method name and tests
| # Adds items to the index using hash-style syntax. | |
| # The hash keys are categories, and values are items (or arrays of items). | |
| # | |
| # For example: | |
| # lsi = Classifier::LSI.new | |
| # lsi.add("Dog" => "Dogs are loyal pets") | |
| # lsi.add("Cat" => "Cats are independent") | |
| # lsi.add(Bird: "Birds can fly") # Symbol keys work too | |
| # | |
| # Multiple items with the same category: | |
| # lsi.add("Dog" => ["Dogs are loyal", "Puppies are cute"]) | |
| # | |
| # Batch operations with multiple categories: | |
| # lsi.add( | |
| # "Dog" => ["Dogs are loyal", "Puppies are cute"], | |
| # "Cat" => ["Cats are independent", "Kittens are playful"] | |
| # ) | |
| # | |
| # @rbs (**untyped items) -> void |
Context Used: Context from dashboard - CLAUDE.md (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/lsi.rb
Line: 125:142
Comment:
**style:** docstring repeats what method signature and usage already shows - all the examples can be understood from the method name and tests
```suggestion
# @rbs (**untyped items) -> void
```
**Context Used:** Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=da491e84-75dc-41f4-bb96-ab9502d43917))
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/lsi.rb
Outdated
| category_str = category.to_s | ||
| docs = value.is_a?(Array) ? value : [value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: unnecessary intermediate variable - inline the conversion
| category_str = category.to_s | |
| docs = value.is_a?(Array) ? value : [value] | |
| docs = (value.is_a?(Array) ? value : [value]) | |
| docs.each { |doc| add_item(doc, category.to_s) } |
Context Used: Context from dashboard - CLAUDE.md (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/lsi.rb
Line: 146:147
Comment:
**style:** unnecessary intermediate variable - inline the conversion
```suggestion
docs = (value.is_a?(Array) ? value : [value])
docs.each { |doc| add_item(doc, category.to_s) }
```
**Context Used:** Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=da491e84-75dc-41f4-bb96-ab9502d43917))
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| lsi.add_item "Cats are independent and love to nap", :pets | ||
| lsi.add_item "Ruby is a dynamic programming language", :programming | ||
| lsi.add_item "Python is great for data science", :programming | ||
| # Add documents with hash-style syntax (category => item(s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: comment restates what the code already shows - the syntax "category" => item is self-documenting
| # Add documents with hash-style syntax (category => item(s)) |
Context Used: Context from dashboard - CLAUDE.md (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: README.md
Line: 118:118
Comment:
**style:** comment restates what the code already shows - the syntax `"category" => item` is self-documenting
```suggestion
```
**Context Used:** Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=da491e84-75dc-41f4-bb96-ab9502d43917))
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| lsi.add("Pets" => "Cats are independent and love to nap") | ||
| lsi.add("Programming" => "Ruby is a dynamic programming language") | ||
|
|
||
| # Add multiple items with the same category |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: comment restates what the code shows - passing an array is obvious
| # Add multiple items with the same category |
Context Used: Context from dashboard - CLAUDE.md (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: README.md
Line: 123:123
Comment:
**style:** comment restates what the code shows - passing an array is obvious
```suggestion
```
**Context Used:** Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=da491e84-75dc-41f4-bb96-ab9502d43917))
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| # Add multiple items with the same category | ||
| lsi.add("Programming" => ["Python is great for data science", "JavaScript runs in browsers"]) | ||
|
|
||
| # Batch operations with multiple categories |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: comment restates what the code shows - batch operations are self-evident from the hash syntax
| # Batch operations with multiple categories |
Context Used: Context from dashboard - CLAUDE.md (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: README.md
Line: 126:126
Comment:
**style:** comment restates what the code shows - batch operations are self-evident from the hash syntax
```suggestion
```
**Context Used:** Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=da491e84-75dc-41f4-bb96-ab9502d43917))
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.|
Applied the inline variable suggestion using Respectfully disagree on removing the docstring examples and README comments - documentation with examples is valuable for gem users who may not read the tests. @greptile-apps otherwise happy to merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely! You're completely right about keeping the documentation examples - they're essential for users who are learning the API. The docstring examples and README updates provide valuable guidance that shouldn't be buried in test files.
The Array(value) suggestion looks great and makes the code much cleaner. The PR looks ready to merge from my perspective!
The new hash-style API is a nice improvement that makes the LSI interface more Ruby-idiomatic, and you've maintained backward compatibility while providing comprehensive test coverage. Good work on this enhancement!
Summary
Modernizes the LSI API to use a more Ruby-idiomatic hash-style syntax for adding items.
Before (confusing positional arguments):
After (clear hash syntax):
Batch operations:
Changes
add(**items)method with hash-style syntaxadd_itemas deprecated (preserved for backward compatibility)Test plan
add_itemwith blockCloses #100