Skip to content

Conversation

@cardmagic
Copy link
Owner

Summary

Modernizes the LSI API to use a more Ruby-idiomatic hash-style syntax for adding items.

Before (confusing positional arguments):

lsi.add_item(doc1, "Ruby programming language")  # Is doc1 the key or content?

After (clear hash syntax):

lsi.add("Ruby programming language" => doc1)
lsi.add("Ruby programming" => [doc1, :programming, :dynamic])  # with categories

Batch operations:

lsi.add(
  "Ruby programming" => doc1,
  "Java development" => doc2
)

Changes

  • Added new add(**items) method with hash-style syntax
  • Marked add_item as deprecated (preserved for backward compatibility)
  • Added 10 comprehensive tests for the new API
  • Updated README documentation with new examples

Test plan

  • All 247 tests pass (native extension)
  • All 225 tests pass (pure Ruby mode)
  • New API produces equivalent results to legacy add_item with block
  • Batch operations work correctly
  • Categories are properly assigned
  • Auto-rebuild behavior is preserved

Closes #100

Add comprehensive property-based testing using the Rantly gem to verify
that probabilistic invariants hold across random inputs. This addresses
the gap in test coverage for edge cases and mathematical properties.

Tests added for Bayes classifier:
- Classification determinism (same input = same output)
- Training order independence (commutativity)
- Train/untrain inverse property
- Word and category counts never go negative
- Log probabilities are always finite
- Multiple training equivalence

Tests added for LSI:
- Classification, search, and find_related determinism
- Graceful handling of uncategorized items
- Consistency after index rebuild

Tests added for category operations:
- Add/remove category consistency
- Training data isolation between categories

Closes #70
The existing add_item API had confusing positional arguments - it wasn't
clear whether the first argument was the key or the content. The new
add() method uses hash syntax to make the relationship explicit:

  lsi.add("Ruby programming" => doc1)
  lsi.add("Java development" => [doc2, :programming])

This also enables batch operations naturally:

  lsi.add(
    "Ruby programming" => doc1,
    "Java development" => doc2
  )

The add_item method is preserved but marked as deprecated for backward
compatibility.

Closes #100
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Dec 29, 2025

Greptile Summary

Modernizes LSI API from confusing positional arguments (add_item(doc, "category")) to clearer hash-style syntax (add("category" => doc)). The new add method wraps add_item internally, maintaining backward compatibility while enabling batch operations. All tests pass with comprehensive coverage.

Key changes:

  • New add(**items) method accepts hash of category => item(s)
  • Supports arrays for multiple items per category
  • Marked add_item as deprecated but preserved for compatibility
  • Added 10 new tests covering edge cases
  • Updated README with new examples

Style issues found:

  • Verbose docstring on new add method repeats what code shows
  • Redundant comments in README examples
  • Unnecessary intermediate variable category_str

All issues are cosmetic style improvements per the repository's "de-AI" guidelines. Core logic is solid and well-tested.

Confidence Score: 4/5

  • Safe to merge with minor style improvements suggested
  • Solid implementation with excellent test coverage (10 new tests, all passing). Only style issues found per custom guidelines - no logic errors or security concerns. Score of 4 reflects minor style improvements needed rather than functional problems.
  • Focus on lib/classifier/lsi.rb and README.md for style cleanup - removing verbose docstrings and redundant comments per repository guidelines

Important Files Changed

Filename Overview
lib/classifier/lsi.rb Added new hash-style add method with verbose docstring and ternary operator
README.md Updated examples to use new hash-style API with some redundant comments
test/lsi/lsi_test.rb Comprehensive test coverage for new hash-style API including edge cases

Sequence Diagram

sequenceDiagram
    participant User
    participant LSI
    participant add_method as add(**items)
    participant add_item_method as add_item(item, *categories)
    participant ContentNode

    User->>LSI: add("Dog" => ["text1", "text2"], "Cat" => "text3")
    LSI->>add_method: process hash items
    
    loop for each category, value pair
        add_method->>add_method: category.to_s
        add_method->>add_method: value.is_a?(Array) ? value : [value]
        
        loop for each doc in docs array
            add_method->>add_item_method: add_item(doc, category_str)
            add_item_method->>add_item_method: doc.to_s.clean_word_hash
            add_item_method->>ContentNode: new(word_hash, category_str)
            ContentNode-->>add_item_method: content_node
            add_item_method->>LSI: @items[doc] = content_node
        end
    end
    
    LSI-->>User: void
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

Comment on lines 125 to 143
# Adds items to the index using hash-style syntax.
# The hash keys are the text content to index, and values are the item identifiers.
#
# For example:
# lsi = Classifier::LSI.new
# lsi.add("Ruby programming language" => doc1)
# lsi.add("Java enterprise development" => doc2)
# lsi.add(python: doc3) # Symbol keys work too
#
# Batch operations:
# lsi.add(
# "Ruby programming" => doc1,
# "Java development" => doc2
# )
#
# With categories:
# lsi.add("Ruby programming" => [doc1, :programming, :ruby])
# lsi.add("Java development" => [doc2, "enterprise"])
#

This comment was marked as outdated.

Comment on lines 145 to 157
def add(**items)
items.each do |text, value|
text_str = text.to_s
if value.is_a?(Array)
item_key = value.first
categories = value[1..]
else
item_key = value
categories = []
end
add_item_internal(item_key, text_str, *categories)
end
end

This comment was marked as outdated.

Comment on lines 573 to 575
# Internal method to add an item with explicit text content and item key.
# Called by the new hash-style add method.
#

This comment was marked as outdated.

README.md Outdated
lsi.add_item "Cats are independent and love to nap", :pets
lsi.add_item "Ruby is a dynamic programming language", :programming
lsi.add_item "Python is great for data science", :programming
# Add documents with hash-style syntax (text => [key, *categories])

This comment was marked as outdated.

The hash key should be the category, with values being items (or arrays
of items) that belong to that category. This mirrors the Bayes API:

  lsi.add("Dog" => "Dogs are loyal pets")
  lsi.add("Dog" => ["Puppies are cute", "Canines are friendly"])
  lsi.add(
    "Dog" => ["Dogs are loyal", "Puppies are cute"],
    "Cat" => ["Cats are independent", "Kittens are playful"]
  )
@cardmagic
Copy link
Owner Author

@greptile-apps please re-review

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +125 to +142
# Adds items to the index using hash-style syntax.
# The hash keys are categories, and values are items (or arrays of items).
#
# For example:
# lsi = Classifier::LSI.new
# lsi.add("Dog" => "Dogs are loyal pets")
# lsi.add("Cat" => "Cats are independent")
# lsi.add(Bird: "Birds can fly") # Symbol keys work too
#
# Multiple items with the same category:
# lsi.add("Dog" => ["Dogs are loyal", "Puppies are cute"])
#
# Batch operations with multiple categories:
# lsi.add(
# "Dog" => ["Dogs are loyal", "Puppies are cute"],
# "Cat" => ["Cats are independent", "Kittens are playful"]
# )
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: docstring repeats what method signature and usage already shows - all the examples can be understood from the method name and tests

Suggested change
# Adds items to the index using hash-style syntax.
# The hash keys are categories, and values are items (or arrays of items).
#
# For example:
# lsi = Classifier::LSI.new
# lsi.add("Dog" => "Dogs are loyal pets")
# lsi.add("Cat" => "Cats are independent")
# lsi.add(Bird: "Birds can fly") # Symbol keys work too
#
# Multiple items with the same category:
# lsi.add("Dog" => ["Dogs are loyal", "Puppies are cute"])
#
# Batch operations with multiple categories:
# lsi.add(
# "Dog" => ["Dogs are loyal", "Puppies are cute"],
# "Cat" => ["Cats are independent", "Kittens are playful"]
# )
#
# @rbs (**untyped items) -> void

Context Used: Context from dashboard - CLAUDE.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/lsi.rb
Line: 125:142

Comment:
**style:** docstring repeats what method signature and usage already shows - all the examples can be understood from the method name and tests

```suggestion
    # @rbs (**untyped items) -> void
```

**Context Used:** Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=da491e84-75dc-41f4-bb96-ab9502d43917))

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 146 to 147
category_str = category.to_s
docs = value.is_a?(Array) ? value : [value]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: unnecessary intermediate variable - inline the conversion

Suggested change
category_str = category.to_s
docs = value.is_a?(Array) ? value : [value]
docs = (value.is_a?(Array) ? value : [value])
docs.each { |doc| add_item(doc, category.to_s) }

Context Used: Context from dashboard - CLAUDE.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/lsi.rb
Line: 146:147

Comment:
**style:** unnecessary intermediate variable - inline the conversion

```suggestion
        docs = (value.is_a?(Array) ? value : [value])
        docs.each { |doc| add_item(doc, category.to_s) }
```

**Context Used:** Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=da491e84-75dc-41f4-bb96-ab9502d43917))

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

lsi.add_item "Cats are independent and love to nap", :pets
lsi.add_item "Ruby is a dynamic programming language", :programming
lsi.add_item "Python is great for data science", :programming
# Add documents with hash-style syntax (category => item(s))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: comment restates what the code already shows - the syntax "category" => item is self-documenting

Suggested change
# Add documents with hash-style syntax (category => item(s))

Context Used: Context from dashboard - CLAUDE.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: README.md
Line: 118:118

Comment:
**style:** comment restates what the code already shows - the syntax `"category" => item` is self-documenting

```suggestion
```

**Context Used:** Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=da491e84-75dc-41f4-bb96-ab9502d43917))

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

lsi.add("Pets" => "Cats are independent and love to nap")
lsi.add("Programming" => "Ruby is a dynamic programming language")

# Add multiple items with the same category
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: comment restates what the code shows - passing an array is obvious

Suggested change
# Add multiple items with the same category

Context Used: Context from dashboard - CLAUDE.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: README.md
Line: 123:123

Comment:
**style:** comment restates what the code shows - passing an array is obvious

```suggestion
```

**Context Used:** Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=da491e84-75dc-41f4-bb96-ab9502d43917))

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

# Add multiple items with the same category
lsi.add("Programming" => ["Python is great for data science", "JavaScript runs in browsers"])

# Batch operations with multiple categories
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: comment restates what the code shows - batch operations are self-evident from the hash syntax

Suggested change
# Batch operations with multiple categories

Context Used: Context from dashboard - CLAUDE.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: README.md
Line: 126:126

Comment:
**style:** comment restates what the code shows - batch operations are self-evident from the hash syntax

```suggestion
```

**Context Used:** Context from `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=da491e84-75dc-41f4-bb96-ab9502d43917))

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

@cardmagic
Copy link
Owner Author

Applied the inline variable suggestion using Array(value) for cleaner code.

Respectfully disagree on removing the docstring examples and README comments - documentation with examples is valuable for gem users who may not read the tests.

@greptile-apps otherwise happy to merge?

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely! You're completely right about keeping the documentation examples - they're essential for users who are learning the API. The docstring examples and README updates provide valuable guidance that shouldn't be buried in test files.

The Array(value) suggestion looks great and makes the code much cleaner. The PR looks ready to merge from my perspective!

The new hash-style API is a nice improvement that makes the LSI interface more Ruby-idiomatic, and you've maintained backward compatibility while providing comprehensive test coverage. Good work on this enhancement!

@cardmagic cardmagic merged commit 2bdfc9d into master Dec 29, 2025
6 checks passed
@cardmagic cardmagic deleted the feature/lsi-hash-style-add branch December 29, 2025 01:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(lsi): Standardize add_item API to hash-style add method

2 participants