Merged
2 changes: 1 addition & 1 deletion .github/workflows/ruby.yml
@@ -34,7 +34,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        ruby-version: ['3.2', '3.3', '3.4', '4.0']
+        ruby-version: ['3.3', '3.4', '4.0']

     steps:
     - uses: actions/checkout@v4
1 change: 1 addition & 0 deletions Gemfile
@@ -7,6 +7,7 @@ gem 'mutex_m'
gem 'ostruct'

group :test do
gem 'rantly', require: false
gem 'simplecov', require: false
end

2 changes: 2 additions & 0 deletions Gemfile.lock
@@ -61,6 +61,7 @@ GEM
rake (13.3.1)
rake-compiler (1.3.1)
rake
rantly (3.0.0)
rb-fsevent (0.11.2)
rb-inotify (0.11.1)
ffi (~> 1.0)
@@ -140,6 +141,7 @@ DEPENDENCIES
mutex_m
ostruct
rake-compiler
rantly
rbs-inline
rdoc
rubocop
22 changes: 15 additions & 7 deletions README.md
@@ -115,19 +115,27 @@ require 'classifier'

lsi = Classifier::LSI.new

-# Add documents with categories
-lsi.add_item "Dogs are loyal pets that love to play fetch", :pets
-lsi.add_item "Cats are independent and love to nap", :pets
-lsi.add_item "Ruby is a dynamic programming language", :programming
-lsi.add_item "Python is great for data science", :programming
+# Add documents with hash-style syntax (category => item(s))
Review comment (Contributor) on README.md, line 118:

style: comment restates what the code already shows - the syntax `"category" => item` is self-documenting

Suggested change: delete the line `# Add documents with hash-style syntax (category => item(s))`.

Context used: `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=da491e84-75dc-41f4-bb96-ab9502d43917))

+lsi.add("Pets" => "Dogs are loyal pets that love to play fetch")
+lsi.add("Pets" => "Cats are independent and love to nap")
+lsi.add("Programming" => "Ruby is a dynamic programming language")
+
+# Add multiple items with the same category
Review comment (Contributor) on README.md, line 123:

style: comment restates what the code shows - passing an array is obvious

Suggested change: delete the line `# Add multiple items with the same category`.

Context used: `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=da491e84-75dc-41f4-bb96-ab9502d43917))

+lsi.add("Programming" => ["Python is great for data science", "JavaScript runs in browsers"])
+
+# Batch operations with multiple categories
Review comment (Contributor) on README.md, line 126:

style: comment restates what the code shows - batch operations are self-evident from the hash syntax

Suggested change: delete the line `# Batch operations with multiple categories`.

Context used: `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=da491e84-75dc-41f4-bb96-ab9502d43917))

+lsi.add(
+  "Pets" => ["Hamsters are small furry pets", "Birds can be great companions"],
+  "Programming" => "Go is fast and concurrent"
+)

# Classify new text
lsi.classify "My puppy loves to run around"
-# => :pets
+# => "Pets"

# Get classification with confidence score
lsi.classify_with_confidence "Learning to code in Ruby"
-# => [:programming, 0.89]
+# => ["Programming", 0.89]
```

### Search and Discovery
27 changes: 27 additions & 0 deletions lib/classifier/lsi.rb
@@ -122,12 +122,39 @@ def singular_value_spectrum
end
end

# Adds items to the index using hash-style syntax.
# The hash keys are categories, and values are items (or arrays of items).
#
# For example:
# lsi = Classifier::LSI.new
# lsi.add("Dog" => "Dogs are loyal pets")
# lsi.add("Cat" => "Cats are independent")
# lsi.add(Bird: "Birds can fly") # Symbol keys work too
#
# Multiple items with the same category:
# lsi.add("Dog" => ["Dogs are loyal", "Puppies are cute"])
#
# Batch operations with multiple categories:
# lsi.add(
# "Dog" => ["Dogs are loyal", "Puppies are cute"],
# "Cat" => ["Cats are independent", "Kittens are playful"]
# )
#
Review comment (Contributor) on lib/classifier/lsi.rb, lines +125 to +142:

style: docstring repeats what method signature and usage already shows - all the examples can be understood from the method name and tests

Suggested change: replace the doc comment above with the single line

    # @rbs (**untyped items) -> void

Context used: `dashboard` - CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=da491e84-75dc-41f4-bb96-ab9502d43917))

# @rbs (**untyped items) -> void
def add(**items)
items.each do |category, value|
Array(value).each { |doc| add_item(doc, category.to_s) }
end
end
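
To see how this method treats each kind of value, here is a minimal sketch. `TinyIndex` and its `entries` list are hypothetical stand-ins, not part of the classifier gem: the stand-in `add_item` only records its arguments, while the body of `add` is copied from the diff above.

```ruby
# Hypothetical stand-in for Classifier::LSI; it only records what add_item receives.
class TinyIndex
  attr_reader :entries

  def initialize
    @entries = []
  end

  # Stand-in for the real add_item: stores [document, category] pairs.
  def add_item(doc, category)
    @entries << [doc, category]
  end

  # Same body as the add method in this diff.
  def add(**items)
    items.each do |category, value|
      # Array() wraps a lone string and leaves an array as it is.
      Array(value).each { |doc| add_item(doc, category.to_s) }
    end
  end
end

index = TinyIndex.new
index.add('Dog' => 'Dogs are loyal', Cat: ['Cats nap', 'Kittens play'])
index.entries.each { |doc, category| puts "#{category}: #{doc}" }
```

A single string and an array of strings take the same path through `Array(value)`, and `category.to_s` is why a symbol key such as `Cat:` ends up recorded as the string `"Cat"`.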

# Adds an item to the index. item is assumed to be a string, but
# any item may be indexed so long as it responds to #to_s or if
# you provide an optional block explaining how the indexer can
# fetch fresh string data. This optional block is passed the item,
# so the item may only be a reference to a URL or file name.
#
# @deprecated Use {#add} instead for clearer hash-style syntax.
#
# For example:
# lsi = Classifier::LSI.new
# lsi.add_item "This is just plain text"
104 changes: 104 additions & 0 deletions test/lsi/lsi_test.rb
@@ -11,6 +11,110 @@ def setup
@str5 = 'This text involves birds. Birds.'
end

# Hash-style add API tests (Issue #100)

def test_add_with_hash_syntax
lsi = Classifier::LSI.new
lsi.add('Dog' => 'Dogs are loyal pets')
lsi.add('Cat' => 'Cats are independent')

assert_equal 2, lsi.items.size
assert_includes lsi.items, 'Dogs are loyal pets'
assert_includes lsi.items, 'Cats are independent'
end

def test_add_with_symbol_keys
lsi = Classifier::LSI.new
lsi.add(Dog: 'Dogs are loyal', Cat: 'Cats are independent')

assert_equal 2, lsi.items.size
assert_equal ['Dog'], lsi.categories_for('Dogs are loyal')
assert_equal ['Cat'], lsi.categories_for('Cats are independent')
end
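
The symbol-key test above leans on a Ruby behavior that is easy to miss: since Ruby 2.7 a `**` parameter accepts non-Symbol keys, so string keys and symbol keys can both reach a method declared like `add(**items)`, even within one call. A small sketch of that behavior, where `captured_keys` is a hypothetical helper and not part of the gem:

```ruby
# Hypothetical helper: returns the keys of the keyword hash as the method receives them.
def captured_keys(**items)
  items.keys
end

symbol_keys = captured_keys(Dog: 'Dogs are loyal')
string_keys = captured_keys('Dog' => 'Dogs are loyal')
mixed_keys  = captured_keys('Dog' => 'Dogs are loyal', Cat: 'Cats are independent')

p symbol_keys
p string_keys
p mixed_keys.map(&:to_s) # normalizing with to_s gives every key the same type
```

On Ruby versions older than 2.7 the string-key calls would not bind to `**items`; the CI matrix in this PR starts at 3.3, so that case does not arise here.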

def test_add_multiple_items_same_category
lsi = Classifier::LSI.new
lsi.add('Dog' => ['Dogs are loyal', 'Puppies are cute', 'Canines are friendly'])

assert_equal 3, lsi.items.size
assert_equal ['Dog'], lsi.categories_for('Dogs are loyal')
assert_equal ['Dog'], lsi.categories_for('Puppies are cute')
assert_equal ['Dog'], lsi.categories_for('Canines are friendly')
end

def test_add_batch_operations
lsi = Classifier::LSI.new
lsi.add(
'Dog' => ['Dogs are loyal', 'Puppies are cute'],
'Cat' => ['Cats are independent', 'Kittens are playful']
)

assert_equal 4, lsi.items.size
assert_equal ['Dog'], lsi.categories_for('Dogs are loyal')
assert_equal ['Cat'], lsi.categories_for('Cats are independent')
end

def test_add_classification_works
lsi = Classifier::LSI.new
lsi.add(
'Dog' => @str2,
'Cat' => [@str3, @str4],
'Bird' => @str5
)

assert_equal 'Dog', lsi.classify(@str1)
assert_equal 'Cat', lsi.classify(@str3)
assert_equal 'Bird', lsi.classify(@str5)
end

def test_add_find_related_works
lsi = Classifier::LSI.new
lsi.add(
'Dog' => [@str1, @str2],
'Cat' => [@str3, @str4],
'Bird' => @str5
)

# The closest match to str1 should be str2 (both about dogs)
related = lsi.find_related(@str1, 3)

assert_equal @str2, related.first, 'Most related to dog text should be other dog text'
end

def test_add_equivalence_to_add_item
# Using add
lsi1 = Classifier::LSI.new
lsi1.add(
'Programming' => ['Ruby programming language', 'Java enterprise development'],
'Entertainment' => 'Cat pictures are cute'
)

# Using add_item (legacy)
lsi2 = Classifier::LSI.new
lsi2.add_item 'Ruby programming language', 'Programming'
lsi2.add_item 'Java enterprise development', 'Programming'
lsi2.add_item 'Cat pictures are cute', 'Entertainment'

# Both should classify the same
test_text = 'Python programming'

assert_equal lsi1.classify(test_text), lsi2.classify(test_text)
end

def test_add_triggers_auto_rebuild
lsi = Classifier::LSI.new auto_rebuild: true
lsi.add('Dog' => ['Dogs are great', 'More about dogs'])

refute_predicate lsi, :needs_rebuild?, 'Auto-rebuild should keep index current'
end

def test_add_respects_auto_rebuild_false
lsi = Classifier::LSI.new auto_rebuild: false
lsi.add('Dog' => ['Dogs are great', 'More about dogs'])

assert_predicate lsi, :needs_rebuild?, 'Index should need rebuild when auto_rebuild is false'
end

def test_basic_indexing
lsi = Classifier::LSI.new
[@str1, @str2, @str3, @str4, @str5].each { |x| lsi << x }