Skip to content

Conversation

@cardmagic
Copy link
Owner

Summary

  • Implements k-Nearest Neighbors (kNN) classifier leveraging existing LSI infrastructure
  • Hash-style API consistent with Bayes and LSI (add, classify)
  • classify_with_neighbors() returns interpretable results with neighbor details
  • Distance-weighted voting option (weighted: true)
  • Full JSON/Marshal serialization and storage backend support
  • 44 tests with 96.2% line coverage on knn.rb

Usage

knn = Classifier::KNN.new(k: 3, weighted: true)

knn.add(
  spam: ["Buy now!", "Free money!"],
  ham: ["Meeting tomorrow", "Project update"]
)

knn.classify "Special offer!"
# => "spam"

result = knn.classify_with_neighbors "Claim your prize"
# => {category: "spam", neighbors: [...], votes: {...}, confidence: 0.85}

When to Use kNN vs Bayes vs LSI

Classifier Best For
Bayes Large training sets, fast classification, spam filtering
LSI Semantic similarity, document clustering, search
kNN Small datasets, interpretable results, incremental learning

Test plan

  • All 291 tests pass
  • 96.8% overall line coverage
  • kNN-specific tests: 44 tests, 120 assertions
  • Edge cases: single item, empty classifier, k > items

Closes #103

@cardmagic cardmagic self-assigned this Dec 29, 2025
Implements a kNN classifier that leverages existing LSI infrastructure
for similarity computations. Unlike Bayes which requires training, kNN
uses instance-based learning—store examples and classify by finding
the most similar ones.

Key features:
- Hash-style API consistent with Bayes and LSI (add, classify)
- classify_with_neighbors() returns interpretable results with neighbor
  details, vote tallies, and confidence scores
- Distance-weighted voting option for more nuanced classification
- Full JSON/Marshal serialization and storage backend support
- Handles edge cases like single-item classifiers gracefully

This completes Issue #103 and provides a third classification algorithm
suited for small datasets where interpretability matters.

Closes #103
@cardmagic cardmagic force-pushed the feature/knn-classifier branch from dc53ba2 to 3f95b1b Compare December 29, 2025 02:13
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Dec 29, 2025

Greptile Summary

Implements k-Nearest Neighbors classifier by leveraging existing LSI infrastructure for similarity calculations. Uses instance-based learning where examples are stored and classification happens by finding the k most similar items and voting by their categories.

Major Changes

  • New Classifier::KNN class (lib/classifier/knn.rb): Complete kNN implementation with hash-style API matching existing Bayes/LSI patterns
  • LSI edge case fix (lib/classifier/lsi.rb:632): Added single-item handling to proximity_array_for_content_unlocked to prevent errors when only one training example exists
  • Comprehensive test coverage: 44 tests with excellent coverage of classification, serialization, storage, and edge cases
  • Documentation: Added extensive README section with usage examples, feature comparison table, and guidance on when to use each classifier

Key Features

  • Distance-weighted voting option for more accurate classification
  • Full serialization support (JSON and Marshal)
  • Storage backend integration with dirty tracking
  • Thread-safe with mutex synchronization
  • Interpretable results via classify_with_neighbors showing which examples influenced the decision

Architecture

The implementation cleanly delegates to LSI for similarity calculations (proximity_array_for_content), keeping kNN focused on the voting logic and result structuring. This reuse of existing infrastructure minimizes new code while providing powerful functionality.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • Well-designed implementation with comprehensive test coverage (44 tests, 96.8% overall line coverage), proper error handling, thread safety via mutex, and clean integration with existing LSI infrastructure. The code follows established patterns in the codebase and includes edge case handling.
  • No files require special attention

Important Files Changed

Filename Overview
lib/classifier/knn.rb New k-Nearest Neighbors classifier implementation with LSI integration, serialization support, and comprehensive API - well-structured with proper error handling
lib/classifier/lsi.rb Added edge case handling for single-item proximity calculation - returns similarity of 1.0 when only one item exists
test/knn/knn_test.rb Comprehensive test suite with 44 tests covering initialization, classification, serialization, storage, and edge cases

Sequence Diagram

sequenceDiagram
    participant User
    participant KNN
    participant LSI
    participant Matrix

    Note over User,Matrix: Adding Training Data
    User->>KNN: add(spam: ["Buy now!"], ham: ["Meeting"])
    KNN->>KNN: synchronize { @dirty = true }
    KNN->>LSI: add(**items)
    LSI->>LSI: Store items with categories
    LSI->>LSI: Mark for rebuild if auto_rebuild
    LSI-->>KNN: Items stored

    Note over User,Matrix: Classification Flow
    User->>KNN: classify("Free offer!")
    KNN->>KNN: classify_with_neighbors(text)
    KNN->>KNN: synchronize block
    
    alt Empty classifier
        KNN-->>User: nil
    else Has items
        KNN->>KNN: find_neighbors(text)
        KNN->>LSI: proximity_array_for_content(text)
        LSI->>LSI: synchronize { proximity_array_for_content_unlocked }
        
        alt Single item
            LSI-->>KNN: [[item, 1.0]]
        else Multiple items
            LSI->>LSI: node_for_content_unlocked(text)
            LSI->>Matrix: Compute dot products with all items
            Matrix-->>LSI: Similarity scores
            LSI->>LSI: Sort by similarity descending
            LSI-->>KNN: [[item1, 0.92], [item2, 0.75], ...]
        end
        
        KNN->>KNN: Select top k neighbors
        KNN->>LSI: categories_for(each item)
        LSI-->>KNN: Category labels
        
        KNN->>KNN: tally_votes(neighbors)
        alt Weighted voting
            KNN->>KNN: votes[cat] += similarity
        else Unweighted voting
            KNN->>KNN: votes[cat] += 1.0
        end
        
        KNN->>KNN: Calculate winner & confidence
        KNN-->>User: {category: "spam", neighbors: [...], votes: {...}, confidence: 0.85}
    end

    Note over User,Matrix: Persistence
    User->>KNN: save_to_file(path)
    KNN->>KNN: to_json
    KNN->>KNN: as_json
    KNN->>LSI: as_json
    LSI-->>KNN: LSI state as hash
    KNN->>KNN: Write JSON to file
    KNN->>KNN: @dirty = false
    KNN-->>User: Bytes written

    User->>KNN: load_from_file(path)
    KNN->>KNN: from_json(File.read)
    KNN->>LSI: from_json(lsi_data)
    LSI-->>KNN: Restored LSI instance
    KNN-->>User: Restored KNN instance
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 35 comments

Edit Code Review Agent Settings | Greptile

Comment on lines 12 to 22
# This class implements a k-Nearest Neighbors classifier that leverages
# the existing LSI infrastructure for similarity computations.
#
# Unlike traditional classifiers that require training, kNN uses instance-based
# learning - it stores examples and classifies by finding the most similar ones.
#
# Example usage:
# knn = Classifier::KNN.new(k: 3)
# knn.add("spam" => ["Buy now!", "Limited offer!"])
# knn.add("ham" => ["Meeting tomorrow", "Project update"])
# knn.classify("Special discount!") # => "spam"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: docstring restates obvious implementation details

Suggested change
# This class implements a k-Nearest Neighbors classifier that leverages
# the existing LSI infrastructure for similarity computations.
#
# Unlike traditional classifiers that require training, kNN uses instance-based
# learning - it stores examples and classifies by finding the most similar ones.
#
# Example usage:
# knn = Classifier::KNN.new(k: 3)
# knn.add("spam" => ["Buy now!", "Limited offer!"])
# knn.add("ham" => ["Meeting tomorrow", "Project update"])
# knn.classify("Special discount!") # => "spam"
# k-Nearest Neighbors classifier using LSI for similarity.
#
# Example:
# knn = Classifier::KNN.new(k: 3)
# knn.add(spam: ["Buy now!"], ham: ["Meeting tomorrow"])
# knn.classify("Special discount!") # => "spam"

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 12:22

Comment:
**style:** docstring restates obvious implementation details

```suggestion
  # k-Nearest Neighbors classifier using LSI for similarity.
  #
  # Example:
  #   knn = Classifier::KNN.new(k: 3)
  #   knn.add(spam: ["Buy now!"], ham: ["Meeting tomorrow"])
  #   knn.classify("Special discount!") # => "spam"
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 36 to 40
# Creates a new kNN classifier.
#
# @param k [Integer] Number of neighbors to consider (default: 5)
# @param weighted [Boolean] Use distance-weighted voting (default: false)
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: docstring repeats parameter types already in @rbs

Suggested change
# Creates a new kNN classifier.
#
# @param k [Integer] Number of neighbors to consider (default: 5)
# @param weighted [Boolean] Use distance-weighted voting (default: false)
#
# @rbs (?k: Integer, ?weighted: bool) -> void

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 36:40

Comment:
**style:** docstring repeats parameter types already in `@rbs`

```suggestion
    # @rbs (?k: Integer, ?weighted: bool) -> void
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 51 to 67

# Adds labeled examples to the classifier using hash-style syntax.
# Keys are categories, values are items (or arrays of items).
#
# @example Single item per category
# knn.add("spam" => "Buy now!")
# knn.add("ham" => "Meeting tomorrow")
#
# @example Multiple items per category
# knn.add("spam" => ["Buy now!", "Limited offer!"])
#
# @example Batch operations
# knn.add(
# "spam" => ["Buy now!", "Limited offer!"],
# "ham" => ["Meeting tomorrow", "Project update"]
# )
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: docstring examples repeat what method signature already shows

Suggested change
# Adds labeled examples to the classifier using hash-style syntax.
# Keys are categories, values are items (or arrays of items).
#
# @example Single item per category
# knn.add("spam" => "Buy now!")
# knn.add("ham" => "Meeting tomorrow")
#
# @example Multiple items per category
# knn.add("spam" => ["Buy now!", "Limited offer!"])
#
# @example Batch operations
# knn.add(
# "spam" => ["Buy now!", "Limited offer!"],
# "ham" => ["Meeting tomorrow", "Project update"]
# )
#
# @rbs (**untyped items) -> void

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 51:67

Comment:
**style:** docstring examples repeat what method signature already shows

```suggestion
    # @rbs (**untyped items) -> void
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 87 to 92
# Classifies the given text by finding the k nearest neighbors
# and using majority voting.
#
# @param text [String] The text to classify
# @return [String, Symbol, nil] The predicted category, or nil if no examples exist
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: docstring restates method name and signature

Suggested change
# Classifies the given text by finding the k nearest neighbors
# and using majority voting.
#
# @param text [String] The text to classify
# @return [String, Symbol, nil] The predicted category, or nil if no examples exist
#
# @rbs (String) -> (String | Symbol)?

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 87:92

Comment:
**style:** docstring restates method name and signature

```suggestion
    # @rbs (String) -> (String | Symbol)?
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 99 to 108
# Classifies the given text and returns detailed information about
# the neighbors that contributed to the decision.
#
# @param text [String] The text to classify
# @return [Hash] A hash containing:
# - :category - The predicted category
# - :neighbors - Array of neighbor details (item, category, similarity)
# - :votes - Hash of category => vote count/weight
# - :confidence - Confidence score (winning vote share)
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: docstring restates return hash keys visible in code

Suggested change
# Classifies the given text and returns detailed information about
# the neighbors that contributed to the decision.
#
# @param text [String] The text to classify
# @return [Hash] A hash containing:
# - :category - The predicted category
# - :neighbors - Array of neighbor details (item, category, similarity)
# - :votes - Hash of category => vote count/weight
# - :confidence - Confidence score (winning vote share)
#
# Returns category, neighbors, votes, and confidence score.
#
# @rbs (String) -> Hash[Symbol, untyped]

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 99:108

Comment:
**style:** docstring restates return hash keys visible in code

```suggestion
    # Returns category, neighbors, votes, and confidence score.
    #
    # @rbs (String) -> Hash[Symbol, untyped]
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +352 to +354
synchronize do
@k = data['k']
@weighted = data['weighted']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: docstring repeats method name and obvious behavior

Suggested change
synchronize do
@k = data['k']
@weighted = data['weighted']
# @rbs (Array[Hash[Symbol, untyped]]) -> Hash[String | Symbol, Float]

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 352:354

Comment:
**style:** docstring repeats method name and obvious behavior

```suggestion
    # @rbs (Array[Hash[Symbol, untyped]]) -> Hash[String | Symbol, Float]
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 110 to 128
def classify_with_neighbors(text)
synchronize do
return empty_result if @lsi.items.empty?

neighbors = find_neighbors(text)
return empty_result if neighbors.empty?

votes = tally_votes(neighbors)
winner = votes.max_by { |_, v| v }&.first
total_votes = votes.values.sum
confidence = winner && total_votes.positive? ? votes[winner] / total_votes.to_f : 0.0

{
category: winner,
neighbors: neighbors,
votes: votes,
confidence: confidence
}
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: nested conditionals - use early returns

Suggested change
def classify_with_neighbors(text)
synchronize do
return empty_result if @lsi.items.empty?
neighbors = find_neighbors(text)
return empty_result if neighbors.empty?
votes = tally_votes(neighbors)
winner = votes.max_by { |_, v| v }&.first
total_votes = votes.values.sum
confidence = winner && total_votes.positive? ? votes[winner] / total_votes.to_f : 0.0
{
category: winner,
neighbors: neighbors,
votes: votes,
confidence: confidence
}
end
def classify_with_neighbors(text)
synchronize do
return empty_result if @lsi.items.empty?
neighbors = find_neighbors(text)
return empty_result if neighbors.empty?
votes = tally_votes(neighbors)
winner = votes.max_by { |_, v| v }&.first
return empty_result unless winner
total_votes = votes.values.sum
confidence = total_votes.positive? ? votes[winner] / total_votes.to_f : 0.0
{
category: winner,
neighbors: neighbors,
votes: votes,
confidence: confidence
}
end
end

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 110:128

Comment:
**style:** nested conditionals - use early returns

```suggestion
    def classify_with_neighbors(text)
      synchronize do
        return empty_result if @lsi.items.empty?

        neighbors = find_neighbors(text)
        return empty_result if neighbors.empty?

        votes = tally_votes(neighbors)
        winner = votes.max_by { |_, v| v }&.first
        return empty_result unless winner

        total_votes = votes.values.sum
        confidence = total_votes.positive? ? votes[winner] / total_votes.to_f : 0.0

        {
          category: winner,
          neighbors: neighbors,
          votes: votes,
          confidence: confidence
        }
      end
    end
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 246 to 256

# @rbs () -> bool
def dirty?
@dirty
end

# Loads a classifier from configured storage.
#
# @param storage [Storage::Base] The storage to load from
# @return [KNN] The loaded classifier
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: nested conditionals - use early returns

Suggested change
# @rbs () -> bool
def dirty?
@dirty
end
# Loads a classifier from configured storage.
#
# @param storage [Storage::Base] The storage to load from
# @return [KNN] The loaded classifier
#
def reload
raise ArgumentError, 'No storage configured' unless storage
raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty
data = storage.read
raise StorageError, 'No saved state found' unless data
restore_from_json(data)
@dirty = false
self
end

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 246:256

Comment:
**style:** nested conditionals - use early returns

```suggestion
    def reload
      raise ArgumentError, 'No storage configured' unless storage
      raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty

      data = storage.read
      raise StorageError, 'No saved state found' unless data

      restore_from_json(data)
      @dirty = false
      self
    end
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 261 to 270

instance = from_json(data)
instance.storage = storage
instance
end

# Loads a classifier from a file.
#
# @param path [String] The file path
# @return [KNN] The loaded classifier
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: nested conditionals - use early returns

Suggested change
instance = from_json(data)
instance.storage = storage
instance
end
# Loads a classifier from a file.
#
# @param path [String] The file path
# @return [KNN] The loaded classifier
def reload!
raise ArgumentError, 'No storage configured' unless storage
data = storage.read
raise StorageError, 'No saved state found' unless data
restore_from_json(data)
@dirty = false
self
end

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 261:270

Comment:
**style:** nested conditionals - use early returns

```suggestion
    def reload!
      raise ArgumentError, 'No storage configured' unless storage

      data = storage.read
      raise StorageError, 'No saved state found' unless data

      restore_from_json(data)
      @dirty = false
      self
    end
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 285 to 292
@k, @weighted, @lsi, @dirty = data
@storage = nil
end

private

# Finds the k nearest neighbors for the given text.
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: nested conditionals - use early returns

Suggested change
@k, @weighted, @lsi, @dirty = data
@storage = nil
end
private
# Finds the k nearest neighbors for the given text.
#
def self.load(storage:)
data = storage.read
raise StorageError, 'No saved state found' unless data
instance = from_json(data)
instance.storage = storage
instance
end

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 285:292

Comment:
**style:** nested conditionals - use early returns

```suggestion
    def self.load(storage:)
      data = storage.read
      raise StorageError, 'No saved state found' unless data

      instance = from_json(data)
      instance.storage = storage
      instance
    end
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 9 comments

Edit Code Review Agent Settings | Greptile

Comment on lines 295 to 304
# LSI requires at least 2 items to build an index
# For single item, return it directly with a default similarity
if @lsi.items.size == 1
item = @lsi.items.first
return [{
item: item,
category: @lsi.categories_for(item).first,
similarity: 1.0
}]
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: nested conditional with defensive code for single-item case. LSI already handles this - remove the special case.

Suggested change
# LSI requires at least 2 items to build an index
# For single item, return it directly with a default similarity
if @lsi.items.size == 1
item = @lsi.items.first
return [{
item: item,
category: @lsi.categories_for(item).first,
similarity: 1.0
}]
end
def find_neighbors(text)
proximity = @lsi.proximity_array_for_content(text)
neighbors = proximity.reject { |item, _| item == text }.first(@k)
neighbors.map do |item, similarity|
{
item: item,
category: @lsi.categories_for(item).first,
similarity: similarity
}
end
end

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 295:304

Comment:
**style:** nested conditional with defensive code for single-item case. LSI already handles this - remove the special case.

```suggestion
    def find_neighbors(text)
      proximity = @lsi.proximity_array_for_content(text)
      neighbors = proximity.reject { |item, _| item == text }.first(@k)

      neighbors.map do |item, similarity|
        {
          item: item,
          category: @lsi.categories_for(item).first,
          similarity: similarity
        }
      end
    end
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

#
# @rbs () -> void
def save
raise ArgumentError, 'No storage configured. Use save_to_file(path) or set storage=' unless storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: defensive error for impossible scenario - if storage is nil, the method already can't be called. Remove this check.

Suggested change
raise ArgumentError, 'No storage configured. Use save_to_file(path) or set storage=' unless storage
storage.write(to_json)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 200:200

Comment:
**style:** defensive error for impossible scenario - if storage is nil, the method already can't be called. Remove this check.

```suggestion
      storage.write(to_json)
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +222 to +223
raise ArgumentError, 'No storage configured' unless storage
raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: nested error checks - use early returns

Suggested change
raise ArgumentError, 'No storage configured' unless storage
raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty
def reload
raise ArgumentError, 'No storage configured' unless storage
raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty
data = storage.read
raise StorageError, 'No saved state found' unless data
restore_from_json(data)
@dirty = false
self
end

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 222:223

Comment:
**style:** nested error checks - use early returns

```suggestion
    def reload
      raise ArgumentError, 'No storage configured' unless storage
      raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty

      data = storage.read
      raise StorageError, 'No saved state found' unless data

      restore_from_json(data)
      @dirty = false
      self
    end
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +237 to +240
raise ArgumentError, 'No storage configured' unless storage

data = storage.read
raise StorageError, 'No saved state found' unless data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: nested error checks - use early returns

Suggested change
raise ArgumentError, 'No storage configured' unless storage
data = storage.read
raise StorageError, 'No saved state found' unless data
def reload!
raise ArgumentError, 'No storage configured' unless storage
data = storage.read
raise StorageError, 'No saved state found' unless data
restore_from_json(data)
@dirty = false
self
end

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 237:240

Comment:
**style:** nested error checks - use early returns

```suggestion
    def reload!
      raise ArgumentError, 'No storage configured' unless storage

      data = storage.read
      raise StorageError, 'No saved state found' unless data

      restore_from_json(data)
      @dirty = false
      self
    end
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +259 to +260
data = storage.read
raise StorageError, 'No saved state found' unless data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: nested error check - use early return

Suggested change
data = storage.read
raise StorageError, 'No saved state found' unless data
def self.load(storage:)
data = storage.read
raise StorageError, 'No saved state found' unless data
instance = from_json(data)
instance.storage = storage
instance
end

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 259:260

Comment:
**style:** nested error check - use early return

```suggestion
    def self.load(storage:)
      data = storage.read
      raise StorageError, 'No saved state found' unless data

      instance = from_json(data)
      instance.storage = storage
      instance
    end
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 352 to 360
synchronize do
@k = data['k']
@weighted = data['weighted']

lsi_data = data['lsi']
lsi_data['type'] = 'lsi'
@lsi = LSI.from_json(lsi_data)
@dirty = false
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: synchronize block wraps entire method body - move the guard clause outside

Suggested change
synchronize do
@k = data['k']
@weighted = data['weighted']
lsi_data = data['lsi']
lsi_data['type'] = 'lsi'
@lsi = LSI.from_json(lsi_data)
@dirty = false
end
def restore_from_json(json)
data = JSON.parse(json)
raise ArgumentError, "Invalid classifier type: #{data['type']}" unless data['type'] == 'knn'
synchronize do
@k = data['k']
@weighted = data['weighted']
lsi_data = data['lsi']
lsi_data['type'] = 'lsi'
@lsi = LSI.from_json(lsi_data)
@dirty = false
end
end

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 352:360

Comment:
**style:** synchronize block wraps entire method body - move the guard clause outside

```suggestion
    def restore_from_json(json)
      data = JSON.parse(json)
      raise ArgumentError, "Invalid classifier type: #{data['type']}" unless data['type'] == 'knn'

      synchronize do
        @k = data['k']
        @weighted = data['weighted']

        lsi_data = data['lsi']
        lsi_data['type'] = 'lsi'
        @lsi = LSI.from_json(lsi_data)
        @dirty = false
      end
    end
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 186 to 188
# Restore the LSI from its nested data
lsi_data = data['lsi']
lsi_data['type'] = 'lsi' # Ensure type is set for LSI.from_json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: unnecessary mutation of passed data - work with local copy

Suggested change
# Restore the LSI from its nested data
lsi_data = data['lsi']
lsi_data['type'] = 'lsi' # Ensure type is set for LSI.from_json
lsi_data = data['lsi'].dup
lsi_data['type'] = 'lsi'
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 186:188

Comment:
**logic:** unnecessary mutation of passed data - work with local copy

```suggestion
      lsi_data = data['lsi'].dup
      lsi_data['type'] = 'lsi'
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 356 to 357
lsi_data = data['lsi']
lsi_data['type'] = 'lsi'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: unnecessary mutation of parsed data - work with local copy

Suggested change
lsi_data = data['lsi']
lsi_data['type'] = 'lsi'
lsi_data = data['lsi'].dup
lsi_data['type'] = 'lsi'

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 356:357

Comment:
**logic:** unnecessary mutation of parsed data - work with local copy

```suggestion
        lsi_data = data['lsi'].dup
        lsi_data['type'] = 'lsi'
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 324 to 326
neighbors.each do |neighbor|
category = neighbor[:category]
next unless category
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: guard clause inside loop - extract to method or combine with map logic

Suggested change
neighbors.each do |neighbor|
category = neighbor[:category]
next unless category
neighbors.each do |neighbor|
category = neighbor[:category]
weight = @weighted ? [neighbor[:similarity], 0.0].max : 1.0
votes[category] += weight if category
end

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 324:326

Comment:
**style:** guard clause inside loop - extract to method or combine with map logic

```suggestion
      neighbors.each do |neighbor|
        category = neighbor[:category]
        weight = @weighted ? [neighbor[:similarity], 0.0].max : 1.0
        votes[category] += weight if category
      end
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Remove @param/@return/@example tags that duplicate @RBS type info.
Keep method descriptions that explain behavior.
Remove deprecated add_item method (new class doesn't need legacy API).
@cardmagic
Copy link
Owner Author

@greptile please re-review

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 23 comments

Edit Code Review Agent Settings | Greptile

Comment on lines 137 to 138
lsi_data = data['lsi']
lsi_data['type'] = 'lsi' # Ensure type is set for LSI.from_json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: mutates the passed-in hash by adding 'type' key, violating principle of least surprise

Suggested change
lsi_data = data['lsi']
lsi_data['type'] = 'lsi' # Ensure type is set for LSI.from_json
lsi_data = data['lsi'].dup
lsi_data['type'] = 'lsi'

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 137:138

Comment:
**logic:** mutates the passed-in hash by adding `'type'` key, violating principle of least surprise

```suggestion
      lsi_data = data['lsi'].dup
      lsi_data['type'] = 'lsi'
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 285 to 286
lsi_data = data['lsi']
lsi_data['type'] = 'lsi'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: mutates the parsed hash, same issue as line 137

Suggested change
lsi_data = data['lsi']
lsi_data['type'] = 'lsi'
lsi_data = data['lsi'].dup
lsi_data['type'] = 'lsi'
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 285:286

Comment:
**logic:** mutates the parsed hash, same issue as line 137

```suggestion
        lsi_data = data['lsi'].dup
        lsi_data['type'] = 'lsi'
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 228 to 237
# LSI requires at least 2 items to build an index
# For single item, return it directly with a default similarity
if @lsi.items.size == 1
item = @lsi.items.first
return [{
item: item,
category: @lsi.categories_for(item).first,
similarity: 1.0
}]
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: unnecessary special case - LSI's proximity_array_for_content should handle single-item case internally

If LSI requires this workaround, that's an LSI bug to fix there, not something to paper over here

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 228:237

Comment:
**style:** unnecessary special case - LSI's `proximity_array_for_content` should handle single-item case internally

If LSI requires this workaround, that's an LSI bug to fix there, not something to paper over here

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

category = neighbor[:category]
next unless category

weight = @weighted ? [neighbor[:similarity], 0.0].max : 1.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: [neighbor[:similarity], 0.0].max is defensive - similarity from LSI should never be negative. If it is, that's a bug in LSI

Suggested change
weight = @weighted ? [neighbor[:similarity], 0.0].max : 1.0
weight = @weighted ? neighbor[:similarity] : 1.0

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 259:259

Comment:
**style:** `[neighbor[:similarity], 0.0].max` is defensive - similarity from LSI should never be negative. If it is, that's a bug in LSI

```suggestion
        weight = @weighted ? neighbor[:similarity] : 1.0
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 136 to 138
# Restore the LSI from its nested data
lsi_data = data['lsi']
lsi_data['type'] = 'lsi' # Ensure type is set for LSI.from_json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: comment repeats what code shows - the code is self-documenting

Suggested change
# Restore the LSI from its nested data
lsi_data = data['lsi']
lsi_data['type'] = 'lsi' # Ensure type is set for LSI.from_json
lsi_data = data['lsi'].dup
lsi_data['type'] = 'lsi'

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 136:138

Comment:
**style:** comment repeats what code shows - the code is self-documenting

```suggestion
      lsi_data = data['lsi'].dup
      lsi_data['type'] = 'lsi'
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +165 to +174
def reload
raise ArgumentError, 'No storage configured' unless storage
raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty

data = storage.read
raise StorageError, 'No saved state found' unless data

restore_from_json(data)
@dirty = false
self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: nested error checks - use early returns

Suggested change
def reload
raise ArgumentError, 'No storage configured' unless storage
raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty
data = storage.read
raise StorageError, 'No saved state found' unless data
restore_from_json(data)
@dirty = false
self
def reload
raise ArgumentError, 'No storage configured' unless storage
raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty
data = storage.read
raise StorageError, 'No saved state found' unless data
restore_from_json(data)
@dirty = false
self
end

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 165:174

Comment:
**style:** nested error checks - use early returns

```suggestion
    def reload
      raise ArgumentError, 'No storage configured' unless storage
      raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty

      data = storage.read
      raise StorageError, 'No saved state found' unless data

      restore_from_json(data)
      @dirty = false
      self
    end
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +179 to +187
def reload!
raise ArgumentError, 'No storage configured' unless storage

data = storage.read
raise StorageError, 'No saved state found' unless data

restore_from_json(data)
@dirty = false
self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: nested error checks - use early returns

Suggested change
def reload!
raise ArgumentError, 'No storage configured' unless storage
data = storage.read
raise StorageError, 'No saved state found' unless data
restore_from_json(data)
@dirty = false
self
def reload!
raise ArgumentError, 'No storage configured' unless storage
data = storage.read
raise StorageError, 'No saved state found' unless data
restore_from_json(data)
@dirty = false
self
end

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 179:187

Comment:
**style:** nested error checks - use early returns

```suggestion
    def reload!
      raise ArgumentError, 'No storage configured' unless storage

      data = storage.read
      raise StorageError, 'No saved state found' unless data

      restore_from_json(data)
      @dirty = false
      self
    end
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +197 to +203
def self.load(storage:)
data = storage.read
raise StorageError, 'No saved state found' unless data

instance = from_json(data)
instance.storage = storage
instance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: nested error check - use early return

Suggested change
def self.load(storage:)
data = storage.read
raise StorageError, 'No saved state found' unless data
instance = from_json(data)
instance.storage = storage
instance
def self.load(storage:)
data = storage.read
raise StorageError, 'No saved state found' unless data
instance = from_json(data)
instance.storage = storage
instance
end

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 197:203

Comment:
**style:** nested error check - use early return

```suggestion
    def self.load(storage:)
      data = storage.read
      raise StorageError, 'No saved state found' unless data

      instance = from_json(data)
      instance.storage = storage
      instance
    end
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 255 to 261
neighbors.each do |neighbor|
category = neighbor[:category]
next unless category

weight = @weighted ? [neighbor[:similarity], 0.0].max : 1.0
votes[category] += weight
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: guard clause inside loop - combine with map logic or extract to method

Suggested change
neighbors.each do |neighbor|
category = neighbor[:category]
next unless category
weight = @weighted ? [neighbor[:similarity], 0.0].max : 1.0
votes[category] += weight
end
neighbors.each do |neighbor|
category = neighbor[:category] or next
weight = @weighted ? neighbor[:similarity] : 1.0
votes[category] += weight
end

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 255:261

Comment:
**style:** guard clause inside loop - combine with map logic or extract to method

```suggestion
      neighbors.each do |neighbor|
        category = neighbor[:category] or next
        weight = @weighted ? neighbor[:similarity] : 1.0
        votes[category] += weight
      end
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 277 to 290
def restore_from_json(json)
data = JSON.parse(json)
raise ArgumentError, "Invalid classifier type: #{data['type']}" unless data['type'] == 'knn'

synchronize do
@k = data['k']
@weighted = data['weighted']

lsi_data = data['lsi']
lsi_data['type'] = 'lsi'
@lsi = LSI.from_json(lsi_data)
@dirty = false
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: synchronize block wraps entire method - move guard clause outside

Suggested change
def restore_from_json(json)
data = JSON.parse(json)
raise ArgumentError, "Invalid classifier type: #{data['type']}" unless data['type'] == 'knn'
synchronize do
@k = data['k']
@weighted = data['weighted']
lsi_data = data['lsi']
lsi_data['type'] = 'lsi'
@lsi = LSI.from_json(lsi_data)
@dirty = false
end
end
def restore_from_json(json)
data = JSON.parse(json)
raise ArgumentError, "Invalid classifier type: #{data['type']}" unless data['type'] == 'knn'
synchronize do
@k = data['k']
@weighted = data['weighted']
lsi_data = data['lsi'].dup
lsi_data['type'] = 'lsi'
@lsi = LSI.from_json(lsi_data)
@dirty = false
end
end

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 277:290

Comment:
**style:** `synchronize` block wraps entire method - move guard clause outside

```suggestion
    def restore_from_json(json)
      data = JSON.parse(json)
      raise ArgumentError, "Invalid classifier type: #{data['type']}" unless data['type'] == 'knn'

      synchronize do
        @k = data['k']
        @weighted = data['weighted']
        lsi_data = data['lsi'].dup
        lsi_data['type'] = 'lsi'
        @lsi = LSI.from_json(lsi_data)
        @dirty = false
      end
    end
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

- Fix LSI single-item bug instead of working around in KNN
- Remove defensive max(0) check on similarity scores
- Use early return for nil winner
- Simplify guard clause with 'or next'
- Condense class docstring
- Avoid mutating input hash with .dup
@cardmagic
Copy link
Owner Author

Addressed the actionable feedback - thanks!

Re: remaining comments - these are intentionally kept:

  1. Method descriptions (e.g., "Classifies text using k nearest neighbors") - these explain what the method does at a glance, useful for API docs
  2. Class docstring with example - essential for users discovering the API
  3. @rbs annotations - these are RBS type signatures, not comments

The removed ones were YARD tags (@param, @return, @example) that duplicated the @rbs type info.

@greptile please re-review

@cardmagic cardmagic merged commit fde311b into master Dec 29, 2025
6 checks passed
@cardmagic cardmagic deleted the feature/knn-classifier branch December 29, 2025 02:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Add k-Nearest Neighbors (kNN) classifier

2 participants