-
Notifications
You must be signed in to change notification settings - Fork 123
feat(knn): add k-Nearest Neighbors classifier #105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Implements a kNN classifier that leverages existing LSI infrastructure for similarity computations. Unlike Bayes which requires training, kNN uses instance-based learning—store examples and classify by finding the most similar ones. Key features: - Hash-style API consistent with Bayes and LSI (add, classify) - classify_with_neighbors() returns interpretable results with neighbor details, vote tallies, and confidence scores - Distance-weighted voting option for more nuanced classification - Full JSON/Marshal serialization and storage backend support - Handles edge cases like single-item classifiers gracefully This completes Issue #103 and provides a third classification algorithm suited for small datasets where interpretability matters. Closes #103
dc53ba2 to
3f95b1b
Compare
Greptile SummaryImplements k-Nearest Neighbors classifier by leveraging existing LSI infrastructure for similarity calculations. Uses instance-based learning where examples are stored and classification happens by finding the k most similar items and voting by their categories. Major Changes
Key Features
ArchitectureThe implementation cleanly delegates to LSI for similarity calculations ( Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant KNN
participant LSI
participant Matrix
Note over User,Matrix: Adding Training Data
User->>KNN: add(spam: ["Buy now!"], ham: ["Meeting"])
KNN->>KNN: synchronize { @dirty = true }
KNN->>LSI: add(**items)
LSI->>LSI: Store items with categories
LSI->>LSI: Mark for rebuild if auto_rebuild
LSI-->>KNN: Items stored
Note over User,Matrix: Classification Flow
User->>KNN: classify("Free offer!")
KNN->>KNN: classify_with_neighbors(text)
KNN->>KNN: synchronize block
alt Empty classifier
KNN-->>User: nil
else Has items
KNN->>KNN: find_neighbors(text)
KNN->>LSI: proximity_array_for_content(text)
LSI->>LSI: synchronize { proximity_array_for_content_unlocked }
alt Single item
LSI-->>KNN: [[item, 1.0]]
else Multiple items
LSI->>LSI: node_for_content_unlocked(text)
LSI->>Matrix: Compute dot products with all items
Matrix-->>LSI: Similarity scores
LSI->>LSI: Sort by similarity descending
LSI-->>KNN: [[item1, 0.92], [item2, 0.75], ...]
end
KNN->>KNN: Select top k neighbors
KNN->>LSI: categories_for(each item)
LSI-->>KNN: Category labels
KNN->>KNN: tally_votes(neighbors)
alt Weighted voting
KNN->>KNN: votes[cat] += similarity
else Unweighted voting
KNN->>KNN: votes[cat] += 1.0
end
KNN->>KNN: Calculate winner & confidence
KNN-->>User: {category: "spam", neighbors: [...], votes: {...}, confidence: 0.85}
end
Note over User,Matrix: Persistence
User->>KNN: save_to_file(path)
KNN->>KNN: to_json
KNN->>KNN: as_json
KNN->>LSI: as_json
LSI-->>KNN: LSI state as hash
KNN->>KNN: Write JSON to file
KNN->>KNN: @dirty = false
KNN-->>User: Bytes written
User->>KNN: load_from_file(path)
KNN->>KNN: from_json(File.read)
KNN->>LSI: from_json(lsi_data)
LSI-->>KNN: Restored LSI instance
KNN-->>User: Restored KNN instance
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5 files reviewed, 35 comments
lib/classifier/knn.rb
Outdated
| # This class implements a k-Nearest Neighbors classifier that leverages | ||
| # the existing LSI infrastructure for similarity computations. | ||
| # | ||
| # Unlike traditional classifiers that require training, kNN uses instance-based | ||
| # learning - it stores examples and classifies by finding the most similar ones. | ||
| # | ||
| # Example usage: | ||
| # knn = Classifier::KNN.new(k: 3) | ||
| # knn.add("spam" => ["Buy now!", "Limited offer!"]) | ||
| # knn.add("ham" => ["Meeting tomorrow", "Project update"]) | ||
| # knn.classify("Special discount!") # => "spam" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: docstring restates obvious implementation details
| # This class implements a k-Nearest Neighbors classifier that leverages | |
| # the existing LSI infrastructure for similarity computations. | |
| # | |
| # Unlike traditional classifiers that require training, kNN uses instance-based | |
| # learning - it stores examples and classifies by finding the most similar ones. | |
| # | |
| # Example usage: | |
| # knn = Classifier::KNN.new(k: 3) | |
| # knn.add("spam" => ["Buy now!", "Limited offer!"]) | |
| # knn.add("ham" => ["Meeting tomorrow", "Project update"]) | |
| # knn.classify("Special discount!") # => "spam" | |
| # k-Nearest Neighbors classifier using LSI for similarity. | |
| # | |
| # Example: | |
| # knn = Classifier::KNN.new(k: 3) | |
| # knn.add(spam: ["Buy now!"], ham: ["Meeting tomorrow"]) | |
| # knn.classify("Special discount!") # => "spam" |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 12:22
Comment:
**style:** docstring restates obvious implementation details
```suggestion
# k-Nearest Neighbors classifier using LSI for similarity.
#
# Example:
# knn = Classifier::KNN.new(k: 3)
# knn.add(spam: ["Buy now!"], ham: ["Meeting tomorrow"])
# knn.classify("Special discount!") # => "spam"
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/knn.rb
Outdated
| # Creates a new kNN classifier. | ||
| # | ||
| # @param k [Integer] Number of neighbors to consider (default: 5) | ||
| # @param weighted [Boolean] Use distance-weighted voting (default: false) | ||
| # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: docstring repeats parameter types already in @rbs
| # Creates a new kNN classifier. | |
| # | |
| # @param k [Integer] Number of neighbors to consider (default: 5) | |
| # @param weighted [Boolean] Use distance-weighted voting (default: false) | |
| # | |
| # @rbs (?k: Integer, ?weighted: bool) -> void |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 36:40
Comment:
**style:** docstring repeats parameter types already in `@rbs`
```suggestion
# @rbs (?k: Integer, ?weighted: bool) -> void
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/knn.rb
Outdated
|
|
||
| # Adds labeled examples to the classifier using hash-style syntax. | ||
| # Keys are categories, values are items (or arrays of items). | ||
| # | ||
| # @example Single item per category | ||
| # knn.add("spam" => "Buy now!") | ||
| # knn.add("ham" => "Meeting tomorrow") | ||
| # | ||
| # @example Multiple items per category | ||
| # knn.add("spam" => ["Buy now!", "Limited offer!"]) | ||
| # | ||
| # @example Batch operations | ||
| # knn.add( | ||
| # "spam" => ["Buy now!", "Limited offer!"], | ||
| # "ham" => ["Meeting tomorrow", "Project update"] | ||
| # ) | ||
| # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: docstring examples repeat what method signature already shows
| # Adds labeled examples to the classifier using hash-style syntax. | |
| # Keys are categories, values are items (or arrays of items). | |
| # | |
| # @example Single item per category | |
| # knn.add("spam" => "Buy now!") | |
| # knn.add("ham" => "Meeting tomorrow") | |
| # | |
| # @example Multiple items per category | |
| # knn.add("spam" => ["Buy now!", "Limited offer!"]) | |
| # | |
| # @example Batch operations | |
| # knn.add( | |
| # "spam" => ["Buy now!", "Limited offer!"], | |
| # "ham" => ["Meeting tomorrow", "Project update"] | |
| # ) | |
| # | |
| # @rbs (**untyped items) -> void |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 51:67
Comment:
**style:** docstring examples repeat what method signature already shows
```suggestion
# @rbs (**untyped items) -> void
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/knn.rb
Outdated
| # Classifies the given text by finding the k nearest neighbors | ||
| # and using majority voting. | ||
| # | ||
| # @param text [String] The text to classify | ||
| # @return [String, Symbol, nil] The predicted category, or nil if no examples exist | ||
| # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: docstring restates method name and signature
| # Classifies the given text by finding the k nearest neighbors | |
| # and using majority voting. | |
| # | |
| # @param text [String] The text to classify | |
| # @return [String, Symbol, nil] The predicted category, or nil if no examples exist | |
| # | |
| # @rbs (String) -> (String | Symbol)? |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 87:92
Comment:
**style:** docstring restates method name and signature
```suggestion
# @rbs (String) -> (String | Symbol)?
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/knn.rb
Outdated
| # Classifies the given text and returns detailed information about | ||
| # the neighbors that contributed to the decision. | ||
| # | ||
| # @param text [String] The text to classify | ||
| # @return [Hash] A hash containing: | ||
| # - :category - The predicted category | ||
| # - :neighbors - Array of neighbor details (item, category, similarity) | ||
| # - :votes - Hash of category => vote count/weight | ||
| # - :confidence - Confidence score (winning vote share) | ||
| # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: docstring restates return hash keys visible in code
| # Classifies the given text and returns detailed information about | |
| # the neighbors that contributed to the decision. | |
| # | |
| # @param text [String] The text to classify | |
| # @return [Hash] A hash containing: | |
| # - :category - The predicted category | |
| # - :neighbors - Array of neighbor details (item, category, similarity) | |
| # - :votes - Hash of category => vote count/weight | |
| # - :confidence - Confidence score (winning vote share) | |
| # | |
| # Returns category, neighbors, votes, and confidence score. | |
| # | |
| # @rbs (String) -> Hash[Symbol, untyped] |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 99:108
Comment:
**style:** docstring restates return hash keys visible in code
```suggestion
# Returns category, neighbors, votes, and confidence score.
#
# @rbs (String) -> Hash[Symbol, untyped]
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| synchronize do | ||
| @k = data['k'] | ||
| @weighted = data['weighted'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: docstring repeats method name and obvious behavior
| synchronize do | |
| @k = data['k'] | |
| @weighted = data['weighted'] | |
| # @rbs (Array[Hash[Symbol, untyped]]) -> Hash[String | Symbol, Float] |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 352:354
Comment:
**style:** docstring repeats method name and obvious behavior
```suggestion
# @rbs (Array[Hash[Symbol, untyped]]) -> Hash[String | Symbol, Float]
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| def classify_with_neighbors(text) | ||
| synchronize do | ||
| return empty_result if @lsi.items.empty? | ||
|
|
||
| neighbors = find_neighbors(text) | ||
| return empty_result if neighbors.empty? | ||
|
|
||
| votes = tally_votes(neighbors) | ||
| winner = votes.max_by { |_, v| v }&.first | ||
| total_votes = votes.values.sum | ||
| confidence = winner && total_votes.positive? ? votes[winner] / total_votes.to_f : 0.0 | ||
|
|
||
| { | ||
| category: winner, | ||
| neighbors: neighbors, | ||
| votes: votes, | ||
| confidence: confidence | ||
| } | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: nested conditionals - use early returns
| def classify_with_neighbors(text) | |
| synchronize do | |
| return empty_result if @lsi.items.empty? | |
| neighbors = find_neighbors(text) | |
| return empty_result if neighbors.empty? | |
| votes = tally_votes(neighbors) | |
| winner = votes.max_by { |_, v| v }&.first | |
| total_votes = votes.values.sum | |
| confidence = winner && total_votes.positive? ? votes[winner] / total_votes.to_f : 0.0 | |
| { | |
| category: winner, | |
| neighbors: neighbors, | |
| votes: votes, | |
| confidence: confidence | |
| } | |
| end | |
| def classify_with_neighbors(text) | |
| synchronize do | |
| return empty_result if @lsi.items.empty? | |
| neighbors = find_neighbors(text) | |
| return empty_result if neighbors.empty? | |
| votes = tally_votes(neighbors) | |
| winner = votes.max_by { |_, v| v }&.first | |
| return empty_result unless winner | |
| total_votes = votes.values.sum | |
| confidence = total_votes.positive? ? votes[winner] / total_votes.to_f : 0.0 | |
| { | |
| category: winner, | |
| neighbors: neighbors, | |
| votes: votes, | |
| confidence: confidence | |
| } | |
| end | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 110:128
Comment:
**style:** nested conditionals - use early returns
```suggestion
def classify_with_neighbors(text)
synchronize do
return empty_result if @lsi.items.empty?
neighbors = find_neighbors(text)
return empty_result if neighbors.empty?
votes = tally_votes(neighbors)
winner = votes.max_by { |_, v| v }&.first
return empty_result unless winner
total_votes = votes.values.sum
confidence = total_votes.positive? ? votes[winner] / total_votes.to_f : 0.0
{
category: winner,
neighbors: neighbors,
votes: votes,
confidence: confidence
}
end
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/knn.rb
Outdated
|
|
||
| # @rbs () -> bool | ||
| def dirty? | ||
| @dirty | ||
| end | ||
|
|
||
| # Loads a classifier from configured storage. | ||
| # | ||
| # @param storage [Storage::Base] The storage to load from | ||
| # @return [KNN] The loaded classifier | ||
| # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: nested conditionals - use early returns
| # @rbs () -> bool | |
| def dirty? | |
| @dirty | |
| end | |
| # Loads a classifier from configured storage. | |
| # | |
| # @param storage [Storage::Base] The storage to load from | |
| # @return [KNN] The loaded classifier | |
| # | |
| def reload | |
| raise ArgumentError, 'No storage configured' unless storage | |
| raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty | |
| data = storage.read | |
| raise StorageError, 'No saved state found' unless data | |
| restore_from_json(data) | |
| @dirty = false | |
| self | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 246:256
Comment:
**style:** nested conditionals - use early returns
```suggestion
def reload
raise ArgumentError, 'No storage configured' unless storage
raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty
data = storage.read
raise StorageError, 'No saved state found' unless data
restore_from_json(data)
@dirty = false
self
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/knn.rb
Outdated
|
|
||
| instance = from_json(data) | ||
| instance.storage = storage | ||
| instance | ||
| end | ||
|
|
||
| # Loads a classifier from a file. | ||
| # | ||
| # @param path [String] The file path | ||
| # @return [KNN] The loaded classifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: nested conditionals - use early returns
| instance = from_json(data) | |
| instance.storage = storage | |
| instance | |
| end | |
| # Loads a classifier from a file. | |
| # | |
| # @param path [String] The file path | |
| # @return [KNN] The loaded classifier | |
| def reload! | |
| raise ArgumentError, 'No storage configured' unless storage | |
| data = storage.read | |
| raise StorageError, 'No saved state found' unless data | |
| restore_from_json(data) | |
| @dirty = false | |
| self | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 261:270
Comment:
**style:** nested conditionals - use early returns
```suggestion
def reload!
raise ArgumentError, 'No storage configured' unless storage
data = storage.read
raise StorageError, 'No saved state found' unless data
restore_from_json(data)
@dirty = false
self
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/knn.rb
Outdated
| @k, @weighted, @lsi, @dirty = data | ||
| @storage = nil | ||
| end | ||
|
|
||
| private | ||
|
|
||
| # Finds the k nearest neighbors for the given text. | ||
| # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: nested conditionals - use early returns
| @k, @weighted, @lsi, @dirty = data | |
| @storage = nil | |
| end | |
| private | |
| # Finds the k nearest neighbors for the given text. | |
| # | |
| def self.load(storage:) | |
| data = storage.read | |
| raise StorageError, 'No saved state found' unless data | |
| instance = from_json(data) | |
| instance.storage = storage | |
| instance | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 285:292
Comment:
**style:** nested conditionals - use early returns
```suggestion
def self.load(storage:)
data = storage.read
raise StorageError, 'No saved state found' unless data
instance = from_json(data)
instance.storage = storage
instance
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5 files reviewed, 9 comments
lib/classifier/knn.rb
Outdated
| # LSI requires at least 2 items to build an index | ||
| # For single item, return it directly with a default similarity | ||
| if @lsi.items.size == 1 | ||
| item = @lsi.items.first | ||
| return [{ | ||
| item: item, | ||
| category: @lsi.categories_for(item).first, | ||
| similarity: 1.0 | ||
| }] | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: nested conditional with defensive code for single-item case. LSI already handles this - remove the special case.
| # LSI requires at least 2 items to build an index | |
| # For single item, return it directly with a default similarity | |
| if @lsi.items.size == 1 | |
| item = @lsi.items.first | |
| return [{ | |
| item: item, | |
| category: @lsi.categories_for(item).first, | |
| similarity: 1.0 | |
| }] | |
| end | |
| def find_neighbors(text) | |
| proximity = @lsi.proximity_array_for_content(text) | |
| neighbors = proximity.reject { |item, _| item == text }.first(@k) | |
| neighbors.map do |item, similarity| | |
| { | |
| item: item, | |
| category: @lsi.categories_for(item).first, | |
| similarity: similarity | |
| } | |
| end | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 295:304
Comment:
**style:** nested conditional with defensive code for single-item case. LSI already handles this - remove the special case.
```suggestion
def find_neighbors(text)
proximity = @lsi.proximity_array_for_content(text)
neighbors = proximity.reject { |item, _| item == text }.first(@k)
neighbors.map do |item, similarity|
{
item: item,
category: @lsi.categories_for(item).first,
similarity: similarity
}
end
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| # | ||
| # @rbs () -> void | ||
| def save | ||
| raise ArgumentError, 'No storage configured. Use save_to_file(path) or set storage=' unless storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: defensive error for impossible scenario - if storage is nil, the method already can't be called. Remove this check.
| raise ArgumentError, 'No storage configured. Use save_to_file(path) or set storage=' unless storage | |
| storage.write(to_json) |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 200:200
Comment:
**style:** defensive error for impossible scenario - if storage is nil, the method already can't be called. Remove this check.
```suggestion
storage.write(to_json)
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| raise ArgumentError, 'No storage configured' unless storage | ||
| raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: nested error checks - use early returns
| raise ArgumentError, 'No storage configured' unless storage | |
| raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty | |
| def reload | |
| raise ArgumentError, 'No storage configured' unless storage | |
| raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty | |
| data = storage.read | |
| raise StorageError, 'No saved state found' unless data | |
| restore_from_json(data) | |
| @dirty = false | |
| self | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 222:223
Comment:
**style:** nested error checks - use early returns
```suggestion
def reload
raise ArgumentError, 'No storage configured' unless storage
raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty
data = storage.read
raise StorageError, 'No saved state found' unless data
restore_from_json(data)
@dirty = false
self
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| raise ArgumentError, 'No storage configured' unless storage | ||
|
|
||
| data = storage.read | ||
| raise StorageError, 'No saved state found' unless data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: nested error checks - use early returns
| raise ArgumentError, 'No storage configured' unless storage | |
| data = storage.read | |
| raise StorageError, 'No saved state found' unless data | |
| def reload! | |
| raise ArgumentError, 'No storage configured' unless storage | |
| data = storage.read | |
| raise StorageError, 'No saved state found' unless data | |
| restore_from_json(data) | |
| @dirty = false | |
| self | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 237:240
Comment:
**style:** nested error checks - use early returns
```suggestion
def reload!
raise ArgumentError, 'No storage configured' unless storage
data = storage.read
raise StorageError, 'No saved state found' unless data
restore_from_json(data)
@dirty = false
self
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| data = storage.read | ||
| raise StorageError, 'No saved state found' unless data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: nested error check - use early return
| data = storage.read | |
| raise StorageError, 'No saved state found' unless data | |
| def self.load(storage:) | |
| data = storage.read | |
| raise StorageError, 'No saved state found' unless data | |
| instance = from_json(data) | |
| instance.storage = storage | |
| instance | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 259:260
Comment:
**style:** nested error check - use early return
```suggestion
def self.load(storage:)
data = storage.read
raise StorageError, 'No saved state found' unless data
instance = from_json(data)
instance.storage = storage
instance
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| synchronize do | ||
| @k = data['k'] | ||
| @weighted = data['weighted'] | ||
|
|
||
| lsi_data = data['lsi'] | ||
| lsi_data['type'] = 'lsi' | ||
| @lsi = LSI.from_json(lsi_data) | ||
| @dirty = false | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: synchronize block wraps entire method body - move the guard clause outside
| synchronize do | |
| @k = data['k'] | |
| @weighted = data['weighted'] | |
| lsi_data = data['lsi'] | |
| lsi_data['type'] = 'lsi' | |
| @lsi = LSI.from_json(lsi_data) | |
| @dirty = false | |
| end | |
| def restore_from_json(json) | |
| data = JSON.parse(json) | |
| raise ArgumentError, "Invalid classifier type: #{data['type']}" unless data['type'] == 'knn' | |
| synchronize do | |
| @k = data['k'] | |
| @weighted = data['weighted'] | |
| lsi_data = data['lsi'] | |
| lsi_data['type'] = 'lsi' | |
| @lsi = LSI.from_json(lsi_data) | |
| @dirty = false | |
| end | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 352:360
Comment:
**style:** synchronize block wraps entire method body - move the guard clause outside
```suggestion
def restore_from_json(json)
data = JSON.parse(json)
raise ArgumentError, "Invalid classifier type: #{data['type']}" unless data['type'] == 'knn'
synchronize do
@k = data['k']
@weighted = data['weighted']
lsi_data = data['lsi']
lsi_data['type'] = 'lsi'
@lsi = LSI.from_json(lsi_data)
@dirty = false
end
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/knn.rb
Outdated
| # Restore the LSI from its nested data | ||
| lsi_data = data['lsi'] | ||
| lsi_data['type'] = 'lsi' # Ensure type is set for LSI.from_json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: unnecessary mutation of passed data - work with local copy
| # Restore the LSI from its nested data | |
| lsi_data = data['lsi'] | |
| lsi_data['type'] = 'lsi' # Ensure type is set for LSI.from_json | |
| lsi_data = data['lsi'].dup | |
| lsi_data['type'] = 'lsi' |
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 186:188
Comment:
**logic:** unnecessary mutation of passed data - work with local copy
```suggestion
lsi_data = data['lsi'].dup
lsi_data['type'] = 'lsi'
```
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/knn.rb
Outdated
| lsi_data = data['lsi'] | ||
| lsi_data['type'] = 'lsi' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: unnecessary mutation of parsed data - work with local copy
| lsi_data = data['lsi'] | |
| lsi_data['type'] = 'lsi' | |
| lsi_data = data['lsi'].dup | |
| lsi_data['type'] = 'lsi' |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 356:357
Comment:
**logic:** unnecessary mutation of parsed data - work with local copy
```suggestion
lsi_data = data['lsi'].dup
lsi_data['type'] = 'lsi'
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/knn.rb
Outdated
| neighbors.each do |neighbor| | ||
| category = neighbor[:category] | ||
| next unless category |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: guard clause inside loop - extract to method or combine with map logic
| neighbors.each do |neighbor| | |
| category = neighbor[:category] | |
| next unless category | |
| neighbors.each do |neighbor| | |
| category = neighbor[:category] | |
| weight = @weighted ? [neighbor[:similarity], 0.0].max : 1.0 | |
| votes[category] += weight if category | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 324:326
Comment:
**style:** guard clause inside loop - extract to method or combine with map logic
```suggestion
neighbors.each do |neighbor|
category = neighbor[:category]
weight = @weighted ? [neighbor[:similarity], 0.0].max : 1.0
votes[category] += weight if category
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.|
@greptile please re-review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5 files reviewed, 23 comments
lib/classifier/knn.rb
Outdated
| lsi_data = data['lsi'] | ||
| lsi_data['type'] = 'lsi' # Ensure type is set for LSI.from_json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: mutates the passed-in hash by adding 'type' key, violating principle of least surprise
| lsi_data = data['lsi'] | |
| lsi_data['type'] = 'lsi' # Ensure type is set for LSI.from_json | |
| lsi_data = data['lsi'].dup | |
| lsi_data['type'] = 'lsi' |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 137:138
Comment:
**logic:** mutates the passed-in hash by adding `'type'` key, violating principle of least surprise
```suggestion
lsi_data = data['lsi'].dup
lsi_data['type'] = 'lsi'
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/knn.rb
Outdated
| lsi_data = data['lsi'] | ||
| lsi_data['type'] = 'lsi' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: mutates the parsed hash, same issue as line 137
| lsi_data = data['lsi'] | |
| lsi_data['type'] = 'lsi' | |
| lsi_data = data['lsi'].dup | |
| lsi_data['type'] = 'lsi' |
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 285:286
Comment:
**logic:** mutates the parsed hash, same issue as line 137
```suggestion
lsi_data = data['lsi'].dup
lsi_data['type'] = 'lsi'
```
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/knn.rb
Outdated
| # LSI requires at least 2 items to build an index | ||
| # For single item, return it directly with a default similarity | ||
| if @lsi.items.size == 1 | ||
| item = @lsi.items.first | ||
| return [{ | ||
| item: item, | ||
| category: @lsi.categories_for(item).first, | ||
| similarity: 1.0 | ||
| }] | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: unnecessary special case - LSI's proximity_array_for_content should handle single-item case internally
If LSI requires this workaround, that's an LSI bug to fix there, not something to paper over here
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 228:237
Comment:
**style:** unnecessary special case - LSI's `proximity_array_for_content` should handle single-item case internally
If LSI requires this workaround, that's an LSI bug to fix there, not something to paper over here
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/knn.rb
Outdated
| category = neighbor[:category] | ||
| next unless category | ||
|
|
||
| weight = @weighted ? [neighbor[:similarity], 0.0].max : 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: [neighbor[:similarity], 0.0].max is defensive - similarity from LSI should never be negative. If it is, that's a bug in LSI
| weight = @weighted ? [neighbor[:similarity], 0.0].max : 1.0 | |
| weight = @weighted ? neighbor[:similarity] : 1.0 |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 259:259
Comment:
**style:** `[neighbor[:similarity], 0.0].max` is defensive - similarity from LSI should never be negative. If it is, that's a bug in LSI
```suggestion
weight = @weighted ? neighbor[:similarity] : 1.0
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
lib/classifier/knn.rb
Outdated
| # Restore the LSI from its nested data | ||
| lsi_data = data['lsi'] | ||
| lsi_data['type'] = 'lsi' # Ensure type is set for LSI.from_json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: comment repeats what code shows - the code is self-documenting
| # Restore the LSI from its nested data | |
| lsi_data = data['lsi'] | |
| lsi_data['type'] = 'lsi' # Ensure type is set for LSI.from_json | |
| lsi_data = data['lsi'].dup | |
| lsi_data['type'] = 'lsi' |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 136:138
Comment:
**style:** comment repeats what code shows - the code is self-documenting
```suggestion
lsi_data = data['lsi'].dup
lsi_data['type'] = 'lsi'
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| def reload | ||
| raise ArgumentError, 'No storage configured' unless storage | ||
| raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty | ||
|
|
||
| data = storage.read | ||
| raise StorageError, 'No saved state found' unless data | ||
|
|
||
| restore_from_json(data) | ||
| @dirty = false | ||
| self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: nested error checks - use early returns
| def reload | |
| raise ArgumentError, 'No storage configured' unless storage | |
| raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty | |
| data = storage.read | |
| raise StorageError, 'No saved state found' unless data | |
| restore_from_json(data) | |
| @dirty = false | |
| self | |
| def reload | |
| raise ArgumentError, 'No storage configured' unless storage | |
| raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty | |
| data = storage.read | |
| raise StorageError, 'No saved state found' unless data | |
| restore_from_json(data) | |
| @dirty = false | |
| self | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 165:174
Comment:
**style:** nested error checks - use early returns
```suggestion
def reload
raise ArgumentError, 'No storage configured' unless storage
raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty
data = storage.read
raise StorageError, 'No saved state found' unless data
restore_from_json(data)
@dirty = false
self
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| def reload! | ||
| raise ArgumentError, 'No storage configured' unless storage | ||
|
|
||
| data = storage.read | ||
| raise StorageError, 'No saved state found' unless data | ||
|
|
||
| restore_from_json(data) | ||
| @dirty = false | ||
| self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: nested error checks - use early returns
| def reload! | |
| raise ArgumentError, 'No storage configured' unless storage | |
| data = storage.read | |
| raise StorageError, 'No saved state found' unless data | |
| restore_from_json(data) | |
| @dirty = false | |
| self | |
| def reload! | |
| raise ArgumentError, 'No storage configured' unless storage | |
| data = storage.read | |
| raise StorageError, 'No saved state found' unless data | |
| restore_from_json(data) | |
| @dirty = false | |
| self | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 179:187
Comment:
**style:** nested error checks - use early returns
```suggestion
def reload!
raise ArgumentError, 'No storage configured' unless storage
data = storage.read
raise StorageError, 'No saved state found' unless data
restore_from_json(data)
@dirty = false
self
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| def self.load(storage:) | ||
| data = storage.read | ||
| raise StorageError, 'No saved state found' unless data | ||
|
|
||
| instance = from_json(data) | ||
| instance.storage = storage | ||
| instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: nested error check - use early return
| def self.load(storage:) | |
| data = storage.read | |
| raise StorageError, 'No saved state found' unless data | |
| instance = from_json(data) | |
| instance.storage = storage | |
| instance | |
| def self.load(storage:) | |
| data = storage.read | |
| raise StorageError, 'No saved state found' unless data | |
| instance = from_json(data) | |
| instance.storage = storage | |
| instance | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 197:203
Comment:
**style:** nested error check - use early return
```suggestion
def self.load(storage:)
data = storage.read
raise StorageError, 'No saved state found' unless data
instance = from_json(data)
instance.storage = storage
instance
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| neighbors.each do |neighbor| | ||
| category = neighbor[:category] | ||
| next unless category | ||
|
|
||
| weight = @weighted ? [neighbor[:similarity], 0.0].max : 1.0 | ||
| votes[category] += weight | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: guard clause inside loop - combine with map logic or extract to method
| neighbors.each do |neighbor| | |
| category = neighbor[:category] | |
| next unless category | |
| weight = @weighted ? [neighbor[:similarity], 0.0].max : 1.0 | |
| votes[category] += weight | |
| end | |
| neighbors.each do |neighbor| | |
| category = neighbor[:category] or next | |
| weight = @weighted ? neighbor[:similarity] : 1.0 | |
| votes[category] += weight | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 255:261
Comment:
**style:** guard clause inside loop - combine with map logic or extract to method
```suggestion
neighbors.each do |neighbor|
category = neighbor[:category] or next
weight = @weighted ? neighbor[:similarity] : 1.0
votes[category] += weight
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| def restore_from_json(json) | ||
| data = JSON.parse(json) | ||
| raise ArgumentError, "Invalid classifier type: #{data['type']}" unless data['type'] == 'knn' | ||
|
|
||
| synchronize do | ||
| @k = data['k'] | ||
| @weighted = data['weighted'] | ||
|
|
||
| lsi_data = data['lsi'] | ||
| lsi_data['type'] = 'lsi' | ||
| @lsi = LSI.from_json(lsi_data) | ||
| @dirty = false | ||
| end | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: synchronize block wraps entire method - move guard clause outside
| def restore_from_json(json) | |
| data = JSON.parse(json) | |
| raise ArgumentError, "Invalid classifier type: #{data['type']}" unless data['type'] == 'knn' | |
| synchronize do | |
| @k = data['k'] | |
| @weighted = data['weighted'] | |
| lsi_data = data['lsi'] | |
| lsi_data['type'] = 'lsi' | |
| @lsi = LSI.from_json(lsi_data) | |
| @dirty = false | |
| end | |
| end | |
| def restore_from_json(json) | |
| data = JSON.parse(json) | |
| raise ArgumentError, "Invalid classifier type: #{data['type']}" unless data['type'] == 'knn' | |
| synchronize do | |
| @k = data['k'] | |
| @weighted = data['weighted'] | |
| lsi_data = data['lsi'].dup | |
| lsi_data['type'] = 'lsi' | |
| @lsi = LSI.from_json(lsi_data) | |
| @dirty = false | |
| end | |
| end |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/knn.rb
Line: 277:290
Comment:
**style:** `synchronize` block wraps entire method - move guard clause outside
```suggestion
def restore_from_json(json)
data = JSON.parse(json)
raise ArgumentError, "Invalid classifier type: #{data['type']}" unless data['type'] == 'knn'
synchronize do
@k = data['k']
@weighted = data['weighted']
lsi_data = data['lsi'].dup
lsi_data['type'] = 'lsi'
@lsi = LSI.from_json(lsi_data)
@dirty = false
end
end
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.- Fix LSI single-item bug instead of working around in KNN - Remove defensive max(0) check on similarity scores - Use early return for nil winner - Simplify guard clause with 'or next' - Condense class docstring - Avoid mutating input hash with .dup
|
Addressed the actionable feedback - thanks! Re: remaining comments - these are intentionally kept:
The removed ones were YARD tags ( @greptile please re-review |
Summary
add,classify)classify_with_neighbors()returns interpretable results with neighbor detailsweighted: true)Usage
When to Use kNN vs Bayes vs LSI
Test plan
Closes #103