Draft
31 changes: 10 additions & 21 deletions cmd/calculate/similar.go
@@ -457,12 +457,12 @@ func processManga(idx int, data *SimilarityData, config processingConfig, progre
dDesc = 0
}

-if dDesc < IgnoreDescScoreUnder || data.CorpusDescLength[i] < MinDescriptionWords {
-dDesc = 0
-}
-if len(data.MangaList[i].Tags) < IgnoreTagsUnderCount || dDesc > AcceptDescScoreOver {
-dTag = 1
-}
+if dDesc < IgnoreDescScoreUnder || data.CorpusDescLength[i] < MinDescriptionWords {
+dDesc = 0
+}
+if len(data.MangaList[i].Tags) < IgnoreTagsUnderCount || dDesc > AcceptDescScoreOver {
+dTag = 1
+}

score := TagScoreRatio*dTag + dDesc
if score <= 0 {
@@ -520,21 +520,10 @@ func invalidForProcessing(match customMatch, currentIdx int, current, target int
return true, "Same UUID"
}

-common := false
-for _, l1 := range current.AvailableTranslatedLanguages {
-for _, l2 := range target.AvailableTranslatedLanguages {
-if l1 == l2 {
-common = true
-break
-}
-}
-if common {
-break
-}
-}
-if !common && len(current.AvailableTranslatedLanguages) > 0 {
-return true, "No Common Languages"
-}
+// Performance Optimization:
+// We no longer perform the O(N^2) language check here.
+// It has been replaced by a bitmask check in the caller (processManga)
+// which is significantly faster and handles the "No Common Languages" logic.
Comment on lines +523 to +526
Contributor

medium

This comment is helpful, but could be more precise about the trade-offs. The bitmask implementation is not a direct replacement for the "No Common Languages" logic due to the handling of overflow for rare languages (i.e., beyond the 63rd unique one).

The bitmask maps all these rare languages to a single overflow bit. This means two manga with different rare languages will be considered a potential match, which differs from the original, stricter logic that would have filtered them out.

This is likely an acceptable performance trade-off, but it would be valuable to mention this nuance here or in calculateLanguageMasks to prevent future confusion about the matching logic.

References
  1. When working with fixed-size integer types (like a 64-bit mask), ensure that overflow scenarios—such as mapping more than 63 languages—are explicitly handled or documented to prevent logic inconsistencies.
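
The divergence described above can be reproduced with a minimal sketch. It assumes languages receive bits 0 to 62 in first-seen order and that bit 63 is the shared overflow bit; `buildLanguageBits`, `maskFor`, and `shareLanguageExact` are hypothetical names, and the PR's actual `calculateLanguageMasks` is not shown here and may differ.

```go
package main

import "fmt"

// overflowBit is shared by every language that does not get its own bit
// (assumption: the 64th unique language onward).
const overflowBit = uint64(1) << 63

// buildLanguageBits assigns bits 0..62 to languages in first-seen order and
// maps all later languages to overflowBit. Hypothetical helper.
func buildLanguageBits(all [][]string) map[string]uint64 {
	bits := make(map[string]uint64)
	for _, langs := range all {
		for _, l := range langs {
			if _, seen := bits[l]; seen {
				continue
			}
			if len(bits) < 63 {
				bits[l] = uint64(1) << uint(len(bits))
			} else {
				bits[l] = overflowBit
			}
		}
	}
	return bits
}

// maskFor ORs together the bits of every language in langs.
func maskFor(langs []string, bits map[string]uint64) uint64 {
	var m uint64
	for _, l := range langs {
		m |= bits[l]
	}
	return m
}

// shareLanguageExact mirrors the removed nested-loop comparison.
func shareLanguageExact(a, b []string) bool {
	for _, l1 := range a {
		for _, l2 := range b {
			if l1 == l2 {
				return true
			}
		}
	}
	return false
}

func main() {
	// 63 languages fill bits 0..62; the next two land on the overflow bit.
	var common []string
	for i := 0; i < 63; i++ {
		common = append(common, fmt.Sprintf("lang%02d", i))
	}
	a := []string{"rare-a"}
	b := []string{"rare-b"}
	bits := buildLanguageBits([][]string{common, a, b})

	fmt.Println(shareLanguageExact(a, b))                 // false
	fmt.Println(maskFor(a, bits)&maskFor(b, bits) != 0) // true
}
```

In this sketch the two checks disagree only when both titles rely solely on overflow languages; whenever a pair shares one of the first 63 languages, the mask test and the nested loop give the same answer.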

Contributor Author

@jules


if similar.NotValidMatch(current, target) {
return true, "Tag Check"