Fix strongs tokenizing filter by JJK96 · Pull Request #20 · AndBible/jsword

JJK96 · 2025-11-13T20:01:17Z

The previous version threw a NullPointerException on termAtt. After that, it ran into problems caused by incorrect position increments. I decided to extend FilteringTokenFilter instead of TokenFilter, which defers positioning to the parent class and leaves this filter to handle only the accepting of tokens. This simplifies the code significantly.

Copilot

Pull request overview

This pull request refactors the StrongsNumberFilter class to fix null pointer exceptions and position increment issues by switching from extending TokenFilter to extending FilteringTokenFilter. The refactoring simplifies the implementation by delegating position management to the parent class and focusing solely on token acceptance logic.

Changes:

Replaced parent class from TokenFilter to FilteringTokenFilter
Replaced complex incrementToken() method with simpler accept() method
Removed custom equals() and hashCode() methods
Initialized termAtt attribute in constructor
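
The division of labor described in this overview can be sketched without Lucene. The snippet below is a simplified model, not Lucene's actual code: `Tok`, `SkippingFilter`, and `StrongsLikeFilter` are hypothetical stand-ins, and the regex is only a placeholder for real Strong's Number validation. The base class owns the loop and folds the position increments of rejected tokens into the next accepted one, while the subclass only answers accept().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a token: its text plus a position increment.
final class Tok {
    final String text;
    final int posInc;

    Tok(String text, int posInc) {
        this.text = text;
        this.posInc = posInc;
    }
}

// Simplified model of a filtering base class: it owns the loop and the
// position bookkeeping, so subclasses only decide whether to keep a token.
abstract class SkippingFilter {
    protected abstract boolean accept(Tok tok);

    final List<Tok> filter(List<Tok> input) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0;
        for (Tok tok : input) {
            if (accept(tok)) {
                // Carry the increments of any rejected tokens forward.
                out.add(new Tok(tok.text, tok.posInc + skipped));
                skipped = 0;
            } else {
                skipped += tok.posInc;
            }
        }
        return out;
    }
}

// Placeholder acceptance rule (an assumption): 'H' or 'G' followed by digits.
final class StrongsLikeFilter extends SkippingFilter {
    @Override
    protected boolean accept(Tok tok) {
        return tok.text.matches("[HG]\\d+");
    }
}

class SkippingFilterDemo {
    public static void main(String[] args) {
        List<Tok> in = Arrays.asList(
                new Tok("H1", 1), new Tok("oops", 1), new Tok("G2", 1));
        for (Tok tok : new StrongsLikeFilter().filter(in)) {
            System.out.println(tok.text + " +" + tok.posInc);
        }
    }
}
```

In this model the rejected token "oops" disappears from the output and its increment is added to "G2", which is the kind of bookkeeping the subclass no longer has to do by hand.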

Copilot · 2026-01-17T18:05:37Z

src/main/java/org/crosswire/jsword/index/lucene/analysis/StrongsNumberFilter.java

 import org.apache.lucene.analysis.TokenFilter;
 import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;


The PositionIncrementAttribute is imported but never used in the code. This import should be removed as it serves no purpose in the refactored implementation.

Suggested change

-import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

Copilot · 2026-01-17T18:05:37Z

src/main/java/org/crosswire/jsword/index/lucene/analysis/StrongsNumberFilter.java

 import java.io.IOException;

+import org.apache.lucene.analysis.FilteringTokenFilter;
 import org.apache.lucene.analysis.TokenFilter;


The TokenFilter import is unused since the class now extends FilteringTokenFilter instead of TokenFilter. This import should be removed.

Suggested change

-import org.apache.lucene.analysis.TokenFilter;

Copilot · 2026-01-17T18:05:38Z

src/main/java/org/crosswire/jsword/index/lucene/analysis/StrongsNumberFilter.java

-
-            // There was no more input
-            return false;
+        } else {


There is a critical bug in the accept() method logic. When processing a Strong's Number with a part (lines 63-66), the else branch calls number.isValid() on the existing StrongsNumber object, which was created from the PREVIOUS token. At line 65 it then takes the full Strong's number from that same object rather than from the token currently being processed. As a result, the filter outputs incorrect token text: the current token's text is captured in tokenText at line 55 but is never used in the else branch.

Suggested change

-        } else {
+        } else {
+            // Process the current token's text rather than reusing the previous StrongsNumber
+            number = new StrongsNumber(tokenText);
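
The concern can be reproduced in isolation. The sketch below is a model under stated assumptions, not the PR's code: `FakeNumber` is a hypothetical stand-in for StrongsNumber (its validation regex and its base()/full() methods are invented for illustration), and `accept(String)` only mirrors the branch structure of the new accept(). It assumes the filtering base class calls accept() once per input token, so the pending "part" state left by one token is consumed by whatever token arrives next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for StrongsNumber: "H1!a" has base "H1" and part "a".
final class FakeNumber {
    private final String full;

    FakeNumber(String text) {
        this.full = text;
    }

    boolean isValid() {
        return full.matches("[HG]\\d+(!\\w)?");
    }

    boolean isPart() {
        return full.contains("!");
    }

    String base() {
        int i = full.indexOf('!');
        return i < 0 ? full : full.substring(0, i);
    }

    String full() {
        return full;
    }
}

class PendingStateDemo {
    private FakeNumber number; // carried over between calls

    // Returns the text the token would end up with, or null if rejected.
    String accept(String tokenText) {
        boolean valid;
        String result;
        if (number == null) {
            number = new FakeNumber(tokenText);
            valid = number.isValid();
            result = number.base();
            if (!number.isPart()) {
                number = null;
            }
        } else {
            // tokenText is ignored here; the pending number supplies the text.
            valid = number.isValid();
            result = number.full();
            number = null;
        }
        return valid ? result : null;
    }

    static List<String> run(List<String> tokens) {
        PendingStateDemo filter = new PendingStateDemo();
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String text = filter.accept(token);
            if (text != null) {
                out.add(text);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("H1!a", "G2")));
    }
}
```

With the input "H1!a", "G2", this model emits "H1" and then "H1!a", and "G2" never appears. In the removed incrementToken() the second form was emitted without pulling another token from the input, which is the behavior a per-token accept() cannot express on its own.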

Copilot · 2026-01-17T18:05:38Z

src/main/java/org/crosswire/jsword/index/lucene/analysis/StrongsNumberFilter.java

+    public boolean accept() {
+        boolean valid;
+        String tokenText = termAtt.toString();
        if (number == null) {
-            // Need to loop over invalid tokens
-            while (input.incrementToken()) {
-                String tokenText = termAtt.toString();
-
-                number = new StrongsNumber(tokenText);
-
-                // Skip invalid Strong's Numbers.
-                // Still need to return true as there may be more tokens to filter.
-                if (!number.isValid()) {
-                    // TRANSLATOR: User error condition: Indicates that what was given is not a Strong's Number. {0} is a placeholder for the bad Strong's Number.
-                    log.warn(JSMsg.gettext("Not a valid Strong's Number \"{0}\"", tokenText));
-
-                    // Go get the next token
-                    continue;
-                }
-
-                String s = number.getStrongsNumber();
-                termAtt.setEmpty().append(s);
-
-                // If the number had a part keep it around for the next call
-                // TODO(DMS): if there is a part, then treat as a synonym,
-                //      setting the same position increment.
-                if (!number.isPart()) {
-                    number = null;
-                }
-
-                // incrementToken returned a value. There may be more input.
-                return true;
+            number = new StrongsNumber(tokenText);
+            valid = number.isValid();
+            termAtt.setEmpty().append(number.getStrongsNumber());
+            if (!number.isPart()) {
+                number = null;
            }
-
-            // There was no more input
-            return false;
+        } else {
+            valid = number.isValid();
+            termAtt.setEmpty().append(number.getFullStrongsNumber());
+            number = null;
        }
-
-        // Process the Strong's number with the !a
-        termAtt.setEmpty().append(number.getFullStrongsNumber());
-        // We are done with the Strong's Number so mark it as used
-        number = null;
-        // We are working on a value returned by incrementToken.
-        // There may be more input.
-        return true;
-    }
-
-    /* Define to quite FindBugs */
-    @Override
-    public boolean equals(Object obj) {
-        return super.equals(obj);
-    }
-
-    /* Define to quite FindBugs */
-    @Override
-    public int hashCode() {
-        return super.hashCode();
+        if (!valid) {
+            log.warn(JSMsg.gettext("Not a valid Strong's Number \"{0}\"", tokenText));
+        }
+        return valid;


The accept() method modifies the termAtt (the token text) before returning whether to accept or reject the token. This violates the contract of FilteringTokenFilter's accept() method, which should only determine whether to accept a token, not modify it. Token modifications should be done in a separate incrementToken() override that calls super.incrementToken() and then modifies the accepted token. Modifying the token text in accept() means that even rejected tokens will have their text modified, which could cause issues with the token stream state.
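
One way to keep the decision and the rewrite apart is sketched below in plain Java rather than with Lucene types. Whether incrementToken() can be overridden on FilteringTokenFilter depends on the Lucene version in use (it may be declared final, which would rule out the override), so the sketch models a two-stage alternative instead: a side-effect-free predicate, followed by a separate rewriting step that only ever sees accepted tokens. The class name, the regex, and the zero-padded output format are all assumptions made for illustration and are not JSword's actual normalization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class TwoStageDemo {
    // Stage 1: pure decision. It inspects the text but never changes it.
    static boolean accept(String tokenText) {
        return tokenText.matches("[HGhg]\\d+");
    }

    // Stage 2: rewrite, applied only to tokens that passed stage 1.
    // Hypothetical normalization: upper-case prefix, digits padded to 5.
    static String normalize(String tokenText) {
        char prefix = Character.toUpperCase(tokenText.charAt(0));
        int n = Integer.parseInt(tokenText.substring(1));
        return String.format(Locale.ROOT, "%c%05d", prefix, n);
    }

    static List<String> process(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (accept(token)) {
                out.add(normalize(token));
            }
            // Rejected tokens are dropped with their text untouched.
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList("h12", "oops", "G3")));
    }
}
```

In a Lucene analysis chain the same split could be expressed as two chained filters, one that only filters and one that only rewrites, so that a rejected token's text is never modified.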

tuomas2 requested a review from Copilot January 17, 2026 18:02

Copilot started reviewing on behalf of tuomas2 January 17, 2026 18:02

Copilot AI reviewed Jan 17, 2026

JJK96 wants to merge 1 commit into develop from fix_strongsnumberfilter

2 participants
