Add prompt optimizers to LiSSA #44
base: main
Conversation
This commit introduces a prompt optimization system that enables automatic tuning of classification prompts to improve trace link recovery performance.

Core Components:
- PromptOptimizer interface and IterativeOptimizer implementation for iterative naive prompt refinement
- Optimization class as the main orchestrator for running optimization workflows
- OptimizeCommand CLI integration for executing optimizations from the command line

Evaluation Framework:
- Evaluator base class with strategy pattern for different evaluation approaches
- BruteForceEvaluator for exhaustive prompt candidate evaluation
- UpperConfidenceBoundBanditEvaluator for an efficient exploration-exploitation tradeoff, using the UCB1 algorithm to reduce computational costs
- MockEvaluator for testing purposes

Metrics System:
- Metric interface with PointwiseMetric and GlobalMetric implementations
- FBetaMetric for computing precision, recall, and F-beta scores
- Scorer interface with BinaryScorer for trace link classification scoring
- Reductor interface with MeanReductor for aggregating pointwise metrics
- Factory classes (MetricFactory, ScorerFactory, ReductorFactory) for instantiation

Infrastructure Updates:
- Extended ModuleConfiguration with prompt management capabilities
- Added OptimizerConfiguration for optimizer-specific settings
- Enhanced CacheManager with scorer-specific caching via ScorerCacheKey
- Added ChatLanguageModelUtils for LLM interaction helpers
- Updated Statistics class with methods for computing evaluation metrics

Classifier Modifications:
- Added setClassificationPrompt() and getClassificationPromptKey() to the Classifier abstract class to support dynamic prompt updates
- Replaced cache parameters with map

Testing:
- Added PromptOptimizationUtilsTest with Markdown escaping tests
- Extended Requirement2RequirementE2ETest with optimizer test cases
- Added test resources with expected outputs and cache files for iterative and simple optimizer configurations
- Updated ArchitectureTest to ensure correct usage of dangerous overwrite methods

Configuration:
- Added example-configs/optimizer-config.json demonstrating optimizer setup
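
As a rough illustration of what an F-beta metric such as the FBetaMetric listed under "Metrics System" computes, here is a minimal sketch based only on the standard formula. The class and method names are hypothetical and are not taken from this PR; the PR's implementation may handle edge cases differently.

```java
/** Hypothetical sketch of an F-beta computation; not the PR's FBetaMetric. */
final class FBetaSketch {
    private FBetaSketch() {}

    /** Precision = TP / (TP + FP); defined here as 0 when nothing was predicted positive. */
    static double precision(int tp, int fp) {
        return (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** Recall = TP / (TP + FN); defined here as 0 when there are no actual positives. */
    static double recall(int tp, int fn) {
        return (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** F-beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta > 1 weights recall more heavily. */
    static double fBeta(int tp, int fp, int fn, double beta) {
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        double betaSquared = beta * beta;
        double denominator = betaSquared * p + r;
        return denominator == 0.0 ? 0.0 : (1 + betaSquared) * p * r / denominator;
    }
}
```

With beta = 1 this reduces to the familiar F1 score; returning 0 for empty denominators is one possible convention, chosen here as an assumption.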
@DanielDango I've opened a new pull request, #45, to work on those changes. Once the pull request is ready, I'll request review from you.
Pull request overview
This PR introduces a comprehensive prompt optimization framework to the LiSSA trace link analysis system. The framework enables iterative improvement of classification prompts through automated optimization strategies, integrating metrics, evaluators, and optimizer implementations.
Key Changes:
- Adds prompt optimizer framework with iterative optimization support
- Introduces metric system (pointwise and global metrics) for evaluating prompt performance
- Implements evaluator strategies (brute-force and Upper Confidence Bound bandit algorithms)
- Extends configuration system to support optimization workflows
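
The Upper Confidence Bound strategy named in the list above can be sketched with the standard UCB1 score: the mean reward of a candidate plus an exploration bonus of sqrt(2 ln t / n_i), where t is the total number of evaluations and n_i the number of evaluations of candidate i. This is a generic illustration with hypothetical names; the PR's UpperConfidenceBoundBanditEvaluator may use a different exploration constant or bookkeeping.

```java
/** Hypothetical UCB1 scoring sketch; not the PR's evaluator. */
final class Ucb1Sketch {
    private Ucb1Sketch() {}

    /**
     * Computes one UCB1 score per candidate prompt.
     *
     * @param rewardSums sum of observed rewards per candidate
     * @param pulls      number of times each candidate was evaluated
     */
    static double[] ucbScores(double[] rewardSums, int[] pulls) {
        int total = 0;
        for (int n : pulls) {
            total += n;
        }
        double[] scores = new double[pulls.length];
        for (int i = 0; i < pulls.length; i++) {
            if (pulls[i] == 0) {
                // Candidates that were never evaluated are tried first.
                scores[i] = Double.POSITIVE_INFINITY;
            } else {
                double mean = rewardSums[i] / pulls[i];
                double bonus = Math.sqrt(2.0 * Math.log(total) / pulls[i]);
                scores[i] = mean + bonus;
            }
        }
        return scores;
    }
}
```

The bonus shrinks as a candidate is evaluated more often, so rarely sampled prompts keep getting a chance while well-measured, low-scoring ones are dropped.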
Reviewed changes
Copilot reviewed 52 out of 54 changed files in this pull request and generated 28 comments.
Show a summary per file
| File | Description |
|---|---|
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/* | Core optimizer interfaces and implementations (IterativeOptimizer, MockOptimizer, OptimizerFactory, utilities) |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/* | Metric system with scorers (BinaryScorer), reductors (MeanReductor), and metric implementations (PointwiseMetric, GlobalMetric, FBetaMetric) |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/evaluator/* | Evaluator base class with BruteForceEvaluator and UpperConfidenceBoundBanditEvaluator implementations |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/Optimization.java | New optimization pipeline that extends evaluation with prompt optimization step |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/Evaluation.java | Extended to support prompt overwriting for optimization workflow |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/configuration/* | OptimizerConfiguration record and ModuleConfiguration setArgument methods |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/* | Added getCacheParameters() method and changed copyOf() visibility to public |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/* | New ScorerCacheKey and CacheManager enhancements for optimizer caching |
| src/main/java/edu/kit/kastel/sdq/lissa/cli/command/OptimizeCommand.java | CLI command for running optimization and optional evaluation pipelines |
| src/test/java/edu/kit/kastel/sdq/lissa/ratlr/* | Test files including PromptOptimizationUtilsTest and E2E optimizer tests |
| src/test/resources/warc/* | Test configurations and expected outputs for optimizer validation |
| example-configs/optimizer-config.json | Example optimizer configuration file |
```java
List<Double> scores;
while (true) {
    try {
        // TODO: Pryzant et al. used multiple threads here
```
Copilot
AI
Nov 28, 2025
The TODO comment "Pryzant et al. used multiple threads here" should be addressed or removed. If multi-threading is planned, create a tracking issue; otherwise, remove the comment if it's not relevant.
Suggested change:
```diff
- // TODO: Pryzant et al. used multiple threads here
```
src/main/java/edu/kit/kastel/sdq/lissa/ratlr/evaluator/UpperConfidenceBoundBanditEvaluator.java
Outdated
```java
 * Custom optimization prompts can also use the placeholders {@value SOURCE_TYPE_PLACEHOLDER},
 * {@value TARGET_TYPE_PLACEHOLDER} and should use {@value ORIGINAL_PROMPT_PLACEHOLDER}.
 * The optimized prompt should be enclosed between {@value PROMPT_START} and {@value PROMPT_END}.
 * TODO: There should probably be spaces around the prompt tags :upsidedown:.
```
Copilot
AI
Nov 28, 2025
The TODO comment contains an emoji ":upsidedown:" which should be removed or replaced with a proper description. Consider clarifying whether spaces are truly needed or remove the comment if this is acceptable.
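
One way to make the whitespace question moot is to extract the text between the two tags and strip it, so the result is the same whether or not the model puts spaces around the tags. The sketch below assumes hypothetical tag values `<prompt>` and `</prompt>`; the actual values of PROMPT_START and PROMPT_END in this PR are not shown here, and the class and method names are illustrative only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of tag-based prompt extraction; not the PR's implementation. */
final class PromptTagSketch {
    // Assumed tag values; the PR's PROMPT_START / PROMPT_END constants may differ.
    static final String PROMPT_START = "<prompt>";
    static final String PROMPT_END = "</prompt>";

    // DOTALL lets the prompt span several lines; the reluctant quantifier stops at the first end tag.
    private static final Pattern TAGGED = Pattern.compile(
            Pattern.quote(PROMPT_START) + "(.*?)" + Pattern.quote(PROMPT_END), Pattern.DOTALL);

    private PromptTagSketch() {}

    /** Returns the stripped text between the tags, or empty if the tags are missing. */
    static Optional<String> extractPrompt(String response) {
        Matcher matcher = TAGGED.matcher(response);
        return matcher.find() ? Optional.of(matcher.group(1).strip()) : Optional.empty();
    }
}
```

Returning an empty Optional for a response without tags leaves it to the caller to decide whether to retry or keep the previous prompt.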
src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/EvaluationResult.java
Outdated
src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/EvaluationResult.java
Outdated
src/main/java/edu/kit/kastel/sdq/lissa/ratlr/evaluator/UpperConfidenceBoundBanditEvaluator.java
Outdated
```java
    return getClassificationKey(task.source().toString(), task.target().toString());
}

@NotNull
private static String getClassificationKey(ClassificationResult result) {
    return getClassificationKey(result.source().toString(), result.target().toString());
```
Copilot
AI
Nov 28, 2025
Default toString(): Element inherits toString() from Object, and so is not suitable for printing.
Suggested change:
```diff
-    return getClassificationKey(task.source().toString(), task.target().toString());
-}
-@NotNull
-private static String getClassificationKey(ClassificationResult result) {
-    return getClassificationKey(result.source().toString(), result.target().toString());
+    // Replace toString() with a stable identifier for source and target, e.g., getId()
+    return getClassificationKey(task.source().getId(), task.target().getId());
+}
+@NotNull
+private static String getClassificationKey(ClassificationResult result) {
+    // Replace toString() with a stable identifier for source and target, e.g., getId()
+    return getClassificationKey(result.source().getId(), result.target().getId());
```
```java
    return getClassificationKey(task.source().toString(), task.target().toString());
}

@NotNull
private static String getClassificationKey(ClassificationResult result) {
    return getClassificationKey(result.source().toString(), result.target().toString());
}

@NotNull
```
Copilot
AI
Nov 28, 2025
Default toString(): Element inherits toString() from Object, and so is not suitable for printing.
Suggested change:
```diff
-    return getClassificationKey(task.source().toString(), task.target().toString());
-}
-@NotNull
-private static String getClassificationKey(ClassificationResult result) {
-    return getClassificationKey(result.source().toString(), result.target().toString());
-}
-@NotNull
+    return getClassificationKey(objectKey(task.source()), objectKey(task.target()));
+}
+@NotNull
+private static String getClassificationKey(ClassificationResult result) {
+    return getClassificationKey(objectKey(result.source()), objectKey(result.target()));
+}
+/**
+ * Returns a stable string key for the given object, avoiding default Object.toString().
+ * Modify this method to extract a unique and meaningful identifier from the object.
+ */
+private static String objectKey(Object obj) {
+    // If obj is null, return "null"
+    if (obj == null) {
+        return "null";
+    }
+    // If obj has a getId() method, use it. Otherwise, fallback to toString().
+    // TODO: Replace this with a more robust implementation if possible.
+    try {
+        java.lang.reflect.Method getId = obj.getClass().getMethod("getId");
+        Object id = getId.invoke(obj);
+        return id != null ? id.toString() : "null";
+    } catch (Exception e) {
+        // Fallback: use toString(), but warn in a comment.
+        // WARNING: This may use Object.toString() if no getId() is present.
+        return obj.toString();
+    }
+}
```
```java
    return getClassificationKey(task.source().toString(), task.target().toString());
}

@NotNull
private static String getClassificationKey(ClassificationResult result) {
    return getClassificationKey(result.source().toString(), result.target().toString());
```
Copilot
AI
Nov 28, 2025
Default toString(): Element inherits toString() from Object, and so is not suitable for printing.
Suggested change:
```diff
-    return getClassificationKey(task.source().toString(), task.target().toString());
-}
-@NotNull
-private static String getClassificationKey(ClassificationResult result) {
-    return getClassificationKey(result.source().toString(), result.target().toString());
+    return getClassificationKey(task.source(), task.target());
+}
+@NotNull
+private static String getClassificationKey(ClassificationResult result) {
+    return getClassificationKey(result.source(), result.target());
```
```java
    return getClassificationKey(task.source().toString(), task.target().toString());
}

@NotNull
private static String getClassificationKey(ClassificationResult result) {
    return getClassificationKey(result.source().toString(), result.target().toString());
```
Copilot
AI
Nov 28, 2025
Default toString(): Element inherits toString() from Object, and so is not suitable for printing.
Suggested change:
```diff
-    return getClassificationKey(task.source().toString(), task.target().toString());
-}
-@NotNull
-private static String getClassificationKey(ClassificationResult result) {
-    return getClassificationKey(result.source().toString(), result.target().toString());
+    // Ensure that source() and target() return meaningful string representations.
+    Object source = task.source();
+    Object target = task.target();
+    String sourceStr = (source instanceof String) ? (String) source : String.valueOf(source);
+    String targetStr = (target instanceof String) ? (String) target : String.valueOf(target);
+    return getClassificationKey(sourceStr, targetStr);
+}
+@NotNull
+private static String getClassificationKey(ClassificationResult result) {
+    // Ensure that source() and target() return meaningful string representations.
+    Object source = result.source();
+    Object target = result.target();
+    String sourceStr = (source instanceof String) ? (String) source : String.valueOf(source);
+    String targetStr = (target instanceof String) ? (String) target : String.valueOf(target);
+    return getClassificationKey(sourceStr, targetStr);
```
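
To show why the repeated toString() warning matters for a lookup key, the sketch below compares a key built from an explicit identifier with one built from an inherited Object.toString(). It assumes that the element type does not override toString(), as the review comments claim; the Element stand-in, its getIdentifier() accessor, and the "->" separator are all hypothetical and not taken from this PR.

```java
/** Hypothetical sketch contrasting identifier-based and toString()-based keys. */
final class ClassificationKeySketch {
    private ClassificationKeySketch() {}

    /** Stand-in for the PR's element type; only the identifier matters here. */
    static final class Element {
        private final String identifier;

        Element(String identifier) {
            this.identifier = identifier;
        }

        String getIdentifier() {
            return identifier;
        }
        // No toString() override: Object.toString() yields ClassName@hexHashCode.
    }

    /** Stable key: depends only on the identifiers, not on object identity. */
    static String keyFromIdentifiers(Element source, Element target) {
        return source.getIdentifier() + "->" + target.getIdentifier();
    }

    /** Fragile key: the inherited toString() embeds an identity hash code. */
    static String keyFromToString(Element source, Element target) {
        return source.toString() + "->" + target.toString();
    }
}
```

Two Element instances with the same identifier produce the same identifier-based key, whereas the toString()-based key generally differs between instances and between JVM runs, which makes it unreliable for matching tasks to results.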
Pull request overview
Copilot reviewed 52 out of 54 changed files in this pull request and generated 10 comments.
```java
List<Double> scores;
while (true) {
    try {
        // TODO: Pryzant et al. used multiple threads here
```
Copilot
AI
Nov 28, 2025
The TODO comment about Pryzant et al. using multiple threads should be addressed. If multi-threading is intended to be added, this should be tracked properly. Otherwise, the comment should be removed or clarified.
Suggested change:
```diff
- // TODO: Pryzant et al. used multiple threads here
+ // Note: Pryzant et al. used multiple threads here. This implementation is currently single-threaded.
```
src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/ChatLanguageModelProvider.java
Outdated
```java
});

/**
 * Prompts for classifiers should only be modified by optimizers or smetricscorers. Otherwise, there will be
```
Copilot
AI
Nov 28, 2025
Typo or grammatical issue: "smetricscorers" should likely be "metrics scorers" or "metric scorers" with proper spacing.
```java
 * TODO: consider extracting into a aggregator implementation ???
 *
```
Copilot
AI
Nov 28, 2025
The TODO comment about extracting this into an aggregator implementation should be addressed or clarified. If this is a design decision that needs to be made, consider opening an issue to track it rather than leaving it as a TODO in the code.
Suggested change:
```diff
- * TODO: consider extracting into a aggregator implementation ???
- *
+ *
+ *
```
```java
 */
public Evaluation(Configuration config) throws IOException {
    this.configuration = config;
    // TODO maybe dont?
```
Copilot
AI
Nov 28, 2025
The TODO comment "maybe dont?" on line 151 should be resolved. It's unclear what the concern is about setting configFile to null. This should either be addressed or the comment should be clarified.
Suggested change:
```diff
- // TODO maybe dont?
+ // No config file path available when constructed from Configuration object
```
```java
return Arrays.stream(ucbScores)
        .boxed()
        // sort in descending order
        .sorted((a, b) -> Double.compare(b, a))
        .limit(n)
        .map(score -> {
            for (int i = 0; i < ucbScores.length; i++) {
                if (ucbScores[i] == score) {
                    return i;
                }
            }
            // should never happen
            return -1;
        })
        .toList();
```
Copilot
AI
Nov 28, 2025
The choose method has a bug in finding the indices of top-scoring prompts. After sorting scores in descending order (line 182), the code attempts to find the original index by comparing values (lines 185-188). However, if multiple prompts have the same score, this will always return the first matching index, potentially selecting the same prompt multiple times. Consider using a proper sorting with index tracking, such as creating a list of index-score pairs before sorting.
Suggested change:
```diff
-return Arrays.stream(ucbScores)
-        .boxed()
-        // sort in descending order
-        .sorted((a, b) -> Double.compare(b, a))
-        .limit(n)
-        .map(score -> {
-            for (int i = 0; i < ucbScores.length; i++) {
-                if (ucbScores[i] == score) {
-                    return i;
-                }
-            }
-            // should never happen
-            return -1;
-        })
-        .toList();
+// Pair each score with its index
+List<int[]> indexScorePairs = new ArrayList<>();
+for (int i = 0; i < ucbScores.length; i++) {
+    indexScorePairs.add(new int[] {i, Double.isNaN(ucbScores[i]) ? Integer.MIN_VALUE : (int) Math.round(ucbScores[i])});
+}
+// Sort pairs by score descending
+indexScorePairs.sort((a, b) -> Double.compare(ucbScores[b[0]], ucbScores[a[0]]));
+// Select the indices of the top n pairs
+List<Integer> chosen = new ArrayList<>();
+for (int i = 0; i < Math.min(n, indexScorePairs.size()); i++) {
+    chosen.add(indexScorePairs.get(i)[0]);
+}
+return chosen;
```
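
The index-tracking idea from this comment can also be written without the intermediate pair list by sorting the indices themselves by their score. The following is a minimal sketch under that assumption; the class and method names are hypothetical and this is not the code that was merged.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical sketch of top-n index selection that tolerates tied scores. */
final class TopIndicesSketch {
    private TopIndicesSketch() {}

    /** Returns the indices of the n highest scores, in descending score order. */
    static List<Integer> topIndices(double[] scores, int n) {
        return IntStream.range(0, scores.length)
                .boxed()
                // Sort the indices by their score, highest first. The sort is stable,
                // so tied scores keep their original index order and no index repeats.
                .sorted(Comparator.comparingDouble((Integer i) -> scores[i]).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

Because each index appears exactly once in the stream, two prompts with equal scores are returned as two distinct entries rather than the first match twice.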
```java
classifier.setClassificationPrompt(prompt);
List<ClassificationResult> acceptedTraceLinks = new ArrayList<>();
List<ClassificationResult> rejectedTraceLinks = new ArrayList<>();
new Pair<>(new ArrayList<>(), new ArrayList<>());
```
Copilot
AI
Nov 28, 2025
Line 107 creates a new Pair object that is immediately discarded and never used. This appears to be dead code that should be removed.
Suggested change:
```diff
- new Pair<>(new ArrayList<>(), new ArrayList<>());
```
```java
        expectedPrompt.lines().map(String::strip).toList(),
        escapedOptimizedPrompt.lines().map(String::strip).toList(),
        "The optimized prompt does not match every line (regardless of line terminators or leading/trailing "
                + "spaces) of of the expected prompt.");
```
Copilot
AI
Nov 28, 2025
Typo: "of of the expected prompt" should be "of the expected prompt" - remove the duplicate "of".
```java
@NotNull
private static String getClassificationKey(ClassificationTask task) {
    return getClassificationKey(task.source().toString(), task.target().toString());
```
Copilot
AI
Nov 28, 2025
Default toString(): Element inherits toString() from Object, and so is not suitable for printing.
```java
@NotNull
private static String getClassificationKey(ClassificationResult result) {
    return getClassificationKey(result.source().toString(), result.target().toString());
```
Copilot
AI
Nov 28, 2025
Default toString(): Element inherits toString() from Object, and so is not suitable for printing.
…ented by @copilot on #44
- refines iterative optimizer with feedback on misclassified trace links (false positive or false negative)
- add sample strategy to configure which TLs will be used as feedback
- add SLF4J Simple Logger configuration for improved logging in tests
…g and Simple Classifiers
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
… feature/add-prompt-optimization-module # Conflicts: # src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/ChatLanguageModelProvider.java
…timization with "Gradient Descent" and Beam Search" of Pryzant et al. 2023
- this changed expected outputs
@DanielDango I've opened a new pull request, #49, to work on those changes. Once the pull request is ready, I'll request review from you.

This PR aims to introduce prompt optimizers developed during my bachelor's thesis into the LiSSA framework.
They can be used as a standalone interaction with the framework, as well as chained with regular evaluation pipeline runs.
To-Do: