@DanielDango (Collaborator) commented Nov 28, 2025

This PR introduces the prompt optimizers developed during my bachelor's thesis into the LiSSA framework.
The optimizers can be run standalone or chained with regular evaluation pipeline runs.

To-Do:

  • Re-enable Feedback and Gradient optimizers
  • Add usage to the repository documentation

This commit introduces a prompt optimization system that enables
automatic tuning of classification prompts to improve trace link recovery performance.

Core Components:
- PromptOptimizer interface and IterativeOptimizer implementation for naive iterative
  prompt refinement
- Optimization class as the main orchestrator for running optimization workflows
- OptimizeCommand CLI integration for executing optimizations from command line

Evaluation Framework:
- Evaluator base class with strategy pattern for different evaluation approaches
- BruteForceEvaluator for exhaustive prompt candidate evaluation
- UpperConfidenceBoundBanditEvaluator for efficient exploration-exploitation tradeoff
  using UCB1 algorithm to reduce computational costs
- MockEvaluator for testing purposes
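
For readers unfamiliar with UCB1, the exploration-exploitation tradeoff named in the list above can be sketched as follows. This is the textbook UCB1 score, not the code of UpperConfidenceBoundBanditEvaluator, which is not shown in this description; the class `Ucb1Sketch`, the method `score`, and its parameters are hypothetical names chosen for illustration.

```java
/** Hypothetical helper for illustration; not part of the LiSSA code base. */
final class Ucb1Sketch {
    private Ucb1Sketch() {}

    /**
     * Textbook UCB1 score of one arm (here: one prompt candidate).
     *
     * @param rewardSum  sum of rewards observed for this arm
     * @param pulls      how often this arm has been evaluated
     * @param totalPulls how often any arm has been evaluated
     */
    static double score(double rewardSum, int pulls, int totalPulls) {
        if (pulls == 0) {
            // Arms that were never evaluated are tried first.
            return Double.POSITIVE_INFINITY;
        }
        double mean = rewardSum / pulls;
        // The bonus shrinks as the arm is evaluated more often.
        double bonus = Math.sqrt(2.0 * Math.log(totalPulls) / pulls);
        return mean + bonus;
    }
}
```

With equal mean rewards, a candidate that has been evaluated less often receives the higher score, which is what lets a bandit evaluator spend fewer evaluations than an exhaustive comparison.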

Metrics System:
- Metric interface with PointwiseMetric and GlobalMetric implementations
- FBetaMetric for computing precision, recall, and F-beta scores
- Scorer interface with BinaryScorer for trace link classification scoring
- Reductor interface with MeanReductor for aggregating pointwise metrics
- Factory classes (MetricFactory, ScorerFactory, ReductorFactory) for instantiation
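
As a rough illustration of what an F-beta metric and a mean reduction compute, here is a minimal sketch. It does not reproduce FBetaMetric or MeanReductor from this PR; the class `FBetaSketch`, its method names, and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of the LiSSA code base. */
final class FBetaSketch {
    private FBetaSketch() {}

    /** F-beta from confusion counts; returns 0.0 where a ratio is undefined (assumed convention). */
    static double fBeta(int tp, int fp, int fn, double beta) {
        double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        double betaSquared = beta * beta;
        double denominator = betaSquared * precision + recall;
        return denominator == 0.0 ? 0.0 : (1 + betaSquared) * precision * recall / denominator;
    }

    /** Arithmetic mean of pointwise scores; 0.0 for an empty list (assumed convention). */
    static double mean(List<Double> scores) {
        return scores.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }
}
```

A beta above 1 weights recall more heavily than precision, and a beta below 1 does the opposite.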

Infrastructure Updates:
- Extended ModuleConfiguration with prompt management capabilities
- Added OptimizerConfiguration for optimizer-specific settings
- Enhanced CacheManager with scorer-specific caching via ScorerCacheKey
- Added ChatLanguageModelUtils for LLM interaction helpers
- Updated Statistics class with methods for computing evaluation metrics

Classifier Modifications:
- Added setClassificationPrompt() and getClassificationPromptKey() to the Classifier
  abstract class to support dynamic prompt updates
- Replaced cache parameters with a map

Testing:
- Added PromptOptimizationUtilsTest with Markdown escaping tests
- Extended Requirement2RequirementE2ETest with optimizer test cases
- Added test resources with expected outputs and cache files for iterative
  and simple optimizer configurations
- Updated ArchitectureTest to ensure correct usage of dangerous overwrite methods

Configuration:
- Added example-configs/optimizer-config.json demonstrating optimizer setup

@DanielDango marked this pull request as ready for review November 28, 2025 11:15
@DanielDango requested a review from dfuchss as a code owner November 28, 2025 11:15
Copilot AI review requested due to automatic review settings November 28, 2025 11:15

@DanielDango (Collaborator, Author) commented:

@dfuchss, your review is not required yet. I only intended to request an initial review from @copilot. I will request your review once the ProTeGi optimizer (Gradient) and the naive optimizer with feedback on misclassified trace links (Feedback) have been added.

Copilot AI commented Nov 28, 2025

@DanielDango I've opened a new pull request, #45, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a comprehensive prompt optimization framework to the LiSSA trace link analysis system. The framework enables iterative improvement of classification prompts through automated optimization strategies, integrating metrics, evaluators, and optimizer implementations.

Key Changes:

  • Adds prompt optimizer framework with iterative optimization support
  • Introduces metric system (pointwise and global metrics) for evaluating prompt performance
  • Implements evaluator strategies (brute-force and Upper Confidence Bound bandit algorithms)
  • Extends configuration system to support optimization workflows

Reviewed changes

Copilot reviewed 52 out of 54 changed files in this pull request and generated 28 comments.

| File | Description |
| --- | --- |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/* | Core optimizer interfaces and implementations (IterativeOptimizer, MockOptimizer, OptimizerFactory, utilities) |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/* | Metric system with scorers (BinaryScorer), reductors (MeanReductor), and metric implementations (PointwiseMetric, GlobalMetric, FBetaMetric) |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/evaluator/* | Evaluator base class with BruteForceEvaluator and UpperConfidenceBoundBanditEvaluator implementations |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/Optimization.java | New optimization pipeline that extends evaluation with prompt optimization step |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/Evaluation.java | Extended to support prompt overwriting for optimization workflow |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/configuration/* | OptimizerConfiguration record and ModuleConfiguration setArgument methods |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/* | Added getCacheParameters() method and changed copyOf() visibility to public |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/* | New ScorerCacheKey and CacheManager enhancements for optimizer caching |
| src/main/java/edu/kit/kastel/sdq/lissa/cli/command/OptimizeCommand.java | CLI command for running optimization and optional evaluation pipelines |
| src/test/java/edu/kit/kastel/sdq/lissa/ratlr/* | Test files including PromptOptimizationUtilsTest and E2E optimizer tests |
| src/test/resources/warc/* | Test configurations and expected outputs for optimizer validation |
| example-configs/optimizer-config.json | Example optimizer configuration file |

List<Double> scores;
while (true) {
try {
// TODO: Pryzant et al. used multiple threads here

Copilot AI Nov 28, 2025

The TODO comment "Pryzant et al. used multiple threads here" should be addressed or removed. If multi-threading is planned, create a tracking issue; otherwise, remove the comment if it's not relevant.

Suggested change
- // TODO: Pryzant et al. used multiple threads here

Copilot uses AI. Check for mistakes.
* Custom optimization prompts can also use the placeholders {@value SOURCE_TYPE_PLACEHOLDER},
* {@value TARGET_TYPE_PLACEHOLDER} and should use {@value ORIGINAL_PROMPT_PLACEHOLDER}.
* The optimized prompt should be enclosed between {@value PROMPT_START} and {@value PROMPT_END}.
* TODO: There should probably be spaces around the prompt tags :upsidedown:.

Copilot AI Nov 28, 2025

The TODO comment contains an emoji ":upsidedown:" which should be removed or replaced with a proper description. Consider clarifying whether spaces are truly needed or remove the comment if this is acceptable.
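
To make the tag convention from the Javadoc above concrete, here is a minimal sketch of extracting an optimized prompt from a model response. The tag values `<prompt>` and `</prompt>` are placeholders, since the actual values of PROMPT_START and PROMPT_END are not shown in this thread; the class `PromptTagSketch` and the method `extract` are likewise hypothetical.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not part of the LiSSA code base. */
final class PromptTagSketch {
    // Placeholder values; the real PROMPT_START / PROMPT_END constants are not shown in this thread.
    static final String PROMPT_START = "<prompt>";
    static final String PROMPT_END = "</prompt>";

    private PromptTagSketch() {}

    /** Returns the text between the first start tag and the next end tag, if both are present. */
    static Optional<String> extract(String response) {
        int start = response.indexOf(PROMPT_START);
        if (start < 0) {
            return Optional.empty();
        }
        int from = start + PROMPT_START.length();
        int end = response.indexOf(PROMPT_END, from);
        if (end < 0) {
            return Optional.empty();
        }
        // strip() makes the result independent of spaces around the tags.
        return Optional.of(response.substring(from, end).strip());
    }
}
```

Stripping the extracted text is one way to make the question of spaces around the tags irrelevant on the parsing side, though whether that matches the intended behavior here is for the author to decide.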

Comment on lines +121 to +126
return getClassificationKey(task.source().toString(), task.target().toString());
}

@NotNull
private static String getClassificationKey(ClassificationResult result) {
return getClassificationKey(result.source().toString(), result.target().toString());

Copilot AI Nov 28, 2025

Default toString(): Element inherits toString() from Object, and so is not suitable for printing.

Suggested change
-     return getClassificationKey(task.source().toString(), task.target().toString());
- }
- @NotNull
- private static String getClassificationKey(ClassificationResult result) {
-     return getClassificationKey(result.source().toString(), result.target().toString());
+     // Replace toString() with a stable identifier for source and target, e.g., getId()
+     return getClassificationKey(task.source().getId(), task.target().getId());
+ }
+ @NotNull
+ private static String getClassificationKey(ClassificationResult result) {
+     // Replace toString() with a stable identifier for source and target, e.g., getId()
+     return getClassificationKey(result.source().getId(), result.target().getId());

Comment on lines +121 to +129
return getClassificationKey(task.source().toString(), task.target().toString());
}

@NotNull
private static String getClassificationKey(ClassificationResult result) {
return getClassificationKey(result.source().toString(), result.target().toString());
}

@NotNull

Copilot AI Nov 28, 2025

Default toString(): Element inherits toString() from Object, and so is not suitable for printing.

Suggested change
-     return getClassificationKey(task.source().toString(), task.target().toString());
- }
- @NotNull
- private static String getClassificationKey(ClassificationResult result) {
-     return getClassificationKey(result.source().toString(), result.target().toString());
- }
- @NotNull
+     return getClassificationKey(objectKey(task.source()), objectKey(task.target()));
+ }
+ @NotNull
+ private static String getClassificationKey(ClassificationResult result) {
+     return getClassificationKey(objectKey(result.source()), objectKey(result.target()));
+ }
+ /**
+  * Returns a stable string key for the given object, avoiding default Object.toString().
+  * Modify this method to extract a unique and meaningful identifier from the object.
+  */
+ private static String objectKey(Object obj) {
+     // If obj is null, return "null"
+     if (obj == null) {
+         return "null";
+     }
+     // If obj has a getId() method, use it. Otherwise, fallback to toString().
+     // TODO: Replace this with a more robust implementation if possible.
+     try {
+         java.lang.reflect.Method getId = obj.getClass().getMethod("getId");
+         Object id = getId.invoke(obj);
+         return id != null ? id.toString() : "null";
+     } catch (Exception e) {
+         // Fallback: use toString(), but warn in a comment.
+         // WARNING: This may use Object.toString() if no getId() is present.
+         return obj.toString();
+     }
+ }

Comment on lines +121 to +126
return getClassificationKey(task.source().toString(), task.target().toString());
}

@NotNull
private static String getClassificationKey(ClassificationResult result) {
return getClassificationKey(result.source().toString(), result.target().toString());

Copilot AI Nov 28, 2025

Default toString(): Element inherits toString() from Object, and so is not suitable for printing.

Suggested change
-     return getClassificationKey(task.source().toString(), task.target().toString());
- }
- @NotNull
- private static String getClassificationKey(ClassificationResult result) {
-     return getClassificationKey(result.source().toString(), result.target().toString());
+     return getClassificationKey(task.source(), task.target());
+ }
+ @NotNull
+ private static String getClassificationKey(ClassificationResult result) {
+     return getClassificationKey(result.source(), result.target());

Comment on lines +121 to +126
return getClassificationKey(task.source().toString(), task.target().toString());
}

@NotNull
private static String getClassificationKey(ClassificationResult result) {
return getClassificationKey(result.source().toString(), result.target().toString());

Copilot AI Nov 28, 2025

Default toString(): Element inherits toString() from Object, and so is not suitable for printing.

Suggested change
-     return getClassificationKey(task.source().toString(), task.target().toString());
- }
- @NotNull
- private static String getClassificationKey(ClassificationResult result) {
-     return getClassificationKey(result.source().toString(), result.target().toString());
+     // Ensure that source() and target() return meaningful string representations.
+     Object source = task.source();
+     Object target = task.target();
+     String sourceStr = (source instanceof String) ? (String) source : String.valueOf(source);
+     String targetStr = (target instanceof String) ? (String) target : String.valueOf(target);
+     return getClassificationKey(sourceStr, targetStr);
+ }
+ @NotNull
+ private static String getClassificationKey(ClassificationResult result) {
+     // Ensure that source() and target() return meaningful string representations.
+     Object source = result.source();
+     Object target = result.target();
+     String sourceStr = (source instanceof String) ? (String) source : String.valueOf(source);
+     String targetStr = (target instanceof String) ? (String) target : String.valueOf(target);
+     return getClassificationKey(sourceStr, targetStr);

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 52 out of 54 changed files in this pull request and generated 10 comments.


List<Double> scores;
while (true) {
try {
// TODO: Pryzant et al. used multiple threads here

Copilot AI Nov 28, 2025

The TODO comment about Pryzant et al. using multiple threads should be addressed. If multi-threading is intended to be added, this should be tracked properly. Otherwise, the comment should be removed or clarified.

Suggested change
- // TODO: Pryzant et al. used multiple threads here
+ // Note: Pryzant et al. used multiple threads here. This implementation is currently single-threaded.

});

/**
* Prompts for classifiers should only be modified by optimizers or smetricscorers. Otherwise, there will be

Copilot AI Nov 28, 2025

Typo or grammatical issue: "smetricscorers" should likely be "metrics scorers" or "metric scorers" with proper spacing.

Comment on lines +145 to +146
* TODO: consider extracting into a aggregator implementation ???
*

Copilot AI Nov 28, 2025

The TODO comment about extracting this into an aggregator implementation should be addressed or clarified. If this is a design decision that needs to be made, consider opening an issue to track it rather than leaving it as a TODO in the code.

Suggested change
- * TODO: consider extracting into a aggregator implementation ???
- *
+ *
+ *

*/
public Evaluation(Configuration config) throws IOException {
this.configuration = config;
// TODO maybe dont?

Copilot AI Nov 28, 2025

The TODO comment "maybe dont?" on line 151 should be resolved. It's unclear what the concern is about setting configFile to null. This should either be addressed or the comment should be clarified.

Suggested change
- // TODO maybe dont?
+ // No config file path available when constructed from Configuration object

Comment on lines +179 to +193
return Arrays.stream(ucbScores)
.boxed()
// sort in descending order
.sorted((a, b) -> Double.compare(b, a))
.limit(n)
.map(score -> {
for (int i = 0; i < ucbScores.length; i++) {
if (ucbScores[i] == score) {
return i;
}
}
// should never happen
return -1;
})
.toList();

Copilot AI Nov 28, 2025

The choose method has a bug in finding the indices of top-scoring prompts. After sorting scores in descending order (line 182), the code attempts to find the original index by comparing values (lines 185-188). However, if multiple prompts have the same score, this will always return the first matching index, potentially selecting the same prompt multiple times. Consider using a proper sorting with index tracking, such as creating a list of index-score pairs before sorting.

Suggested change
- return Arrays.stream(ucbScores)
-         .boxed()
-         // sort in descending order
-         .sorted((a, b) -> Double.compare(b, a))
-         .limit(n)
-         .map(score -> {
-             for (int i = 0; i < ucbScores.length; i++) {
-                 if (ucbScores[i] == score) {
-                     return i;
-                 }
-             }
-             // should never happen
-             return -1;
-         })
-         .toList();
+ // Pair each score with its index
+ List<int[]> indexScorePairs = new ArrayList<>();
+ for (int i = 0; i < ucbScores.length; i++) {
+     indexScorePairs.add(new int[] {i, Double.isNaN(ucbScores[i]) ? Integer.MIN_VALUE : (int) Math.round(ucbScores[i])});
+ }
+ // Sort pairs by score descending
+ indexScorePairs.sort((a, b) -> Double.compare(ucbScores[b[0]], ucbScores[a[0]]));
+ // Select the indices of the top n pairs
+ List<Integer> chosen = new ArrayList<>();
+ for (int i = 0; i < Math.min(n, indexScorePairs.size()); i++) {
+     chosen.add(indexScorePairs.get(i)[0]);
+ }
+ return chosen;
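
The index-tracking idea in this comment can also be written by sorting the indices themselves, which avoids the auxiliary pair array. The following is a minimal sketch of that variant, not the change that was merged; the class `TopIndicesSketch` and the method `topIndices` are hypothetical names, and `n` is assumed to be non-negative.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.IntStream;

/** Hypothetical helper for illustration; not part of the LiSSA code base. */
final class TopIndicesSketch {
    private TopIndicesSketch() {}

    /** Returns the indices of the n highest scores; tied scores keep their original index order. */
    static List<Integer> topIndices(double[] scores, int n) {
        return IntStream.range(0, scores.length)
                .boxed()
                // Sort the indices by score, descending. The sort is stable for this ordered stream,
                // so equal scores yield distinct indices in ascending order.
                // Note: Double.compare ranks NaN above every other value, so NaN scores would come first.
                .sorted(Comparator.comparingDouble((Integer i) -> scores[i]).reversed())
                .limit(n)
                .toList();
    }
}
```

For the scores `{0.5, 0.9, 0.9, 0.1}` and `n = 2`, this returns the indices 1 and 2, whereas a value-based lookup of each sorted score would return index 1 twice.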

classifier.setClassificationPrompt(prompt);
List<ClassificationResult> acceptedTraceLinks = new ArrayList<>();
List<ClassificationResult> rejectedTraceLinks = new ArrayList<>();
new Pair<>(new ArrayList<>(), new ArrayList<>());

Copilot AI Nov 28, 2025

Line 107 creates a new Pair object that is immediately discarded and never used. This appears to be dead code that should be removed.

Suggested change
- new Pair<>(new ArrayList<>(), new ArrayList<>());

expectedPrompt.lines().map(String::strip).toList(),
escapedOptimizedPrompt.lines().map(String::strip).toList(),
"The optimized prompt does not match every line (regardless of line terminators or leading/trailing "
+ "spaces) of of the expected prompt.");

Copilot AI Nov 28, 2025

Typo: "of of the expected prompt" should be "of the expected prompt" - remove the duplicate "of".


@NotNull
private static String getClassificationKey(ClassificationTask task) {
return getClassificationKey(task.source().toString(), task.target().toString());

Copilot AI Nov 28, 2025

Default toString(): Element inherits toString() from Object, and so is not suitable for printing.
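
The practical concern behind this warning is that `Object.toString()` returns the class name followed by `@` and a hash code, which is not a meaningful or reproducible key. A minimal sketch of building the key from an explicit identifier instead is shown below. `PlainElement` is a hypothetical stand-in, since the real Element class is not shown in this thread, and both the `id()` accessor and the `" -> "` separator are assumptions.

```java
/** Hypothetical helper for illustration; not part of the LiSSA code base. */
final class ClassificationKeySketch {
    private ClassificationKeySketch() {}

    /** Stand-in element type that, like the reported case, does not override toString(). */
    static final class PlainElement {
        private final String id;

        PlainElement(String id) {
            this.id = id;
        }

        String id() {
            return id;
        }
    }

    /** Builds a key from explicit identifiers, so equal ids always give equal keys. */
    static String key(PlainElement source, PlainElement target) {
        return source.id() + " -> " + target.id();
    }
}
```

Two separately created elements with the same identifier then map to the same key, which a key based on the default `toString()` does not guarantee.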


@NotNull
private static String getClassificationKey(ClassificationResult result) {
return getClassificationKey(result.source().toString(), result.target().toString());

Copilot AI Nov 28, 2025

Default toString(): Element inherits toString() from Object, and so is not suitable for printing.

DanielDango and others added 15 commits November 28, 2025 13:16
 - refines iterative optimizer with feedback on misclassified trace links (false positive or false negative)
 - add sample strategy to configure which TLs will be used as feedback
 - add SLF4J Simple Logger configuration for improved logging in tests
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
… feature/add-prompt-optimization-module

# Conflicts:
#	src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/ChatLanguageModelProvider.java
…timization with "Gradient Descent" and Beam Search" of Pryzant et al. 2023

sonarqubecloud bot commented Dec 23, 2025

Quality Gate passed

Issues
25 New issues
1 Accepted issue

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Copilot AI commented Dec 23, 2025

@DanielDango I've opened a new pull request, #49, to work on those changes. Once the pull request is ready, I'll request review from you.


3 participants