iSWE-Agent submission for SWE-Polybench full and SWE-Polybench-Verified Java splits #20
This PR adds two new submissions for our iSWE-Agent, one each for the Java subsets of the full SWE-PolyBench and SWE-PolyBench-Verified benchmarks. iSWE-Agent is a multi-agent system developed by IBM Research to tackle software engineering tasks, and the latest release of iSWE focuses on Java development.
We are excited to submit the evaluation results on the java/full and java/verified splits. This submission follows all the official leaderboard guidelines.
Until we expand iSWE-Agent to all languages and resubmit, we would appreciate it if the leaderboard UI would show `-` or empty whitespace for iSWE-Agent for the overall score and for languages other than Java, instead of a misleading lower overall score or a `0.0` score for iSWE-Agent.
We noticed that the SWE-PolyBench leaderboard follows a different approach from Multi-SWE-Bench (MSB). On the MSB leaderboard, there have been individual submissions for individual languages: Java (ours), C and TypeScript (RepoRepair), and C++ (InfCode). This option is not available on the SWE-PolyBench leaderboard.
Results
Our submission achieves the following results:
Thanks @mshihabr @bocchris-aws for maintaining! Please let us know if any further information or modifications are needed. We look forward to seeing iSWE-Agent on the leaderboard!