@jatinganhotra

This PR adds two new submissions for our iSWE-Agent: one for the Java subset of the full SWE-PolyBench benchmark and one for the Java subset of SWE-PolyBench-Verified. iSWE-Agent is a multi-agent system developed by IBM Research to tackle software engineering tasks; the latest release of iSWE focuses on Java development.

We are excited to submit the evaluation results on the java/full and java/verified splits. This submission follows all the official leaderboard guidelines.

Until we expand iSWE-Agent to all languages and resubmit, we would appreciate it if the leaderboard UI showed "-" or a blank cell for iSWE-Agent's overall score and for languages other than Java, instead of a misleadingly low overall score or a 0.0 score.

We noticed that the SWE-PolyBench leaderboard follows a different approach from Multi-SWE-Bench (MSB). The MSB leaderboard accepts separate submissions for individual languages: Java (ours), C and TypeScript (RepoRepair), and C++ (InfCode). This option is not available on the SWE-PolyBench leaderboard.

Results

Our submission achieves the following results:

  • java/verified split: 32/69 (46.4%)
  • java/full split: 55/165 (33.3%)
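For anyone who wants to double-check the percentages above, here is a minimal sketch that recomputes them from the resolved/total counts. The helper name `resolve_rate` is ours for illustration and is not part of the SWE-PolyBench evaluation tooling.

```python
def resolve_rate(resolved: int, total: int) -> float:
    """Return the share of resolved instances as a percentage, to one decimal place."""
    return round(100 * resolved / total, 1)


# Counts taken from the results listed above.
results = {"java/verified": (32, 69), "java/full": (55, 165)}

for split, (resolved, total) in results.items():
    print(f"{split}: {resolved}/{total} ({resolve_rate(resolved, total)}%)")
```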

Thanks @mshihabr @bocchris-aws for maintaining! Please let us know if any further information or modifications are needed. We look forward to seeing iSWE-Agent on the leaderboard!
