diff --git a/NVIDIA_Assets/swe-bench.png b/NVIDIA_Assets/swe-bench.png
new file mode 100644
index 000000000..f9a5ea735
Binary files /dev/null and b/NVIDIA_Assets/swe-bench.png differ
diff --git a/README.md b/README.md
index fe4a14f9f..dc3532490 100644
--- a/README.md
+++ b/README.md
@@ -209,7 +209,8 @@ More module READMEs (click to open):
 - [`tests/nvidia/README.md`](tests/nvidia/README.md)
 
 ## 💡 Current Results
-learning curve
-To validate the functionality of the ProRLAgent servers, we conducted proof-of-concept experiments on software engineering (SWE) tasks by integrating the server with the Verl reinforcement learning (RL) framework. Specifically, we used swe-gym along with a subset of the R2E-gym dataset, comprising a total of 800 training instances, to perform GRPO training. Our experiments were carried out on the Qwen3-4B-Instruct model and evaluated on the SWE-Bench-Verified benchmark. The results demonstrate a performance improvement, with accuracy increasing from **15.0%** to **20.4%**.
+
+To validate the functionality of the ProRLAgent servers, we conducted experiments on software engineering (SWE) tasks by integrating the server with our ProRLAgent training framework, which is built on verl. As an initial RL run, we trained the Qwen3-4B-Instruct-2507 model on 32 A100 GPUs, using a subset of [SWE-GYM](https://huggingface.co/datasets/NovaSky-AI/SkyRL-v0-293-data) containing 293 training examples. After roughly 66 training steps, Pass@1 on SWE-Bench-Verified improved from **14.2%** to **20.8%**. The chart below shows the SWE-Bench-Verified results increasing over the course of training.
+![swe-bench curve](NVIDIA_Assets/swe-bench.png)