rescaled collision reward based on vehicle speed #251
base: 3.0
riccardosavorgnan commented Jan 11, 2026
- first working version
- made it a configurable option
Greptile Overview
Greptile Summary
This PR implements speed-based collision reward scaling for the driving simulation environment. When enabled via the new
Core Implementation:
Integration:
Issues Found:
Confidence Score: 1/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant Config as drive.ini
participant Python as drive.py
participant Binding as binding.c
participant Env as Drive struct
participant Step as c_step()
participant Compute as compute_collision_reward()
Config->>Python: speed_based_collisions_reward=1
Python->>Binding: Initialize environment
Binding->>Env: Set speed_based_collisions_reward flag
loop Each simulation step
Step->>Step: Detect vehicle collision
alt speed_based_collisions_reward == 1
Step->>Compute: compute_collision_reward(env, agent_idx)
Compute->>Compute: Calculate current_speed from vx, vy
Compute->>Compute: rescaled_speed = current_speed / (10/3)
Note over Compute: BUG: Integer division 10/3=3
Compute->>Compute: sigmoid(rescaled_speed)
Compute->>Step: Return scaled collision_reward
Step->>Env: Apply scaled reward
else speed_based_collisions_reward == 0
Step->>Env: Apply flat collision_reward
end
end
4 files reviewed, 4 comments
pufferlib/ocean/drive/drive.h
Outdated
float collision_reward = env->reward_vehicle_collision;

// normalize speed such that 0 m/s gets a 50% of the reward and it gets 95% of the reward at 10 m/s
float rescaled_speed = current_speed / (10 / 3); // rescale so that 10 becomes 3
[P0] Integer division bug: `10 / 3` evaluates to `3` (integer division), not `3.333...` as intended. This skews the speed normalization: at 10 m/s the rescaled_speed becomes `10 / 3 ≈ 3.33` instead of the intended `3.0`, so the sigmoid returns about 0.966 rather than about 0.953. Change to `10.0f / 3.0f` for correct floating-point division.
Path: pufferlib/ocean/drive/drive.h
Line: 2018:2018
pufferlib/ocean/drive/drive.h
Outdated
    env->entities[agent_idx].steering_angle = 0.0f;
}

double sigmoid(float x) { return 1.0 / (1.0 + exp(-x)); }
[P2] Type inconsistency: Function returns double but is used in float contexts. Consider changing return type to float or cast the parameter x to double for consistency. Current usage causes implicit conversions.
Path: pufferlib/ocean/drive/drive.h
Line: 2006:2006
pufferlib/config/ocean/drive.ini
Outdated
termination_mode = 1 # 0 - terminate at episode_length, 1 - terminate after all agents have been reset
map_dir = "resources/drive/binaries/training"
- num_maps = 10000
+ num_maps = 1000
[P1] Unrelated change: `num_maps` reduced from 10000 to 1000. This change appears unrelated to the speed-based collision reward feature described in the PR. If it is intentional, it should go in a separate commit or be documented in the PR description.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Path: pufferlib/config/ocean/drive.ini
Line: 52:52
Additional Comments (1)
Path: pufferlib/resources/drive/puffer_drive_weights.bin
Line: 1:1
Comment:
[P1] Unrelated deletion: This 2.4MB weights file is referenced by multiple files (`tests/test_drive_render.py`, `pufferlib/utils.py`, `pufferlib/ocean/drive/visualize.c`, `pufferlib/ocean/drive/drive.c`). Deleting it will break tests and code that depend on it. This change appears unrelated to the speed-based collision feature.