Skip to content

Conversation

@riccardosavorgnan
Copy link
Collaborator

  • first working version
  • made it a configurable option

@riccardosavorgnan riccardosavorgnan marked this pull request as ready for review January 11, 2026 22:21
@greptile-apps
Copy link

greptile-apps bot commented Jan 11, 2026

Greptile Overview

Greptile Summary

This PR implements speed-based collision reward scaling for the driving simulation environment. When enabled via the new speed_based_collisions_reward configuration option, collision penalties are scaled based on the agent's velocity using a sigmoid function—vehicles moving faster receive larger penalties than slower vehicles.

Core Implementation:

  • Adds compute_collision_reward() function that calculates vehicle speed from velocity components (vx, vy) and applies sigmoid scaling
  • The sigmoid is calibrated so that 0 m/s yields ~50% penalty and 10 m/s yields ~95% penalty
  • New configuration option speed_based_collisions_reward (0=disabled, 1=enabled) threads through Python → C binding → environment struct
  • Modified c_step() to conditionally apply scaled vs. flat collision rewards

Integration:
The feature is properly integrated across the codebase: configuration parsing (env_config.h), Python interface (drive.py), C binding (binding.c), visualization (visualize.c), and demo code (drive.c). All files correctly propagate the new flag.

Issues Found:

  • Critical: Integer division bug in speed normalization (line 2018 of drive.h) breaks the sigmoid calibration
  • Breaking: Deletion of puffer_drive_weights.bin will break existing tests and code references
  • Unrelated: Config change reducing num_maps from 10000 to 1000 appears unintentional

Confidence Score: 1/5

  • Not safe to merge - contains critical math bug and breaking changes
  • The PR has a critical integer division bug that breaks the core feature's speed normalization logic. Additionally, two unrelated changes (weights file deletion and num_maps reduction) will break existing tests and potentially training pipelines. The speed-based collision reward feature itself is well-integrated across the codebase, but the math error means it won't work as intended.
  • pufferlib/ocean/drive/drive.h (fix integer division), pufferlib/config/ocean/drive.ini (revert num_maps), pufferlib/resources/drive/puffer_drive_weights.bin (restore file or update references)

Important Files Changed

File Analysis

Filename Score Overview
pufferlib/ocean/drive/drive.h 1/5 Adds sigmoid function and speed-based collision reward calculation. Critical integer division bug at line 2018 causes incorrect speed normalization.
pufferlib/ocean/drive/drive.c 5/5 Adds speed_based_collisions_reward field initialization in demo() function. No issues found.
pufferlib/ocean/drive/drive.py 5/5 Adds speed_based_collisions_reward parameter to Drive class constructor and passes it through to C environment. No issues found.
pufferlib/ocean/drive/binding.c 5/5 Adds speed_based_collisions_reward assignment from config to environment. No issues found.
pufferlib/ocean/drive/visualize.c 5/5 Adds speed_based_collisions_reward field initialization in eval_gif() function. No issues found.
pufferlib/ocean/env_config.h 5/5 Adds speed_based_collisions_reward field to env_init_config struct and INI parser handler. No issues found.
pufferlib/config/ocean/drive.ini 2/5 Enables speed_based_collisions_reward feature. Includes unrelated num_maps reduction from 10000 to 1000.
pufferlib/resources/drive/puffer_drive_weights.bin 0/5 Binary weights file deleted. This file is referenced by tests and other code, deletion will cause breakage.

Sequence Diagram

sequenceDiagram
    participant Config as drive.ini
    participant Python as drive.py
    participant Binding as binding.c
    participant Env as Drive struct
    participant Step as c_step()
    participant Compute as compute_collision_reward()
    
    Config->>Python: speed_based_collisions_reward=1
    Python->>Binding: Initialize environment
    Binding->>Env: Set speed_based_collisions_reward flag
    
    loop Each simulation step
        Step->>Step: Detect vehicle collision
        alt speed_based_collisions_reward == 1
            Step->>Compute: compute_collision_reward(env, agent_idx)
            Compute->>Compute: Calculate current_speed from vx, vy
            Compute->>Compute: rescaled_speed = current_speed / (10/3)
            Note over Compute: BUG: Integer division 10/3=3
            Compute->>Compute: sigmoid(rescaled_speed)
            Compute->>Step: Return scaled collision_reward
            Step->>Env: Apply scaled reward
        else speed_based_collisions_reward == 0
            Step->>Env: Apply flat collision_reward
        end
    end
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

float collision_reward = env->reward_vehicle_collision;

// normalize speed such that 0 m/s gets a 50% of the reward and it gets 95% of the reward at 10 m/s
float rescaled_speed = current_speed / (10 / 3); // rescale so that 10 becomes 3
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P0] Integer division bug: 10 / 3 evaluates to 3 (integer division), not 3.333... as intended. This breaks the speed normalization—at 10 m/s the rescaled_speed becomes 10 / 3 = 3.0 as expected, but the calculation is performed using integer division. Change to 10.0f / 3.0f for correct floating-point division.

Prompt To Fix With AI
This is a comment left during a code review.
Path: pufferlib/ocean/drive/drive.h
Line: 2018:2018

Comment:
[P0] Integer division bug: `10 / 3` evaluates to `3` (integer division), not `3.333...` as intended. This breaks the speed normalization—at 10 m/s the rescaled_speed becomes `10 / 3 = 3.0` as expected, but the calculation is performed using integer division. Change to `10.0f / 3.0f` for correct floating-point division.

How can I resolve this? If you propose a fix, please make it concise.

env->entities[agent_idx].steering_angle = 0.0f;
}

double sigmoid(float x) { return 1.0 / (1.0 + exp(-x)); }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] Type inconsistency: Function returns double but is used in float contexts. Consider changing return type to float or cast the parameter x to double for consistency. Current usage causes implicit conversions.

Prompt To Fix With AI
This is a comment left during a code review.
Path: pufferlib/ocean/drive/drive.h
Line: 2006:2006

Comment:
[P2] Type inconsistency: Function returns `double` but is used in `float` contexts. Consider changing return type to `float` or cast the parameter `x` to `double` for consistency. Current usage causes implicit conversions.

How can I resolve this? If you propose a fix, please make it concise.

termination_mode = 1 # 0 - terminate at episode_length, 1 - terminate after all agents have been reset
map_dir = "resources/drive/binaries/training"
num_maps = 10000
num_maps = 1000
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Unrelated change: num_maps reduced from 10000 to 1000. This change appears unrelated to the speed-based collision reward feature described in the PR. If intentional, should be in a separate commit or documented in the PR description.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: pufferlib/config/ocean/drive.ini
Line: 52:52

Comment:
[P1] Unrelated change: `num_maps` reduced from 10000 to 1000. This change appears unrelated to the speed-based collision reward feature described in the PR. If intentional, should be in a separate commit or documented in the PR description.

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

@greptile-apps
Copy link

greptile-apps bot commented Jan 11, 2026

Additional Comments (1)

pufferlib/resources/drive/puffer_drive_weights.bin
[P1] Unrelated deletion: This 2.4MB weights file is referenced by multiple files (tests/test_drive_render.py, pufferlib/utils.py, pufferlib/ocean/drive/visualize.c, pufferlib/ocean/drive/drive.c). Deleting it will break tests and code that depend on it. This change appears unrelated to the speed-based collision feature.

Prompt To Fix With AI
This is a comment left during a code review.
Path: pufferlib/resources/drive/puffer_drive_weights.bin
Line: 1:1

Comment:
[P1] Unrelated deletion: This 2.4MB weights file is referenced by multiple files (`tests/test_drive_render.py`, `pufferlib/utils.py`, `pufferlib/ocean/drive/visualize.c`, `pufferlib/ocean/drive/drive.c`). Deleting it will break tests and code that depend on it. This change appears unrelated to the speed-based collision feature.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants