ANALYSIS_SUMMARY.md: 148 additions, 0 deletions

# Analysis Summary: Code Slowdown After Upstream Sync

## Question Asked
> "Look at the differences before and after I synced with upstream (ci-group), and tell me what possible reasons for code slowdown there could be for my examples/A3_modified.py"

## Answer

### Primary Cause: Duration Parameter Ambiguity in `continue_simple_runner()`

The upstream sync added a new function `continue_simple_runner()` to `src/ariel/utils/runners.py` that has an **ambiguous duration parameter**.

#### The Problem

```python
def continue_simple_runner(model, data, duration=10.0, steps_per_loop=100):
    while data.time < duration:  # ⚠️ Is this absolute time or incremental?
        mujoco.mj_step(model, data, nstep=steps_per_loop)
```

Your configuration has:
```python
duration_flat: int = 15 # First segment
duration_rugged: int = 30 # Second segment
duration_elevated: int = 55 # Third segment
```

#### Two Possible Interpretations

**Interpretation 1: Absolute Times** (current implementation)
- Simulation runs until `data.time` reaches the specified value
- Total time: 15 + (30-15) + (55-30) = **55 seconds**

**Interpretation 2: Incremental Durations** (possible original intent)
- Simulation adds the specified duration to current time
- Total time: 15 + 30 + 55 = **100 seconds**

#### Impact: 82% Difference!

```
Current behavior: [====55 seconds====]
Possible intent: [==============100 seconds==============]
Missing time: [====45 seconds====]
```

### Why This Causes "Slowdown"

If your code expects 100s simulations but gets 55s:

1. **Robots travel shorter distances** → appear less capable
2. **Fitness scores are lower** → different selection in EA
3. **Fewer robots pass checkpoints** → different evolutionary pressure
4. **EA makes different progress** → appears to "slow down" or stagnate
5. **Results don't match expectations** → confusion about performance

### How to Fix

**Option 1: Make Duration Incremental** (if 100s is correct)

Edit `src/ariel/utils/runners.py`, line 64:

```python
def continue_simple_runner(model, data, duration=10.0, steps_per_loop=100):
    """Continue a simple headless simulation for an additional duration."""
    end_time = data.time + duration  # ← Add this line
    while data.time < end_time:      # ← Change this line
        mujoco.mj_step(model, data, nstep=steps_per_loop)
```
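The behavioural difference can be demonstrated without MuJoCo by using a minimal stand-in for `data` (the `SimData` class and `step` function below are stubs for illustration, not part of the real API):

```python
class SimData:
    """Stub exposing only the `time` attribute the runner loop reads."""
    def __init__(self):
        self.time = 0.0

def step(data, dt=1.0):
    """Stand-in for mujoco.mj_step: advances simulated time by dt."""
    data.time += dt

def run_absolute(data, duration):
    # Current upstream semantics: run until data.time reaches `duration`.
    while data.time < duration:
        step(data)

def run_incremental(data, duration):
    # Proposed fix: run for `duration` more seconds from the current time.
    end_time = data.time + duration
    while data.time < end_time:
        step(data)

d1 = SimData()
for seg in (15, 30, 55):
    run_absolute(d1, seg)
print(d1.time)   # 55.0

d2 = SimData()
for seg in (15, 30, 55):
    run_incremental(d2, seg)
print(d2.time)   # 100.0
```

Running the three segments back to back shows the absolute interpretation stops at 55 s while the incremental one reaches 100 s.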

**Option 2: Update Config** (if 55s is correct)

Rename variables in `examples/A3_modified.py` for clarity:

```python
duration_initial: int = 15           # run until t = 15 s
total_time_after_rugged: int = 30    # run until t = 30 s (absolute)
total_time_after_elevated: int = 55  # run until t = 55 s (absolute)
```

### How to Verify Which Is Correct

1. **Run the test:**

   ```bash
   python test_duration_semantics.py
   ```

2. **Check your expectations:**
   - Should simulations be 55s or 100s total?
   - Are checkpoint locations reasonable for 55s vs 100s?
   - What are historical fitness value ranges?

3. **Monitor actual behavior:**
   - Add logging to see actual simulation times
   - Check checkpoint pass rates
   - Compare fitness distributions
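One way to add that logging is a thin wrapper that records `data.time` before and after each segment. This is a sketch only; `timed_segment` is a hypothetical helper, and the runner passed in is assumed to accept a `duration` keyword as described above:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("segment-timing")

def timed_segment(runner, model, data, duration, label):
    """Run one simulation segment and log its actual simulated start/end times."""
    start = data.time
    runner(model, data, duration=duration)
    log.info("%s: t=%.1fs -> t=%.1fs (elapsed %.1fs)",
             label, start, data.time, data.time - start)
    return data.time - start
```

Called as, e.g., `timed_segment(continue_simple_runner, model, data, config.duration_rugged, "rugged")`, the log makes it immediately obvious whether a segment elapsed 15 s or 30 s.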

### Files Created for This Analysis

| File | Purpose |
|------|---------|
| `README_ANALYSIS.md` | Master index and overview |
| `QUICKSTART_ANALYSIS.md` | Quick summary with fixes |
| `SLOWDOWN_ANALYSIS.md` | Detailed technical analysis |
| `TIMELINE_VISUALIZATION.txt` | Visual timeline comparison |
| `test_duration_semantics.py` | Runnable demonstration |
| `ANALYSIS_SUMMARY.md` | This file - executive summary |

### Recommendation

**Most Likely:** The duration parameters were meant to be incremental (100s total), so:

1. ✅ Apply Option 1 fix to `src/ariel/utils/runners.py`
2. ✅ Test with a few EA runs
3. ✅ Verify fitness values return to expected ranges
4. ✅ Confirm checkpoint pass rates improve

### Additional Minor Issues

While analyzing, I also found these minor issues (not primary causes of slowdown):

1. **No controller state reset** in `continue_simple_runner`
   - Impact: minimal; the controller simply stays active across segments
   - Urgency: low

2. **Checkpoint timing assumptions**
   - Impact: depends on the duration interpretation
   - Fix: resolves itself once the duration semantics are fixed

### Upstream Changes

The merge commit `b465e55` ("Merge remote-tracking branch 'upstream/main'") added:
- New `continue_simple_runner()` function
- Many infrastructure files (`.devcontainer/`, `.typings/`, etc.)
- No direct changes to your `A3_modified.py` (it's your custom file)

### Root Cause Analysis

This issue is a classic example of:
- **Ambiguous parameter naming**: `duration` could mean different things
- **Missing documentation**: No clear spec of absolute vs incremental
- **Silent behavior change**: No error, just different behavior

Better API design would use:
- `target_time` for absolute times
- `additional_duration` for incremental times
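As a sketch of that clearer API (hypothetical names, not part of the current codebase; the MuJoCo stepping call is abstracted into a `stepper` callable so the time semantics stand out):

```python
def run_until(data, target_time, stepper):
    """Absolute semantics: advance until data.time reaches target_time."""
    while data.time < target_time:
        stepper(data)

def run_for(data, additional_duration, stepper):
    """Incremental semantics: advance by additional_duration from the current time."""
    run_until(data, data.time + additional_duration, stepper)
```

With these names, a call site reads unambiguously: `run_until(data, 55, stepper)` versus `run_for(data, 55, stepper)` cannot be confused.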

---

**Bottom Line:** The most likely cause of your code "slowdown" is that simulations are running for 55 seconds instead of the expected 100 seconds, due to how the new `continue_simple_runner()` function interprets duration parameters. The fix is a simple 2-line change to make durations incremental instead of absolute.
QUICKSTART_ANALYSIS.md: 129 additions, 0 deletions

# Summary: Possible Reasons for Code Slowdown in A3_modified.py

After analyzing the differences before and after syncing with upstream (ci-group), here are the **possible reasons for code slowdown** in `examples/A3_modified.py`:

## 🎯 Primary Issue: Duration Semantics Ambiguity

The upstream sync added a new `continue_simple_runner()` function that has an **ambiguous duration parameter**:

```python
def continue_simple_runner(model, data, duration=10.0, steps_per_loop=100):
    while data.time < duration:  # ⚠️ Treats duration as ABSOLUTE time
        mujoco.mj_step(model, data, nstep=steps_per_loop)
```

### The Problem

Your configuration has these values:
```python
duration_flat: int = 15
duration_rugged: int = 30
duration_elevated: int = 55
```

**Question**: Are these meant to be:
1. **Absolute times** (current behavior): Run until simulation reaches this time
2. **Incremental durations** (possible original intent): Add this much time

### Impact on Simulation Time

| Interpretation | Flat | Rugged (added) | Elevated (added) | **TOTAL** |
|----------------|------|----------------|------------------|-----------|
| **Absolute** (current) | 0→15s (15s) | 15→30s (15s) | 30→55s (25s) | **55 seconds** |
| **Incremental** (possible) | 0→15s (15s) | 15→45s (30s) | 45→100s (55s) | **100 seconds** |

**Difference: 45 seconds (82% increase!)**
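The per-segment numbers in the table can be reproduced directly from the three config values:

```python
targets = [15, 30, 55]  # duration_flat, duration_rugged, duration_elevated

# Absolute interpretation: elapsed time per segment is the gap to the previous target
absolute_elapsed = [t - p for p, t in zip([0] + targets[:-1], targets)]
# → [15, 15, 25]

# Incremental interpretation: each config value IS the elapsed time for its segment
incremental_elapsed = targets[:]
# → [15, 30, 55]

print(sum(absolute_elapsed), sum(incremental_elapsed))  # 55 100
```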

## 🔍 Why This Causes "Slowdown"

If your original implementation (before sync) used **incremental durations**, and the upstream version uses **absolute times**, then:

1. ⏱️ **Simulations run for less time**: 55s instead of 100s per robot
2. 🏃 **Robots travel shorter distances**: Less time to reach checkpoints/targets
3. 📉 **Lower fitness scores**: Less distance covered = worse fitness
4. 🎯 **Fewer checkpoint passes**: Robots might not reach rugged/elevated terrains
5. 🔄 **Different EA behavior**: Selection pressure changes, different solutions emerge
6. ⚡ **Perceived "slowdown"**: EA not making progress as expected

## 🧪 How to Verify

Run the test script to see the difference:

```bash
python test_duration_semantics.py
```

This will show you exactly how the two interpretations differ.

## ✅ Recommended Fix

### Option 1: Change to Incremental Duration (Recommended if original behavior was incremental)

**File**: `src/ariel/utils/runners.py`

```python
def continue_simple_runner(
    model: mujoco.MjModel,
    data: mujoco.MjData,
    duration: float = 10.0,
    steps_per_loop: int = 100,
) -> None:
    """Continue a simple headless simulation for an additional duration."""
    end_time = data.time + duration  # ✓ Make it incremental
    while data.time < end_time:
        mujoco.mj_step(model, data, nstep=steps_per_loop)
```

**And update config to be clear**:
```python
duration_flat: int = 15 # Initial simulation time
duration_rugged_extra: int = 30 # Additional time for rugged (renamed for clarity)
duration_elevated_extra: int = 55 # Additional time for elevated (renamed for clarity)
```

### Option 2: Keep Absolute Times but Fix Config Values

If absolute times are correct, update config to match:

```python
duration_flat: int = 15 # Run until time = 15
total_time_rugged: int = 30 # Run until time = 30 (15s added)
total_time_elevated: int = 55 # Run until time = 55 (25s added)
```

### Option 3: Restore Original Behavior

If you know what the original implementation was before the sync, you could restore that behavior.

## 📋 Quick Check

To determine which interpretation is correct, check:

1. **Are robots reaching checkpoints?**
   - If very few robots pass checkpoints, the simulations are likely running too short

2. **What are typical fitness values?**
   - Compare with results from before the sync

3. **How long should simulations actually run?**
   - Do you expect robots to have 55s or 100s to complete the course?

## 📄 Full Details

See `SLOWDOWN_ANALYSIS.md` for complete technical analysis including:
- Detailed code comparison
- Additional potential issues
- Multiple fix options
- Testing recommendations

## 🚀 Next Steps

1. **Determine intended behavior**: Should durations be incremental or absolute?
2. **Choose a fix option**: Based on your determination
3. **Update code/config**: Implement the chosen fix
4. **Test**: Verify robots behave as expected
5. **Compare results**: Check if EA performance is restored

---

**Bottom Line**: The duration parameter ambiguity in `continue_simple_runner` is the most likely cause of unexpected behavior after syncing with upstream. The fix is simple once you determine the intended semantics.