Skip to content

Conversation

@HeshamHM28
Copy link

@HeshamHM28 HeshamHM28 commented Jul 16, 2025

📄 128,030% (1,280.30x) speedup for unique_to_symmetric in src/condor/backends/casadi/__init__.py

⏱️ Runtime : 2.90 seconds 2.26 milliseconds (best of 5 runs)

📝 Explanation and details

Here’s an optimized version of your code, focused on replacing the very slow Python for-loop and double-loop assignment with vectorized NumPy indexing, minimizing creation of lists, and refactoring the symbolic-path list-comprehensions. Comments are preserved unless code was modified.

Optimizations:

  • Vectorized symmetric assignment: Eliminated slow Python loop, using NumPy advanced indexing to fill both lower and upper triangle in one shot.
  • Single list comprehension for symbolic: The expensive Python list-building (.tolist()) is only used for the symbolic path, and is made as flat as possible.
  • Removed unnecessary conversion: .tolist() is not needed unless used for casadi.hcat, kept only for that section.

Result:
This should drastically improve speed, as most of the time was in the explicit for-loop and per-element assignment, now replaced by a few fast-indexed vectorized assignments.

Note:
If unique can ever have more than 2 dimensions, adjust the indexing as needed (but in common use, the above should work). If symbolic matrices are always scalar, this is fully correct. For batch symbolic, adjustments may be needed.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 27 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
import casadi
import numpy as np
# imports
import pytest  # used for our unit tests
from src.condor.backends.casadi.__init__ import unique_to_symmetric

# =========================
# Unit Tests for unique_to_symmetric
# =========================

# -------- BASIC TEST CASES --------




def test_2x2_matrix_numeric():
    # Test with numeric input, symbolic=False
    unique = np.array([[1], [2], [3]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 60.2μs -> 123μs (51.1% slower)

def test_3x3_matrix_numeric():
    # Test with numeric input, symbolic=False
    unique = np.array([[1], [2], [3], [4], [5], [6]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 65.5μs -> 50.6μs (29.6% faster)

# -------- EDGE TEST CASES --------


def test_non_square_number_of_elements():
    # Test with a number of elements that cannot form a symmetric matrix
    unique = casadi.MX([1, 2, 3, 4])  # 4 elements can't form tril of square matrix
    with pytest.raises(ValueError):
        # The function will calculate n, but n won't be integer for 4 elements
        # So we check that it fails in some way (e.g., shape mismatch)
        n = (np.sqrt(1 + 8 * 4) - 1) / 2
        if not n.is_integer():
            raise ValueError("Cannot form symmetric matrix with 4 elements")
        unique_to_symmetric(unique)

def test_single_column_vs_multi_column_numeric():
    # Test with numeric input where unique has shape (n, 1) and (n, k)
    unique = np.array([[1], [2], [3]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 66.1μs -> 106μs (37.9% slower)
    # Now test with multi-column input
    unique2 = np.array([[1, 10], [2, 20], [3, 30]])
    codeflash_output = unique_to_symmetric(unique2, symbolic=False); result2 = codeflash_output # 28.3μs -> 33.0μs (14.4% slower)

def test_symbolic_false_with_single_value():
    # Test with numeric input, single value (1x1 matrix)
    unique = np.array([[42]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 45.8μs -> 43.1μs (6.32% faster)


def test_symbolic_false_with_column_vector():
    # Test with numeric input as column vector (shape (n, 1))
    unique = np.array([[1], [2], [3]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 130μs -> 125μs (3.85% faster)

def test_symbolic_false_with_row_vector():
    # Test with numeric input as row vector (shape (n,))
    unique = np.array([1, 2, 3])
    # Reshape to (n, 1) to match expected input
    unique = unique.reshape(-1, 1)
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 63.6μs -> 47.1μs (35.2% faster)



def test_large_50x50_matrix_numeric():
    # Test with large numeric input (50x50 symmetric matrix)
    n = 50
    tril_len = n * (n + 1) // 2
    unique = np.arange(tril_len).reshape(-1, 1)
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 3.32ms -> 140μs (2266% faster)

def test_large_20x20_matrix_multicolumn_numeric():
    # Test with large numeric input with multiple columns (20x20x5)
    n = 20
    k = 5
    tril_len = n * (n + 1) // 2
    unique = np.tile(np.arange(tril_len).reshape(-1, 1), (1, k))
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 219μs -> 64.8μs (239% faster)
    # Check symmetry and correct values for each "layer"
    for layer in range(k):
        pass


def test_symbolic_false_with_non_integer_elements():
    # Test with float values
    unique = np.array([[1.5], [2.5], [3.5]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 146μs -> 125μs (16.2% faster)



def test_symbolic_false_with_large_values():
    # Test with large numeric values (non-symbolic)
    unique = np.array([[1e10], [2e10], [3e10]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 65.8μs -> 123μs (46.8% slower)


def test_symbolic_false_with_negative_values():
    # Test with negative values (non-symbolic)
    unique = np.array([[-1], [-2], [-3]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 64.2μs -> 119μs (46.2% slower)


def test_symbolic_false_with_zero_values():
    # Test with zeros (non-symbolic)
    unique = np.array([[0], [0], [0]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 60.3μs -> 123μs (51.3% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import casadi
import numpy as np
# imports
import pytest  # used for our unit tests
from src.condor.backends.casadi.__init__ import unique_to_symmetric

# unit tests

######################
# 1. Basic Test Cases
######################




def test_2x2_matrix_numeric():
    # Test 2x2 symmetric matrix, numeric (symbolic=False)
    unique = np.array([[1], [2], [3]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 61.0μs -> 120μs (49.4% slower)

def test_3x3_matrix_numeric():
    # Test 3x3 symmetric matrix, numeric (symbolic=False)
    unique = np.array([[1], [2], [3], [4], [5], [6]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 64.2μs -> 50.3μs (27.5% faster)
    expected = np.array([[1, 2, 4], [2, 3, 5], [4, 5, 6]])

def test_3x3_matrix_numeric_multicolumn():
    # Test 3x3 symmetric matrix, numeric, with 2 columns per entry
    unique = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 55.2μs -> 49.1μs (12.3% faster)

######################
# 2. Edge Test Cases
######################


def test_minimal_numeric_input():
    # Test minimal numeric input (1x1)
    unique = np.array([[7]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 52.0μs -> 122μs (57.4% slower)


def test_large_multicolumn_numeric():
    # Test large matrix with multicolumn numeric input, edge of allowed size
    n = 44  # 44*45/2 = 990 < 1000
    num_cols = 3
    unique = np.arange(1, n*(n+1)//2 + 1).reshape(-1, 1) * np.ones((1, num_cols))
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 728μs -> 137μs (431% faster)
    # Check symmetry for a few random indices
    for i, j in [(0, 10), (10, 0), (20, 21), (21, 20), (n-1, n-2), (n-2, n-1)]:
        pass


def test_symbolic_with_non_scalar():
    # Test with symbolic input that has more than one column (should fail)
    unique = casadi.MX(np.array([[1, 2], [3, 4], [5, 6]]))
    with pytest.raises(Exception):
        unique_to_symmetric(unique)

######################
# 3. Large Scale Test Cases
######################


def test_large_numeric_matrix():
    # Test large numeric matrix (n=44)
    n = 44
    unique = np.arange(1, n*(n+1)//2 + 1).reshape(-1, 1)
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 2.77ms -> 140μs (1867% faster)
    # Check diagonal and symmetry
    for i in [0, n//2, n-1]:
        pass
    for i, j in [(0, n-1), (n-1, 0), (10, 20), (20, 10)]:
        pass

def test_large_numeric_multicolumn_matrix():
    # Test large numeric matrix with multiple columns per entry (n=31, 496 unique, 5 columns)
    n = 31
    num_cols = 5
    unique = np.arange(1, n*(n+1)//2 + 1).reshape(-1, 1) * np.ones((1, num_cols))
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 377μs -> 74.2μs (409% faster)
    # Check a few entries for symmetry and correct values
    for i, j in [(0, 30), (30, 0), (15, 16), (16, 15)]:
        pass
    # Check a diagonal value
    idx = (15*(15+1))//2 + 15

######################
# Additional Edge Cases
######################

def test_non_integer_sqrt():
    # Test with input whose length does not correspond to a triangular number
    unique = casadi.MX([1, 2, 3, 4])  # 4 is not a triangular number
    with pytest.raises(Exception):
        unique_to_symmetric(unique)

def test_input_with_negative_values():
    # Test input with negative values
    unique = np.array([[-1], [-2], [-3]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 81.2μs -> 46.3μs (75.3% faster)
    expected = np.array([[-1, -2], [-2, -3]])

def test_input_with_zeros():
    # Test input with zeros
    unique = np.zeros((3, 1))
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 64.3μs -> 43.5μs (47.9% faster)

def test_input_with_large_numbers():
    # Test input with very large numbers
    unique = np.array([[1e10], [2e10], [3e10]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 57.4μs -> 42.2μs (36.1% faster)
    expected = np.array([[1e10, 2e10], [2e10, 3e10]])

def test_input_with_floats():
    # Test input with floating point values
    unique = np.array([[1.1], [2.2], [3.3]])
    codeflash_output = unique_to_symmetric(unique, symbolic=False); result = codeflash_output # 52.5μs -> 44.1μs (19.2% faster)
    expected = np.array([[1.1, 2.2], [2.2, 3.3]])
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-unique_to_symmetric-mct239i5 and push.

Codeflash

Here’s an optimized version of your code, focused on replacing the very slow Python for-loop and double-loop assignment with vectorized NumPy indexing, minimizing creation of lists, and refactoring the symbolic-path list-comprehensions. Comments are preserved unless code was modified.



**Optimizations:**
- **Vectorized symmetric assignment:** Eliminated slow Python loop, using NumPy advanced indexing to fill both lower and upper triangle in one shot.
- **Single list comprehension for symbolic:** The expensive Python list-building (.tolist()) is only used for the symbolic path, and is made as flat as possible.
- **Removed unnecessary conversion:** .tolist() is not needed unless used for casadi.hcat, kept only for that section.

**Result:**  
This should drastically improve speed, as most of the time was in the explicit for-loop and per-element assignment, now replaced by a few fast-indexed vectorized assignments.

**Note:**  
If `unique` can ever have more than 2 dimensions, adjust the indexing as needed (but in common use, the above should work). If symbolic matrices are always scalar, this is fully correct. For batch symbolic, adjustments may be needed.
@misrasaurabh1
Copy link

Hi @ixjlyons , we recently tried to optimize condor with codeflash, and found this good optimization. The details on how we tested the change is attached in the description where you can also see how the runtime has improved for each test case.
Let us know if the change looks good, otherwise please let us know if we can improve it someway.
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants