
PolyOpt - Polyhedral Loop Optimizer

A research-grade polyhedral compiler for optimizing loop nests

Features · Installation · Quick Start · Examples · Commands · Theory


Overview

PolyOpt is a polyhedral compiler that automatically optimizes loop nests for:

  • Parallelism: Identify and exploit parallel loops
  • Data Locality: Improve cache utilization through tiling
  • Vectorization: Enable SIMD optimizations

It takes programs written in a simple .poly language and generates optimized C code with OpenMP annotations.

┌─────────────┐     ┌──────────────┐     ┌────────────────┐     ┌─────────────┐
│  .poly file │ ──► │  Parse & IR  │ ──► │  Dependence    │ ──► │  Transform  │
│             │     │  Lowering    │     │  Analysis      │     │  & Schedule │
└─────────────┘     └──────────────┘     └────────────────┘     └──────┬──────┘
                                                                       │
                                                                       ▼
                                                               ┌─────────────┐
                                                               │  Generate   │
                                                               │  C/OpenMP   │
                                                               └─────────────┘

Features

  • Frontend: Custom parser for a simple loop-focused language
  • Analysis:
    • Dependence analysis (GCD test, Banerjee test; a sketch of the GCD test follows this list)
    • Parallelism detection
    • Loop-carried vs loop-independent dependencies
  • Transformations:
    • Loop tiling (blocking) for cache optimization
    • Loop interchange for better memory access
    • Loop fusion and distribution
    • Automatic scheduling (Pluto, Feautrier, Greedy algorithms)
  • Code Generation:
    • Clean C output
    • OpenMP parallelization pragmas
    • Vectorization hints
    • Benchmark code generation
  • Advanced Features (optional):
    • ISL Integration: Exact polyhedral math via Integer Set Library
    • Auto-Tuning: Automatic parameter optimization
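For intuition about the dependence analysis bullet above, here is a minimal C sketch of the classic GCD test (illustrative only, not PolyOpt's implementation): for accesses A[a*i + c1] and A[b*i + c2], the equation a*i1 - b*i2 = c2 - c1 has an integer solution only if gcd(a, b) divides c2 - c1, so the test can conservatively rule dependences out.

#include <stdlib.h>

static int gcd(int a, int b) {
    while (b != 0) { int t = a % b; a = b; b = t; }
    return a;
}

/* Conservative GCD test for accesses A[a*i + c1] and A[b*i + c2]:
 * returns 0 only when a dependence is provably impossible. */
int gcd_test_may_depend(int a, int c1, int b, int c2) {
    int g = gcd(abs(a), abs(b));
    if (g == 0) return c1 == c2;   /* both coefficients zero */
    return (c2 - c1) % g == 0;
}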

Advanced Features

ISL Integration

PolyOpt can use the Integer Set Library (ISL) for exact polyhedral computations. ISL is the same library used by LLVM's Polly and GCC's Graphite.

Installation:

# macOS
brew install isl

# Ubuntu/Debian
sudo apt install libisl-dev isl-utils

# Fedora
sudo dnf install isl-devel

Build with ISL support:

cargo build --release --features isl

Usage:

# Check ISL installation
polyopt isl input.poly --check

# Analyze domains in ISL format
polyopt isl input.poly --domain

# Compute optimal schedule
polyopt isl input.poly --schedule

# Show dependences in ISL format
polyopt isl input.poly --deps

Example output:

=== ISL Analysis: matmul ===

--- Domains (ISL format) ---
S0:
  { [i0,i1,i2] : 0 <= i0 < N and 0 <= i1 < M and 0 <= i2 < K }
  ISL canonical: [N, M, K] -> { [i0, i1, i2] : 0 <= i0 < N and 0 <= i1 < M and 0 <= i2 < K }

Auto-Tuning Framework

PolyOpt includes an auto-tuning framework that finds optimal transformation parameters by empirical search.

Build with auto-tuning:

cargo build --release --features autotuning

Usage:

# Basic auto-tuning (60 second budget)
polyopt autotune examples/matmul.poly -N 1000

# Quick mode (30 seconds)
polyopt autotune examples/matmul.poly --quick

# Custom search parameters
polyopt autotune examples/matmul.poly \
  --time 120 \
  --tiles "16,32,64,128" \
  --threads "1,2,4,8" \
  --search genetic

# Save results to CSV
polyopt autotune examples/matmul.poly -o results.csv

Search strategies:

  • grid: Exhaustive search (default, best for small spaces)
  • random: Random sampling (good for large spaces)
  • genetic: Genetic algorithm (good for complex optimization)

Example output:

=== Auto-Tuning: matmul.poly ===

Configuration:
  Max time: 60s
  Tile sizes: [16, 32, 64, 128]
  Thread counts: [1, 2, 4, 8]
  Search: Grid

Baseline: 0.4521s
[  1] t16_p1 -> 0.3892s (1.16x)
[  2] t32_p1 -> 0.3456s (1.31x)
[  3] t32_p4 -> 0.1234s (3.66x)
  *** New best: t32_p4 (3.66x speedup)
[  4] t64_p4 -> 0.0987s (4.58x)
  *** New best: t64_p4 (4.58x speedup)
...

=== Best Configuration Found ===
  Tile sizes: [64]
  Threads: 4
  Speedup: 4.58x

To generate optimized code:
  polyopt compile matmul.poly --openmp --tile 64

Installation

Prerequisites

  • Rust 1.70+ (install from rustup.rs)
  • GCC or Clang (for compiling generated code)

Build from Source

# Clone the repository
git clone https://github.com/yourusername/polyopt.git
cd polyopt

# Build in release mode
cargo build --release

# Run tests
cargo test

# Install (optional)
cargo install --path .

Verify Installation

# Check version
cargo run --bin polyopt -- --version

# See help
cargo run --bin polyopt -- --help

Quick Start

1. Your First Program

Create a file hello.poly:

// Vector addition: C[i] = A[i] + B[i]
func vector_add(A[N], B[N], C[N]) {
    for i = 0 to N {
        C[i] = A[i] + B[i];
    }
}

2. Compile to C

cargo run --bin polyopt -- compile hello.poly

Output:

// Generated by PolyOpt - Polyhedral Optimizer

#include <stdio.h>
#include <stdlib.h>
...

void vector_add(int N, double* A, double* B, double* C) {
    // Statement S0: S0
    for (int i = 0; i < N; i++) {
        C[i] = (A[i] + B[i]);
    }
}

3. Add OpenMP Parallelization

cargo run --bin polyopt -- compile hello.poly --openmp

4. Analyze Dependencies

cargo run --bin polyopt -- analyze hello.poly --deps --parallel

5. Visualize Iteration Space

cargo run --bin polyvis -- hello.poly --params N=8

The .poly Language

Basic Syntax

// Comments use // or /* */

func function_name(ArrayA[N][M], ArrayB[K], scalar_param) {
    // For loops (nest them for multi-dimensional access)
    for i = 0 to N {
        for j = 0 to M {
            ArrayA[i][j] = expression;
        }
    }

    // For loops with step
    for i = 0 to N step 2 {
        ...
    }
}

Array Declarations

// 1D array with symbolic size N
A[N]

// 2D array with sizes N x M  
B[N][M]

// 3D array
C[N][M][K]

Expressions

// Arithmetic
A[i] + B[j]
A[i] * 2.5
A[i] / B[j]
A[i] - B[j]

// Array access with affine indices
A[i]
B[i][j]
C[i+1][j-1]
C[2*i][j]

// Constants
0
1.5
3.14159

Loop Bounds

// Simple bounds
for i = 0 to N { ... }

// Offset bounds
for i = 1 to N-1 { ... }

// Triangular bounds (planned)
for j = 0 to i { ... }

Commands

polyopt compile - Generate Code

Compile a .poly file to C code.

# Basic compilation
polyopt compile input.poly

# Output to file
polyopt compile input.poly -o output.c

# With OpenMP
polyopt compile input.poly --openmp

# With vectorization hints
polyopt compile input.poly --openmp --vectorize

# Generate benchmark wrapper
polyopt compile input.poly --benchmark

Options:

Option          Description
-o, --output    Output file (default: stdout)
-t, --target    Target: c, openmp, cuda, opencl
--openmp        Enable OpenMP parallelization
--vectorize     Add SIMD pragmas
--benchmark     Generate timing code
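A hedged sketch of the kind of timing harness --benchmark wraps around a kernel (the actual generated wrapper may differ; vector_add stands in for a generated function):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Stand-in for a generated kernel. */
static void vector_add(int N, double* A, double* B, double* C) {
    for (int i = 0; i < N; i++) C[i] = A[i] + B[i];
}

int main(void) {
    int N = 1000000;
    double *A = malloc(N * sizeof *A);
    double *B = malloc(N * sizeof *B);
    double *C = malloc(N * sizeof *C);
    for (int i = 0; i < N; i++) { A[i] = i; B[i] = 2.0 * i; }

    double t0 = now_seconds();
    vector_add(N, A, B, C);
    double t1 = now_seconds();
    printf("Time: %.4f seconds\n", t1 - t0);

    free(A); free(B); free(C);
    return 0;
}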

polyopt analyze - Dependence Analysis

Analyze dependencies and parallelism opportunities.

# Basic analysis
polyopt analyze input.poly

# Show dependencies
polyopt analyze input.poly --deps

# Show parallelism
polyopt analyze input.poly --parallel

# Full analysis
polyopt analyze input.poly --deps --parallel --stats

# JSON output
polyopt analyze input.poly --json

Example Output:

=== Analysis: matmul ===

Statements: 2
Parameters: ["N", "K", "M"]
Arrays: ["A", "C", "B"]

--- Dependences ---
Total: 6
  Flow (RAW): 2
  Anti (WAR): 2
  Output (WAW): 2
  Loop-carried: 6
  Loop-independent: 0

--- Parallelism ---
  Level 0: ✗ sequential
  Level 1: ✗ sequential
  Level 2: ✗ sequential

No directly parallelizable loops found
Consider applying transformations (tiling, skewing)

polyopt optimize - Apply Transformations

Apply optimizations and generate code.

# Auto-optimize with Pluto scheduler
polyopt optimize input.poly --schedule pluto

# Apply tiling
polyopt optimize input.poly --tile 32

# Combined optimizations
polyopt optimize input.poly --schedule pluto --tile 32 --openmp

# Output to file
polyopt optimize input.poly --tile 32 -o optimized.c

Options:

Option           Description
--tile SIZE      Apply tiling with given block size
--schedule ALG   Scheduling: pluto, feautrier, greedy
--openmp         Enable OpenMP in output
-o, --output     Output file

polyopt parse - Debug Parser

View the parsed representation of a program.

# Show AST
polyopt parse input.poly

# Show HIR (High-level IR)
polyopt parse input.poly --hir

# Show PIR (Polyhedral IR)
polyopt parse input.poly --pir

polyopt bench - Run Benchmarks

Compile and run benchmarks.

# Run with default size (N=1000)
polyopt bench input.poly

# Specify size
polyopt bench input.poly -N 2000

# Multiple iterations
polyopt bench input.poly -N 1000 --iterations 10

# With OpenMP
polyopt bench input.poly -N 1000 --openmp

polyvis - Visualize Iteration Spaces

Text-based visualization of iteration spaces and dependencies.

# Basic visualization
polyvis input.poly

# With parameter values
polyvis input.poly --params N=10,M=10

# Limit displayed iterations
polyvis input.poly --params N=20 --max-iters 100

# Verbose mode (show dependency details)
polyvis input.poly --params N=8 --verbose

Example Output:

╔══════════════════════════════════════════════════════════════════╗
║                    PolyOpt Iteration Space Visualizer            ║
╚══════════════════════════════════════════════════════════════════╝

Program: matmul
Parameters: N=8, K=8, M=8

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Statement S0 (2D domain):
  Iteration space (8 x 8 = 64 points):

    j→  0 1 2 3 4 5 6 7
  i↓
   0    ● ● ● ● ● ● ● ●
   1    ● ● ● ● ● ● ● ●
   2    ● ● ● ● ● ● ● ●
   ...

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Dependence Summary:
  Flow (RAW):   2
  Anti (WAR):   2
  Output (WAW): 2

Parallelism: No parallel loops detected at any level

Examples

Basic Examples

Vector Addition (Parallel)

# File: examples/basic/vector_add.poly
polyopt analyze examples/basic/vector_add.poly --parallel
# Output: Level 0: ✓ PARALLEL

Array Copy (2D Parallel)

# File: examples/basic/array_copy.poly  
polyopt compile examples/basic/array_copy.poly --openmp

Linear Algebra

Matrix Multiplication

# Analyze dependencies
polyopt analyze examples/linalg/matmul.poly --deps

# Generate tiled code
polyopt optimize examples/linalg/matmul.poly --tile 32 --openmp -o matmul_tiled.c

# Benchmark
polyopt bench examples/linalg/matmul.poly -N 512 --openmp

Matrix-Vector Multiply

# File: examples/linalg/matvec.poly
polyopt compile examples/linalg/matvec.poly --openmp

Matrix Transpose

# File: examples/linalg/transpose.poly
# Both loops parallelizable
polyopt analyze examples/linalg/transpose.poly --parallel

Stencil Computations

Jacobi 2D (Double-Buffered - Parallel)

# File: examples/stencils/jacobi_2d.poly
# Reads from A, writes to B - no loop-carried dependencies
polyopt analyze examples/stencils/jacobi_2d.poly --parallel
polyopt compile examples/stencils/jacobi_2d.poly --openmp

Gauss-Seidel 2D (In-Place - Sequential)

# File: examples/stencils/gauss_seidel_2d.poly
# In-place update creates dependencies
polyopt analyze examples/stencils/gauss_seidel_2d.poly --deps
# Shows loop-carried dependencies - needs wavefront parallelization
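For intuition, a hedged C sketch of what wavefront (skewed) execution looks like for such an in-place sweep: all points on an anti-diagonal i + j = w are independent of one another, so each wavefront can run in parallel (illustrative only, not PolyOpt's generated output).

for (int w = 2; w <= 2 * (N - 2); w++) {    // anti-diagonal index
    #pragma omp parallel for
    for (int i = 1; i < N - 1; i++) {
        int j = w - i;                      // stay on the current wavefront
        if (j >= 1 && j < N - 1) {
            A[i][j] = 0.25 * (A[i-1][j] + A[i+1][j]
                            + A[i][j-1] + A[i][j+1]);
        }
    }
}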

5-Point Stencil

# File: examples/stencils/stencil_2d_5pt.poly
polyvis examples/stencils/stencil_2d_5pt.poly --params N=10

Reductions

Vector Sum

# File: examples/reductions/vector_sum.poly
polyopt compile examples/reductions/vector_sum.poly --openmp
# Generates: #pragma omp parallel for reduction(+:result)
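A hedged sketch of the shape of code that pragma implies (the actual PolyOpt output may differ in details):

double vector_sum(int N, double* restrict A) {
    double result = 0.0;
    /* Each thread accumulates a private partial sum; OpenMP
     * combines them with + at the end of the loop. */
    #pragma omp parallel for reduction(+:result)
    for (int i = 0; i < N; i++) {
        result = result + A[i];
    }
    return result;
}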

Dot Product

# File: examples/reductions/dot_product.poly
polyopt compile examples/reductions/dot_product.poly --openmp

Transformation Examples

Loop Interchange

# File: examples/transformations/interchange_example.poly
# Original has poor cache behavior (column-major access)
polyopt analyze examples/transformations/interchange_example.poly --deps
polyopt optimize examples/transformations/interchange_example.poly --schedule pluto
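For reference, a minimal C sketch of the transformation itself, assuming a row-major array (so the column-major walk below is the slow one):

// Before: inner loop strides through memory by N elements.
for (int j = 0; j < N; j++)
    for (int i = 0; i < N; i++)
        A[i][j] = 2.0 * A[i][j];

// After interchange: inner loop touches contiguous elements (stride-1).
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        A[i][j] = 2.0 * A[i][j];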

Loop Tiling

# File: examples/transformations/tiling_example.poly
# Matrix multiply benefits from tiling
polyopt optimize examples/transformations/tiling_example.poly --tile 32 -o tiled.c

Loop Fusion

# File: examples/transformations/fusion_example.poly
# Multiple loops over same iteration space
polyopt analyze examples/transformations/fusion_example.poly --deps
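A minimal C sketch of what fusing two such loops achieves, assuming no dependences between the two statements (which is what the analysis above verifies):

// Before fusion: A is traversed twice.
for (int i = 0; i < N; i++) B[i] = 2.0 * A[i];
for (int i = 0; i < N; i++) C[i] = A[i] + 1.0;

// After fusion: one traversal; A[i] is reused while still in cache.
for (int i = 0; i < N; i++) {
    B[i] = 2.0 * A[i];
    C[i] = A[i] + 1.0;
}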

Interactive Workflow Tutorial

Step 1: Write Your Kernel

Create mykernel.poly:

func saxpy(X[N], Y[N]) {
    for i = 0 to N {
        Y[i] = 2.0 * X[i] + Y[i];
    }
}

Step 2: Analyze Dependencies

$ polyopt analyze mykernel.poly --deps --parallel

=== Analysis: saxpy ===

Statements: 1
Parameters: ["N"]
Arrays: ["X", "Y"]

--- Dependences ---
Total: 0
  Flow (RAW): 0
  Anti (WAR): 0
  Output (WAW): 0

--- Parallelism ---
  Level 0: ✓ PARALLEL

Recommendation: Parallelize at level 0

Step 3: Visualize

$ polyvis mykernel.poly --params N=10

Statement S0 (1D domain):
  Iteration space (10 points):
    i→  0 1 2 3 4 5 6 7 8 9
        ● ● ● ● ● ● ● ● ● ●

Parallelism: Parallel at level 0

Step 4: Generate Parallel Code

$ polyopt compile mykernel.poly --openmp

void saxpy(int N, double* restrict X, double* restrict Y) {
    // Statement S0: S0
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        Y[i] = ((2.0 * X[i]) + Y[i]);
    }
}

Step 5: Benchmark

$ polyopt bench mykernel.poly -N 10000000 --iterations 5 --openmp

=== Benchmark: saxpy ===
Size: N = 10000000
Iterations: 5
OpenMP: enabled

Running 5 iterations...

  Run 1: Time: 0.0234 seconds
  Run 2: Time: 0.0231 seconds
  Run 3: Time: 0.0229 seconds
  Run 4: Time: 0.0232 seconds
  Run 5: Time: 0.0230 seconds

The Polyhedral Model

What is the Polyhedral Model?

The polyhedral model represents loop nests as geometric objects:

  • Iteration Space: The set of all loop iterations, represented as integer points in a polyhedron
  • Schedule: A mapping that determines execution order
  • Access Functions: Affine functions describing array accesses

Why Polyhedral?

Traditional Compiler   Polyhedral Compiler
Pattern matching       Mathematical reasoning
Limited scope          Global optimization
Heuristic              Exact dependence analysis

Example: Matrix Multiply

Original code:

for (i = 0; i < N; i++)
  for (j = 0; j < N; j++)
    for (k = 0; k < N; k++)
      C[i][j] += A[i][k] * B[k][j];

Polyhedral representation:

  • Domain: { S[i,j,k] : 0 ≤ i,j,k < N }
  • Schedule: { S[i,j,k] → [i,j,k] } (original order)
  • Read accesses: { S[i,j,k] → A[i,k] }, { S[i,j,k] → B[k,j] }, { S[i,j,k] → C[i,j] }
  • Write access: { S[i,j,k] → C[i,j] }
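In this view a transformation is nothing more than a different schedule over the same domain; for illustration, in the same notation (a tile size of 32 is assumed):

  • Interchange (k outermost): { S[i,j,k] → [k,i,j] }
  • Tiling by 32: { S[i,j,k] → [floor(i/32), floor(j/32), floor(k/32), i, j, k] }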

Dependence Types

Type     Also Called                Pattern         Example
Flow     RAW (Read After Write)     Write → Read    A[i] = ...; ... = A[i];
Anti     WAR (Write After Read)     Read → Write    ... = A[i]; A[i] = ...;
Output   WAW (Write After Write)    Write → Write   A[i] = ...; A[i] = ...;
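A small C loop that carries the first two kinds at distance 1 (an illustrative sketch, not tool output):

for (int i = 1; i < N - 1; i++) {
    A[i] = A[i-1]    /* flow: A[i-1] was written in iteration i-1 */
         + A[i+1];   /* anti: A[i+1] is read here, overwritten at i+1 */
}

An output dependence would additionally appear if two different iterations wrote the same element.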

Transformations

Transformation   Effect                             When to Use
Tiling           Block loops for cache              Large working sets
Interchange      Swap loop order                    Improve memory access pattern
Fusion           Merge loops                        Improve temporal locality
Distribution     Split loops                        Enable parallelization
Skewing          Offset inner loop by outer index   Enable wavefront parallelism
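As a concrete instance of the first row, a hedged C sketch of the tiled matrix multiply from the example above (tile size 32 assumed; MIN guards the ragged edges when N is not a multiple of the tile size):

#define T 32
#define MIN(a, b) ((a) < (b) ? (a) : (b))

for (int ii = 0; ii < N; ii += T)
  for (int jj = 0; jj < N; jj += T)
    for (int kk = 0; kk < N; kk += T)
      /* Intra-tile loops: each T x T x T block of iterations reuses
       * sub-blocks of A, B, and C while they are cache-resident. */
      for (int i = ii; i < MIN(ii + T, N); i++)
        for (int j = jj; j < MIN(jj + T, N); j++)
          for (int k = kk; k < MIN(kk + T, N); k++)
            C[i][j] += A[i][k] * B[k][j];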

Complete Example Listing

examples/
├── basic/
│   ├── vector_add.poly      # Simple parallel loop
│   ├── vector_scale.poly    # Scalar multiplication
│   ├── array_copy.poly      # 2D parallel copy
│   └── array_init.poly      # Index-dependent init
│
├── linalg/
│   ├── matmul.poly          # Matrix multiplication
│   ├── matvec.poly          # Matrix-vector multiply
│   ├── transpose.poly       # Matrix transpose
│   ├── trisolve.poly        # Triangular solve
│   ├── symm.poly            # Symmetric matrix multiply
│   ├── atax.poly            # A^T * A * x kernel
│   └── bicg.poly            # Biconjugate gradient
│
├── stencils/
│   ├── stencil_1d.poly      # 1D 3-point stencil
│   ├── stencil_2d_5pt.poly  # 2D 5-point stencil
│   ├── stencil_2d_9pt.poly  # 2D 9-point stencil
│   ├── jacobi_2d.poly       # Jacobi iteration (parallel)
│   ├── gauss_seidel_2d.poly # Gauss-Seidel (sequential)
│   └── conv2d.poly          # 2D convolution
│
├── reductions/
│   ├── vector_sum.poly      # Sum reduction
│   ├── dot_product.poly     # Dot product
│   ├── frobenius_norm.poly  # Matrix norm
│   └── prefix_sum.poly      # Prefix sum (sequential)
│
├── parallel/
│   ├── embarrassingly_parallel.poly  # Perfect parallelism
│   └── rowsum.poly          # Parallel outer, reduction inner
│
└── transformations/
    ├── interchange_example.poly   # Loop interchange demo
    ├── tiling_example.poly        # Tiling demo
    ├── fusion_example.poly        # Loop fusion demo
    └── distribution_example.poly  # Loop distribution demo

Tips & Best Practices

Writing Analyzable Code

  1. Use affine bounds: 0 to N, 1 to N-1; avoid non-affine bounds like 0 to f(x)
  2. Use affine indices: A[i][j], A[i+1][j-1], not indirect accesses like A[B[i]]
  3. Separate read and write arrays when possible (this exposes more parallelism)
  4. Write reductions in the recognizable form sum = sum + x

Optimization Strategy

  1. Analyze first: Understand dependencies before optimizing
  2. Check parallelism: Use --parallel to find parallel loops
  3. Consider tiling: Large iteration spaces benefit from tiling
  4. Benchmark: Always verify optimizations improve performance

Common Patterns

Pattern                   Parallelizable?   Optimization
Element-wise ops          ✓ Yes             OpenMP parallel for
Stencil (double-buffer)   ✓ Yes             Tiling + OpenMP
Stencil (in-place)        ✗ No              Wavefront/skewing
Reduction                 Partial           OpenMP reduction clause
Sequential scan           ✗ No              Algorithm change needed

Troubleshooting

"Unknown variable in affine expression"

The variable wasn't declared as a loop iterator or parameter.

// Wrong - T not defined
for i = 0 to T { ... }

// Right - T is a parameter
func foo(A[T]) {
    for i = 0 to T { ... }
}

"No parallel loops found"

Check for loop-carried dependencies:

polyopt analyze input.poly --deps

Common causes:

  • In-place array updates: A[i] = f(A[i-1])
  • Reduction without proper pattern
  • Complex index expressions

Generated code doesn't compile

Ensure indices match array dimensions:

// Wrong - A is 1D but accessed as 2D
func foo(A[N]) {
    for i = 0 to N {
        A[i][j] = 0;  // Error!
    }
}

Advanced Features

ISL Integration (Exact Polyhedral Math)

PolyOpt includes a polyhedral math module with both a pure Rust simulation and optional native ISL support.

Usage

# Parse and analyze an ISL set expression directly
polyopt isl "{ [i, j] : 0 <= i < 10 and 0 <= j < 10 }" --expr --enumerate

# Analyze a .poly file's domains
polyopt isl examples/basic/array_copy.poly --domain

# Generate tiled loop code
polyopt isl "{ [i, j] : 0 <= i < N and 0 <= j < N }" --expr --tile 32

# Show scheduling info
polyopt isl examples/linalg/matmul.poly --schedule

Example Output

=== ISL Polyhedral Analysis ===

Using: Pure Rust polyhedral simulation
       (Install ISL for exact operations: brew install isl)

Set: { [i, j] : i >= 0 and -i + 9 >= 0 and j >= 0 and -j + 9 >= 0 }
Variables: ["i", "j"]
Constraints: 4

--- Point Enumeration ---
    0: ([0, 0])
    1: ([0, 1])
    ...
Total points: 100

Installing Native ISL (Optional)

For exact polyhedral operations, install ISL:

# macOS
brew install isl

# Ubuntu/Debian
sudo apt install libisl-dev isl-utils

# Fedora
sudo dnf install isl-devel

Auto-Tuning Framework

The auto-tuner automatically searches for optimal transformation parameters by compiling and benchmarking multiple variants.

Usage

# Basic auto-tuning with OpenMP
polyopt autotune input.poly --openmp

# Specify tile sizes to try
polyopt autotune input.poly --tiles "8,16,32,64,128" --openmp

# Use different search strategies
polyopt autotune input.poly --strategy exhaustive  # Try all combinations
polyopt autotune input.poly --strategy random      # Random sampling
polyopt autotune input.poly --strategy genetic     # Genetic algorithm
polyopt autotune input.poly --strategy annealing   # Simulated annealing

# Customize benchmark parameters
polyopt autotune input.poly -N 1000 --iterations 5 --openmp

# Save results to file
polyopt autotune input.poly -o results.csv --openmp

Search Strategies

Strategy     Description                    Use When
exhaustive   Tests all combinations         Small search space, need optimal
random       Random sampling (20 samples)   Quick exploration
genetic      Evolutionary optimization      Large search space
annealing    Simulated annealing            Avoid local minima
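For intuition, a minimal, self-contained C sketch of the exhaustive strategy; compile_variant and run_and_time are hypothetical stand-ins for the tuner's compile-and-measure step, not PolyOpt APIs:

#include <stdio.h>

/* Hypothetical stubs for "compile this variant" and "time one run". */
static void compile_variant(int tile, int threads) { (void)tile; (void)threads; }
static double run_and_time(void) { return 1.0; }

int main(void) {
    int tiles[]   = {16, 32, 64, 128};
    int threads[] = {1, 2, 4, 8};
    double best = 1e30;
    int best_tile = 0, best_threads = 0;

    /* Grid search: measure every (tile, threads) pair, keep the fastest. */
    for (int t = 0; t < 4; t++)
        for (int p = 0; p < 4; p++) {
            compile_variant(tiles[t], threads[p]);
            double secs = run_and_time();
            if (secs < best) {
                best = secs;
                best_tile = tiles[t];
                best_threads = threads[p];
            }
        }
    printf("best: tile=%d, threads=%d (%.4fs)\n", best_tile, best_threads, best);
    return 0;
}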

Contributing

Contributions welcome! Areas of interest:

  • Additional scheduling algorithms
  • GPU code generation (CUDA/OpenCL)
  • More transformation passes
  • Performance benchmarking

License

MIT License - see LICENSE file


Acknowledgments

PolyOpt is inspired by:

  • ISL - Integer Set Library
  • Pluto - Automatic parallelizer
  • Polly - LLVM polyhedral optimizer
  • PolyBench - Polyhedral benchmark suite

Made with ❤️ for the HPC community
