A novel approach to language model text generation that processes multiple token candidates simultaneously by modifying Rotary Position Embeddings (RoPE).
TEMPO introduces a new generation paradigm that differs fundamentally from beam search:
- Parallel Token Processing: Multiple tokens at the same logical position are processed within a single forward pass
- RoPE Modification: Custom positional embeddings enable tokens to share positions while maintaining distinct identities
- Attention-Based Pruning: Uses attention patterns from future tokens to retroactively prune less coherent paths
Instead of sampling one token per position, TEMPO selects all tokens above a probability threshold:
```python
# Traditional: sample one token
next_token = sample(logits)

# TEMPO: select multiple tokens above threshold
parallel_tokens = [t for t, p in enumerate(probs) if p > selection_threshold]
```
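For concreteness, here is a minimal runnable sketch of the selection step, assuming PyTorch; the helper name and toy logits are illustrative, not the repository's API:

```python
# Hedged sketch: threshold-based parallel selection over a toy distribution.
import torch

def select_parallel_tokens(logits: torch.Tensor, selection_threshold: float) -> list[int]:
    """Return every token id whose probability exceeds the threshold."""
    probs = torch.softmax(logits, dim=-1)
    return torch.nonzero(probs > selection_threshold).flatten().tolist()

logits = torch.tensor([2.0, 1.9, 0.1, -1.0])  # toy 4-token vocabulary
print(select_parallel_tokens(logits, selection_threshold=0.1))  # -> [0, 1]
```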
The core innovation modifies RoPE to assign the same positional encoding to parallel tokens:

```python
# Map multiple physical positions to the same logical position
logical_position = position_map[physical_position]
# Apply RoPE with the logical position instead of the physical one
```
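To see why sharing a logical position works, here is a self-contained sketch using standard RoPE angle computation in PyTorch; the `position_map` and head dimension are made up for illustration (the repository's versions live in position_mapper.py and embedding_modifier.py):

```python
# Hedged sketch: two physical slots share one logical position, so they
# receive identical rotary embeddings (position_map here is made up).
import torch

def rope_angles(positions: torch.Tensor, head_dim: int = 8, base: float = 10000.0):
    """Standard RoPE angle computation, driven by *logical* positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = positions.float()[:, None] * inv_freq[None, :]
    return torch.cos(angles), torch.sin(angles)

# Physical slots 2 and 3 hold parallel tokens for logical position 2.
position_map = {0: 0, 1: 1, 2: 2, 3: 2}
logical = torch.tensor([position_map[p] for p in range(4)])

cos, sin = rope_angles(logical)
assert torch.equal(cos[2], cos[3]) and torch.equal(sin[2], sin[3])
```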
Retroactive pruning analyzes how future tokens attend to past parallel options (see the sketch after this list):
- Tokens receiving low attention from future tokens are pruned
- Maintains coherence while exploring multiple paths
- Dynamic threshold adjustment using Bezier curves
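A hedged sketch of the pruning rule in PyTorch: averaging attention from future tokens is one plausible aggregation (not necessarily the repository's), and the threshold corresponds to the `--attention-threshold` flag shown later:

```python
# Hedged sketch: drop candidates that receive little attention from later tokens.
import torch

def prune_by_attention(attn_to_candidates: torch.Tensor,
                       attention_threshold: float) -> list[int]:
    """attn_to_candidates: (num_future_tokens, num_candidates) attention weights.
    Keep candidates whose mean attention from future tokens clears the threshold."""
    scores = attn_to_candidates.mean(dim=0)
    return torch.nonzero(scores >= attention_threshold).flatten().tolist()

# Toy example: 3 future tokens attending to 2 parallel candidates.
attn = torch.tensor([[0.30, 0.002],
                     [0.25, 0.004],
                     [0.40, 0.001]])
print(prune_by_attention(attn, attention_threshold=0.01))  # keeps candidate 0
```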
RoPE Modifications (src/algorithms/rope/)
- position_mapper.py: Maps physical to logical positions
- embedding_modifier.py: Core RoPE modification functions
- model_patcher.py: Runtime model patching utilities
Attention Analysis (src/algorithms/attention/)
- mask_builder.py: Constructs masks for parallel token isolation
- pattern_analyzer.py: Analyzes attention for pruning decisions
- weight_extractor.py: Extracts attention weights from models
Generation Pipeline (src/algorithms/generation/)
- logits_processor.py: Processes logits and applies thresholds
- kv_cache_manager.py: Manages KV caches efficiently
- parallel_processor.py: Handles parallel token batching
Pruning Algorithms (src/algorithms/pruning/)
- attention_pruner.py: Prunes based on attention patterns
- threshold_manager.py: Dynamic threshold adjustment
- multi_scale_pruner.py: Multi-scale attention analysis
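As an illustration of what "parallel token isolation" can mean in the mask builder: alternatives that share a logical position attend to the common prefix but not to each other. The helper below is a hypothetical reading, not the `mask_builder.py` API:

```python
# Hypothetical sketch: parallel tokens at the same logical position attend
# to earlier positions and themselves, but not to one another.
import torch

def build_isolation_mask(logical_positions: torch.Tensor) -> torch.Tensor:
    """Boolean (seq, seq) mask: True where attention is allowed."""
    pos_q = logical_positions[:, None]
    pos_k = logical_positions[None, :]
    causal = pos_k < pos_q  # attend only to strictly earlier logical positions
    self_attn = torch.eye(len(logical_positions), dtype=torch.bool)
    return causal | self_attn

# Physical slots 2 and 3 are parallel alternatives at logical position 2.
mask = build_isolation_mask(torch.tensor([0, 1, 2, 2]))
print(mask.int())  # rows 2 and 3 see slots 0-1 and themselves, not each other
```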
- Monte Carlo Tree Search Integration: Explores generation paths systematically
- Dynamic Thresholding: Bezier curve-based threshold adjustment (see the sketch after this list)
- Multi-Scale Attention: Aggregates attention patterns across layers
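A minimal sketch of a Bezier-based threshold schedule; the cubic form and control-point values below are assumptions for illustration, not the values used in threshold_manager.py:

```python
# Hedged sketch: evaluate a cubic Bezier over generation progress to get
# a smoothly varying threshold (control points here are made up).
def bezier_threshold(step: int, max_steps: int,
                     p0: float = 0.01, p1: float = 0.02,
                     p2: float = 0.08, p3: float = 0.10) -> float:
    """Cubic Bezier evaluated at t = step / max_steps."""
    t = step / max_steps
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

print([round(bezier_threshold(s, 10), 3) for s in range(0, 11, 5)])
# thresholds ramp smoothly from 0.01 toward 0.10 as generation proceeds
```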
```bash
# Clone repository
git clone https://github.com/JoeLuker/tempo.git && cd tempo

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run generation
python3 run_tempo.py --prompt "Your prompt" --selection-threshold 0.1
```

```bash
# Basic parallel generation
python3 run_tempo.py --prompt "The future of AI is" --selection-threshold 0.1
# With retroactive pruning
python3 run_tempo.py --prompt "Explain quantum computing" \
--selection-threshold 0.1 \
--use-retroactive-pruning \
--attention-threshold 0.01
# With MCTS exploration
python3 run_tempo.py --prompt "Write a story" \
--selection-threshold 0.15 \
--use-mcts \
    --mcts-simulations 100
```

- Novel Generation Paradigm: First approach to modify positional embeddings for parallel token processing
- Attention-Based Coherence: Uses model's own attention as pruning signal
- Efficient Implementation: Maintains single model state for multiple paths
- Diversity: 2-3x more diverse outputs compared to beam search
- Coherence: Attention-based pruning maintains quality
- Efficiency: Batch processing minimizes overhead
If you use TEMPO in your research, please cite:
```bibtex
@software{tempo2024,
  title={TEMPO: Threshold-Enabled Multipath Parallel Output},
  author={Luker, Joe},
  year={2024},
  url={https://github.com/JoeLuker/tempo}
}
```

MIT License - See LICENSE file for details.