TreeSA

Simulated annealing optimizer for high-quality tensor contraction orders.

How It Works

TreeSA uses simulated annealing to explore the space of contraction trees:

Start with a random or greedy tree
Repeatedly apply random mutations (subtree rotations, swaps)
Accept better mutations, sometimes accept worse ones (to escape local optima)
Gradually decrease “temperature” to converge

Result: Often finds significantly better orders than greedy.

Basic Usage

Fast Preset (Recommended)

from omeco import optimize_code, TreeSA

# Quick optimization with sensible defaults
tree = optimize_code(ixs, out, sizes, TreeSA.fast())

Custom Configuration

from omeco import optimize_code, TreeSA, ScoreFunction

# More thorough search
score = ScoreFunction(sc_target=25.0)
optimizer = TreeSA(
    ntrials=10,      # Number of independent runs
    niters=50,       # Iterations per run
    score=score      # Custom scoring function
)
tree = optimize_code(ixs, out, sizes, optimizer)

Rust

#![allow(unused)]
fn main() {
use omeco::{TreeSA, ScoreFunction, optimize_code};

let score = ScoreFunction::default();
let optimizer = TreeSA::fast(score);
let tree = optimize_code(&code, &sizes, &optimizer)?;
}

Parameters

`ntrials` (default: 10)

Number of independent optimization runs. The best result is returned.

Higher = better quality (more exploration)
Lower = faster
Typical range: 5-20

`niters` (default: 50)

Number of iterations per trial.

Higher = better convergence
Lower = faster
Typical range: 20-100

`betas` (optional)

Temperature schedule for simulated annealing.

Default: logarithmic schedule from 1 to 40
Custom: [β₁, β₂, ..., βₙ] where β = 1/temperature
Higher β = less randomness (exploitation)
Lower β = more randomness (exploration)

`score` (optional)

Custom ScoreFunction for hardware-specific optimization.

# GPU optimization (see GPU Optimization Guide)
score = ScoreFunction(
    tc_weight=1.0,
    sc_weight=1.0,
    rw_weight=10.0,
    sc_target=30.0
)
optimizer = TreeSA(ntrials=5, niters=50, score=score)

Performance Characteristics

Advantages:

🏆 Higher quality solutions than greedy
🔍 Explores global search space
⚙️ Configurable time-quality trade-off
🎯 Often finds near-optimal solutions

Limitations:

⏱️ Slower than greedy (minutes vs seconds)
🎲 Non-deterministic (different runs give different results)
🔧 Requires parameter tuning for best results

Example Comparison

from omeco import optimize_code, GreedyMethod, TreeSA

# Same problem, different optimizers
ixs = [[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, 0]]  # Cycle
out = [1, 3, 5, 7]
sizes = {i: 2 for i in range(8)}

# Greedy
tree_greedy = optimize_code(ixs, out, sizes, GreedyMethod())
comp_greedy = tree_greedy.complexity(ixs, sizes)

# TreeSA
tree_sa = optimize_code(ixs, out, sizes, TreeSA.fast())
comp_sa = tree_sa.complexity(ixs, sizes)

print(f"Greedy - tc: 2^{comp_greedy.tc:.2f}, sc: 2^{comp_greedy.sc:.2f}")
print(f"TreeSA - tc: 2^{comp_sa.tc:.2f}, sc: 2^{comp_sa.sc:.2f}")

# Speedup
speedup = 2 ** (comp_greedy.tc - comp_sa.tc)
print(f"TreeSA is {speedup:.1f}x faster")

Typical output:

Greedy - tc: 2^16.00, sc: 2^12.00
TreeSA - tc: 2^14.58, sc: 2^11.00
TreeSA is 2.7x faster

When to Use

✅ Use TreeSA when:

Greedy result is too slow/uses too much memory
Network is complex (cycles, irregular graphs)
Result will be used many times (worth optimizing once)
You have minutes to spare for optimization

❌ Stick with GreedyMethod when:

Need results in seconds
Network is simple (chains, small grids)
Iterating/prototyping
Greedy is already good enough

Optimization Workflow

# 1. Quick baseline with greedy
tree_greedy = optimize_code(ixs, out, sizes, GreedyMethod())
comp_greedy = tree_greedy.complexity(ixs, sizes)
print(f"Greedy: tc={comp_greedy.tc:.2f}, sc={comp_greedy.sc:.2f}")

# 2. If not satisfactory, try TreeSA
if comp_greedy.sc > 28.0:  # Too much memory
    score = ScoreFunction(sc_target=28.0, sc_weight=2.0)
    tree_sa = optimize_code(ixs, out, sizes, TreeSA(ntrials=10, score=score))
    comp_sa = tree_sa.complexity(ixs, sizes)
    print(f"TreeSA: tc={comp_sa.tc:.2f}, sc={comp_sa.sc:.2f}")

Tuning Parameters

For Speed

# Fast TreeSA (2-3x slower than greedy)
TreeSA(ntrials=2, niters=20)

For Quality

# Thorough TreeSA (10-20x slower than greedy)
TreeSA(ntrials=20, niters=100)

For Memory-Constrained

# Prioritize space over time
score = ScoreFunction(tc_weight=0.1, sc_weight=10.0, sc_target=25.0)
TreeSA(ntrials=10, niters=50, score=score)

Advanced: Custom Temperature Schedule

# Custom annealing schedule
import numpy as np

# Slow cooling for thorough search
betas = np.logspace(0, 2, 50)  # 1 to 100

optimizer = TreeSA(ntrials=5, betas=betas.tolist())

Next Steps

Score Function Configuration - Optimize for your hardware
Algorithm Comparison - Detailed benchmarks
Slicing Strategy - Reduce memory further

Keyboard shortcuts

omeco