Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

TreeSA

Simulated annealing optimizer for high-quality tensor contraction orders.

How It Works

TreeSA uses simulated annealing to explore the space of contraction trees:

  1. Start with a random or greedy tree
  2. Repeatedly apply random mutations (subtree rotations, swaps)
  3. Accept better mutations, sometimes accept worse ones (to escape local optima)
  4. Gradually decrease “temperature” to converge

Result: Often finds significantly better orders than greedy.

Basic Usage

from omeco import optimize_code, TreeSA

# Quick optimization with sensible defaults
tree = optimize_code(ixs, out, sizes, TreeSA.fast())

Custom Configuration

from omeco import optimize_code, TreeSA, ScoreFunction

# More thorough search
score = ScoreFunction(sc_target=25.0)
optimizer = TreeSA(
    ntrials=10,      # Number of independent runs
    niters=50,       # Iterations per run
    score=score      # Custom scoring function
)
tree = optimize_code(ixs, out, sizes, optimizer)

Rust

#![allow(unused)]
fn main() {
use omeco::{TreeSA, ScoreFunction, optimize_code};

let score = ScoreFunction::default();
let optimizer = TreeSA::fast(score);
let tree = optimize_code(&code, &sizes, &optimizer)?;
}

Parameters

ntrials (default: 10)

Number of independent optimization runs. The best result is returned.

  • Higher = better quality (more exploration)
  • Lower = faster
  • Typical range: 5-20

niters (default: 50)

Number of iterations per trial.

  • Higher = better convergence
  • Lower = faster
  • Typical range: 20-100

betas (optional)

Temperature schedule for simulated annealing.

  • Default: logarithmic schedule from 1 to 40
  • Custom: [β₁, β₂, ..., βₙ] where β = 1/temperature
  • Higher β = less randomness (exploitation)
  • Lower β = more randomness (exploration)

score (optional)

Custom ScoreFunction for hardware-specific optimization.

# GPU optimization (see GPU Optimization Guide)
score = ScoreFunction(
    tc_weight=1.0,
    sc_weight=1.0,
    rw_weight=10.0,
    sc_target=30.0
)
optimizer = TreeSA(ntrials=5, niters=50, score=score)

Performance Characteristics

Advantages:

  • 🏆 Higher quality solutions than greedy
  • 🔍 Explores global search space
  • ⚙️ Configurable time-quality trade-off
  • 🎯 Often finds near-optimal solutions

Limitations:

  • ⏱️ Slower than greedy (minutes vs seconds)
  • 🎲 Non-deterministic (different runs give different results)
  • 🔧 Requires parameter tuning for best results

Example Comparison

from omeco import optimize_code, GreedyMethod, TreeSA

# Same problem, different optimizers
ixs = [[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, 0]]  # Cycle
out = [1, 3, 5, 7]
sizes = {i: 2 for i in range(8)}

# Greedy
tree_greedy = optimize_code(ixs, out, sizes, GreedyMethod())
comp_greedy = tree_greedy.complexity(ixs, sizes)

# TreeSA
tree_sa = optimize_code(ixs, out, sizes, TreeSA.fast())
comp_sa = tree_sa.complexity(ixs, sizes)

print(f"Greedy - tc: 2^{comp_greedy.tc:.2f}, sc: 2^{comp_greedy.sc:.2f}")
print(f"TreeSA - tc: 2^{comp_sa.tc:.2f}, sc: 2^{comp_sa.sc:.2f}")

# Speedup
speedup = 2 ** (comp_greedy.tc - comp_sa.tc)
print(f"TreeSA is {speedup:.1f}x faster")

Typical output:

Greedy - tc: 2^16.00, sc: 2^12.00
TreeSA - tc: 2^14.58, sc: 2^11.00
TreeSA is 2.7x faster

When to Use

Use TreeSA when:

  • Greedy result is too slow/uses too much memory
  • Network is complex (cycles, irregular graphs)
  • Result will be used many times (worth optimizing once)
  • You have minutes to spare for optimization

Stick with GreedyMethod when:

  • Need results in seconds
  • Network is simple (chains, small grids)
  • Iterating/prototyping
  • Greedy is already good enough

Optimization Workflow

# 1. Quick baseline with greedy
tree_greedy = optimize_code(ixs, out, sizes, GreedyMethod())
comp_greedy = tree_greedy.complexity(ixs, sizes)
print(f"Greedy: tc={comp_greedy.tc:.2f}, sc={comp_greedy.sc:.2f}")

# 2. If not satisfactory, try TreeSA
if comp_greedy.sc > 28.0:  # Too much memory
    score = ScoreFunction(sc_target=28.0, sc_weight=2.0)
    tree_sa = optimize_code(ixs, out, sizes, TreeSA(ntrials=10, score=score))
    comp_sa = tree_sa.complexity(ixs, sizes)
    print(f"TreeSA: tc={comp_sa.tc:.2f}, sc={comp_sa.sc:.2f}")

Tuning Parameters

For Speed

# Fast TreeSA (2-3x slower than greedy)
TreeSA(ntrials=2, niters=20)

For Quality

# Thorough TreeSA (10-20x slower than greedy)
TreeSA(ntrials=20, niters=100)

For Memory-Constrained

# Prioritize space over time
score = ScoreFunction(tc_weight=0.1, sc_weight=10.0, sc_target=25.0)
TreeSA(ntrials=10, niters=50, score=score)

Advanced: Custom Temperature Schedule

# Custom annealing schedule
import numpy as np

# Slow cooling for thorough search
betas = np.logspace(0, 2, 50)  # 1 to 100

optimizer = TreeSA(ntrials=5, betas=betas.tolist())

Next Steps