Score Function Configuration

Configure omeco’s optimization for different hardware and use cases.

What is the Score Function?

The ScoreFunction controls how the optimizer balances three objectives, each measured on a log2 scale (hence the 2^ terms in the formula below):

  • tc (time complexity): Total FLOPs
  • sc (space complexity): Peak memory usage
  • rwc (read-write complexity): Memory I/O operations

Formula:

score = tc_weight × 2^tc + rw_weight × 2^rwc + sc_weight × max(0, 2^sc - 2^sc_target)

Lower score is better. The optimizer tries to minimize this score.
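To make the trade-off concrete, here is a minimal pure-Python sketch that evaluates the formula above. It is not omeco's implementation. It assumes tc, sc, and rwc are log2 values, as the 2^ terms imply, and uses the documented defaults. `score_value` and the two candidate contraction orders are hypothetical, for illustration only.

```python
def score_value(tc, sc, rwc, tc_weight=1.0, sc_weight=1.0, rw_weight=0.0, sc_target=20.0):
    """Hypothetical helper: evaluate the score formula for log2 complexities."""
    time_term = tc_weight * 2.0 ** tc
    io_term = rw_weight * 2.0 ** rwc
    # Only the part of peak memory above 2^sc_target is penalized.
    space_term = sc_weight * max(0.0, 2.0 ** sc - 2.0 ** sc_target)
    return time_term + io_term + space_term


# Two hypothetical contraction orders: A is faster, B uses less memory.
a = dict(tc=30, sc=30, rwc=28)
b = dict(tc=31, sc=25, rwc=28)

for sc_weight in (1.0, 2.0):
    sa = score_value(**a, sc_weight=sc_weight, sc_target=25.0)
    sb = score_value(**b, sc_weight=sc_weight, sc_target=25.0)
    winner = "A" if sa < sb else "B"
    print(f"sc_weight={sc_weight}: A={sa:.3e}, B={sb:.3e} -> prefer {winner}")
```

With sc_weight=1.0 the faster order A scores lower; doubling sc_weight makes the memory overshoot of A dominate, so the leaner order B wins. This is the effect the custom weights below aim for.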

Basic Usage

Default (Balanced)

from omeco import ScoreFunction

# Balanced optimization (works for most cases)
score = ScoreFunction()
# Equivalent to:
# ScoreFunction(tc_weight=1.0, sc_weight=1.0, rw_weight=0.0, sc_target=20.0)

Custom Weights

# Prioritize memory over time
score = ScoreFunction(
    tc_weight=1.0,      # Time matters
    sc_weight=2.0,      # Memory matters 2x more
    rw_weight=0.0,      # Ignore I/O (CPU)
    sc_target=25.0      # Target 2^25 elements (~256MB for float64)
)

Using with Optimizers

from omeco import ScoreFunction, TreeSA, optimize_code

# ixs, out, sizes describe the tensor network to contract
score = ScoreFunction(sc_target=28.0)
tree = optimize_code(ixs, out, sizes, TreeSA(score=score))

Hardware-Specific Configuration

CPU Optimization

Characteristics:

  • Memory bandwidth is usually not the bottleneck
  • Balance time and space
  • Ignore read-write complexity

score = ScoreFunction(
    tc_weight=1.0,
    sc_weight=1.0,
    rw_weight=0.0,      # ← CPU: I/O is not a bottleneck
    sc_target=28.0      # ~2GB (2^28 × 8 bytes for float64)
)

Calculate sc_target:

import math

# For 16GB RAM, reserve half for tensors
available_gb = 8
bytes_per_element = 8  # float64
sc_target = math.log2(available_gb * 1024**3 / bytes_per_element)
# sc_target = 30.0 (2^30 float64 elements = 8GB)
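The same calculation can be wrapped in a small function so the tensor share of RAM and the element size are explicit parameters. This is a sketch; `sc_target_for` and its `tensor_fraction` parameter are hypothetical names, not part of omeco.

```python
import math


def sc_target_for(total_gb, bytes_per_element=8, tensor_fraction=0.5):
    """Hypothetical helper: log2 of how many elements fit in the tensor share of memory."""
    if not 0 < tensor_fraction <= 1:
        raise ValueError("tensor_fraction must be in (0, 1]")
    usable_bytes = total_gb * tensor_fraction * 1024**3
    return math.log2(usable_bytes / bytes_per_element)


print(sc_target_for(16))                       # 16GB RAM, half for tensors, float64 -> 30.0
print(sc_target_for(64, bytes_per_element=4))  # 64GB RAM, half for tensors, float32 -> 33.0
```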

GPU Optimization

Note: GPU memory bandwidth is often the bottleneck, not compute.

score = ScoreFunction(
    tc_weight=1.0,
    sc_weight=1.0,
    rw_weight=10.0,     # See GPU Optimization Guide for tuning
    sc_target=30.0      # ~4GB (2^30 × 4 bytes for float32)
)

For how to tune rw_weight on a GPU, see the GPU Optimization Guide.

Calculate sc_target for GPU:

import math

# NVIDIA RTX 3090: 24GB VRAM
gpu_gb = 24
bytes_per_element = 4  # float32 (most common on GPU)
sc_target = math.log2(gpu_gb * 1024**3 / bytes_per_element)
# sc_target ≈ 32.58

# Be conservative (leave room for framework overhead)
sc_target = 32.0  # ~16GB usable
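Going the other way helps check a chosen target: 2^sc_target elements times the element size gives the memory that target allows. A minimal sketch, where `gb_for_sc_target` is a hypothetical helper and GB means 1024^3 bytes, matching the calculations above:

```python
def gb_for_sc_target(sc_target, bytes_per_element=4):
    """Hypothetical helper: memory in GB occupied by 2^sc_target elements."""
    return 2.0 ** sc_target * bytes_per_element / 1024**3


vram_gb = 24  # RTX 3090 from the example above
for target in (32.0, 32.5):
    used = gb_for_sc_target(target)
    print(f"sc_target={target}: {used:.1f}GB, headroom {vram_gb - used:.1f}GB")
```

Rounding down from 32.58 to 32.0 leaves about 8GB of the 24GB card as headroom, while 32.5 would leave under 2GB.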

Decision Guide

Choose based on your bottleneck:

  1. Default (CPU, balanced): Use ScoreFunction() with defaults

    • Works for most CPU scenarios
    • Balances time and memory
  2. GPU: See GPU Optimization Guide for complete rw_weight tuning methodology

  3. Memory constrained: Increase sc_weight and lower sc_target

    • Example: ScoreFunction(sc_weight=3.0, sc_target=25.0)
    • Accepts slower execution to fit in memory
  4. Need slicing: See Slicing Strategy Guide

    • Use when contraction exceeds available memory
    • Trades time for space

Advanced Configuration

Memory-Constrained Optimization

When memory is the primary constraint:

# Heavily penalize exceeding target
score = ScoreFunction(
    tc_weight=0.1,      # Time is secondary
    sc_weight=10.0,     # Memory is critical
    rw_weight=0.0,
    sc_target=25.0      # Soft limit: ~256MB for float64 (exceeding it is penalized, not forbidden)
)

# Use with TreeSA for best results
optimizer = TreeSA(ntrials=10, niters=50, score=score)
tree = optimize_code(ixs, out, sizes, optimizer)

Speed-Critical Optimization

When execution speed is paramount:

# Minimize time complexity only
score = ScoreFunction(
    tc_weight=1.0,
    sc_weight=0.0,      # Ignore memory
    rw_weight=0.0,
    sc_target=float('inf')  # No memory limit
)

Hybrid CPU+GPU

For heterogeneous systems:

# Moderate I/O penalty (between CPU and GPU)
score = ScoreFunction(
    tc_weight=1.0,
    sc_weight=1.0,
    rw_weight=5.0,      # Moderate I/O cost (between CPU's 0.0 and GPU's 10.0)
    sc_target=29.0      # ~2GB for float32, ~4GB for float64
)

sc_target Reference

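Memory    float32 elements    float64 elements    sc_target
256 MB    64M                 32M                 25.0-26.0
1 GB      256M                128M                27.0-28.0
4 GB      1B                  512M                29.0-30.0
8 GB      2B                  1B                  30.0-31.0
16 GB     4B                  2B                  31.0-32.0
32 GB     8B                  4B                  32.0-33.0

Here M = 2^20 and B = 2^30 elements. In each sc_target range, the lower value is log2 of the float64 element count and the upper value is log2 of the float32 element count.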
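Each row of the table follows from sc_target = log2(memory bytes / bytes per element). A short sketch that regenerates the ranges for any list of memory sizes; `sc_target_range` is a hypothetical helper, not an omeco function:

```python
import math

MEMORY_GB = [0.25, 1, 4, 8, 16, 32]


def sc_target_range(memory_gb):
    """Hypothetical helper: (float64, float32) sc_target for a memory budget in GB."""
    total_bytes = memory_gb * 1024**3
    return math.log2(total_bytes / 8), math.log2(total_bytes / 4)


for gb in MEMORY_GB:
    low, high = sc_target_range(gb)
    print(f"{gb:>6} GB  sc_target {low:.1f}-{high:.1f}")
```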

Tuning Workflow

  1. Start with defaults:

    tree = optimize_code(ixs, out, sizes)
    complexity = tree.complexity(ixs, sizes)
    print(f"tc: {complexity.tc:.2f}, sc: {complexity.sc:.2f}")
    
  2. Identify bottleneck:

    • If sc too high → increase sc_weight or lower sc_target
    • If tc too high → try TreeSA with default score
    • If running on GPU → see GPU Optimization Guide
  3. Adjust and re-optimize:

    score = ScoreFunction(sc_weight=2.0, sc_target=28.0)
    tree = optimize_code(ixs, out, sizes, TreeSA(score=score))
    
  4. Verify improvement:

    new_complexity = tree.complexity(ixs, sizes)
    print(f"New tc: {new_complexity.tc:.2f}, sc: {new_complexity.sc:.2f}")
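Steps 3 and 4 can be repeated over several candidate targets, keeping the fastest result that still fits a memory limit. The sketch below assumes a caller-supplied function `run(score_kwargs)` that optimizes with those ScoreFunction keyword arguments and returns (tc, sc). `sweep_sc_target` and `fake_run` are hypothetical; `fake_run` only stands in for a real optimization so the loop can be tried without a tensor network.

```python
def sweep_sc_target(run, candidates, sc_limit, sc_weight=2.0):
    """Try each sc_target; return (kwargs, tc, sc) of the fastest fitting result, or None.

    A real `run` might wrap the calls used above, for example:
        tree = optimize_code(ixs, out, sizes, TreeSA(score=ScoreFunction(**kwargs)))
        c = tree.complexity(ixs, sizes)
        return c.tc, c.sc
    """
    best = None
    for target in candidates:
        kwargs = {"sc_weight": sc_weight, "sc_target": target}
        tc, sc = run(kwargs)
        print(f"sc_target={target}: tc={tc:.2f}, sc={sc:.2f}")
        # Keep only results within the memory limit, preferring lower tc.
        if sc <= sc_limit and (best is None or tc < best[1]):
            best = (kwargs, tc, sc)
    return best


def fake_run(kwargs):
    # Stand-in optimizer: a lower target trades time for memory.
    target = kwargs["sc_target"]
    return 40.0 - 0.5 * (target - 24), min(target + 1.0, 30.0)


print(sweep_sc_target(fake_run, [24.0, 26.0, 28.0], sc_limit=27.0))
```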
    

Examples

See examples/score_function_examples.py for complete examples:

  • CPU optimization
  • GPU optimization (experimental rw_weight tuning)
  • Memory-limited environments
  • Dynamic sc_target calculation

Next Steps