Slicing Strategy
Reduce memory usage by slicing tensor indices, trading time for space.
What is Slicing?
Slicing breaks a tensor contraction into smaller chunks by looping over one or more indices.
Example:
# Original: C[i,k] = Σⱼ A[i,j] * B[j,k]
# Too large: j has size 10000
# Sliced: Loop over j in chunks
C = zeros(i, k)
for j_val in range(10000):
C += A[:, j_val] * B[j_val, :]
Trade-off:
- ✅ Memory reduced: Only store one slice at a time
- ❌ Time increased: Repeat computation for each slice
When to Use Slicing
Use slicing when:
- Contraction exceeds available memory
sc(space complexity) is too high- Can’t reduce memory by changing contraction order alone
Decision flowchart:
Optimize contraction order
↓
Check space complexity
↓
sc > sc_target?
├─ No → Done
└─ Yes → Apply slicing
Basic Usage
Automatic Slicing
from omeco import optimize_code, slice_code, TreeSASlicer, ScoreFunction
# 1. Optimize contraction order
tree = optimize_code(ixs, out, sizes)
# 2. Check if memory is too large
complexity = tree.complexity(ixs, sizes)
if complexity.sc > 28.0: # More than 256MB
# 3. Slice to reduce memory
score = ScoreFunction(sc_target=28.0)
slicer = TreeSASlicer.fast(score=score)
sliced = slice_code(tree, ixs, sizes, slicer)
# 4. Verify reduction
sliced_comp = sliced.complexity(ixs, sizes)
print(f"Original: 2^{complexity.sc:.2f} elements")
print(f"Sliced: 2^{sliced_comp.sc:.2f} elements")
print(f"Sliced indices: {sliced.slicing()}")
Manual Slicing
Specify which indices to slice:
# Force slicing on specific indices
slicer = TreeSASlicer.fast(fixed_slices=[1, 2])
sliced = slice_code(tree, ixs, sizes, slicer)
# Verify that indices 1 and 2 are sliced
assert 1 in sliced.slicing()
assert 2 in sliced.slicing()
TreeSASlicer Configuration
Fast Preset
# Quick slicing optimization
slicer = TreeSASlicer.fast(score=ScoreFunction(sc_target=25.0))
sliced = slice_code(tree, ixs, sizes, slicer)
Custom Configuration
from omeco import TreeSASlicer, ScoreFunction
score = ScoreFunction(
tc_weight=0.1, # Accept higher time
sc_weight=10.0, # Prioritize space
sc_target=25.0 # Target 256MB
)
slicer = TreeSASlicer(
ntrials=10, # Number of optimization runs
niters=50, # Iterations per run
fixed_slices=[1], # Force slicing on index 1
score=score
)
sliced = slice_code(tree, ixs, sizes, slicer)
Understanding the Trade-off
Example: Large Matrix Multiplication
# A[10000 × 10000] × B[10000 × 10000]
ixs = [[0, 1], [1, 2]]
out = [0, 2]
sizes = {0: 10000, 1: 10000, 2: 10000}
# Without slicing
tree = optimize_code(ixs, out, sizes)
comp = tree.complexity(ixs, sizes)
# tc ≈ 33.2 (10^10 FLOPs)
# sc ≈ 26.6 (100M elements = 800MB for float64)
# With slicing on index 1 (middle dimension)
slicer = TreeSASlicer.fast(fixed_slices=[1])
sliced = slice_code(tree, ixs, sizes, slicer)
sliced_comp = sliced.complexity(ixs, sizes)
# tc ≈ 33.2 (same FLOPs, distributed over slices)
# sc ≈ 23.3 (10M elements = 80MB for float64)
# Result: 10x memory reduction, same total time (but distributed)
Choosing Which Indices to Slice
Automatic Selection
TreeSASlicer automatically chooses which indices to slice:
# Let the optimizer decide
slicer = TreeSASlicer.fast(score=ScoreFunction(sc_target=25.0))
sliced = slice_code(tree, ixs, sizes, slicer)
print(f"Automatically sliced indices: {sliced.slicing()}")
Algorithm: Tries slicing different indices and picks the combination that best satisfies sc_target while minimizing tc increase.
Manual Selection Guidelines
Good candidates for slicing:
- Indices with large dimensions
- Indices that appear in many tensors (high degree)
- Indices not in the output (no outer loop needed)
Poor candidates:
- Output indices (can’t slice without changing result structure)
- Small dimensions (little memory benefit)
- Indices that would cause many small intermediate tensors
Advanced: Slicing with GPU
For GPU workloads, consider both memory and I/O:
# GPU slicing: balance memory and I/O
score = ScoreFunction(
tc_weight=1.0,
sc_weight=2.0, # Space is critical on GPU
rw_weight=0.1, # Experimental: tune empirically
sc_target=30.0 # 8GB GPU limit
)
slicer = TreeSASlicer(
ntrials=10,
niters=50,
score=score
)
sliced = slice_code(tree, ixs, sizes, slicer)
Note: Slicing may increase rwc (read-write complexity) because data is loaded multiple times. The optimizer balances this with the rw_weight parameter.
Example Workflow
from omeco import optimize_code, slice_code
from omeco import TreeSA, TreeSASlicer, ScoreFunction
# Problem setup
ixs = [[0,1,2], [2,3,4], [4,5,6], [6,7,0]] # Cycle graph
out = [1,3,5,7]
sizes = {i: 32 for i in range(8)}
# Step 1: Optimize contraction order
tree = optimize_code(ixs, out, sizes, TreeSA.fast())
comp = tree.complexity(ixs, sizes)
print(f"Optimized: sc = 2^{comp.sc:.2f}")
# Step 2: Check if slicing needed
if comp.sc > 20.0: # Exceeds 1MB
print("Applying slicing...")
# Step 3: Slice
slicer = TreeSASlicer.fast(score=ScoreFunction(sc_target=18.0))
sliced = slice_code(tree, ixs, sizes, slicer)
# Step 4: Verify
sliced_comp = sliced.complexity(ixs, sizes)
print(f"Sliced: sc = 2^{sliced_comp.sc:.2f}")
print(f"Sliced indices: {sliced.slicing()}")
# Calculate overhead
time_overhead = 2 ** (sliced_comp.tc - comp.tc)
memory_reduction = 2 ** (comp.sc - sliced_comp.sc)
print(f"Time overhead: {time_overhead:.1f}x")
print(f"Memory reduction: {memory_reduction:.1f}x")
Multiple Slices
Slice multiple indices for extreme memory reduction:
# Very tight memory constraint
score = ScoreFunction(
tc_weight=0.01, # Accept much higher time
sc_weight=100.0, # Memory is critical
sc_target=15.0 # Only 32KB available!
)
slicer = TreeSASlicer(ntrials=20, niters=100, score=score)
sliced = slice_code(tree, ixs, sizes, slicer)
# May slice multiple indices
print(f"Sliced {len(sliced.slicing())} indices: {sliced.slicing()}")
Performance Impact
| Scenario | Memory Reduction | Time Overhead | When to Use |
|---|---|---|---|
| Slice 1 index | 2-10x | 1-2x | Moderate memory constraint |
| Slice 2 indices | 10-100x | 2-5x | Severe memory constraint |
| Slice 3+ indices | 100-1000x | 5-20x | Extreme memory constraint |
Tips
-
Optimize order first, slice second:
# Good workflow tree = optimize_code(ixs, out, sizes, TreeSA.fast()) sliced = slice_code(tree, ixs, sizes, TreeSASlicer.fast()) -
Set realistic sc_target:
# Target 80% of available memory available_gb = 8 sc_target = math.log2(available_gb * 0.8 * 1024**3 / 8) -
Profile actual memory usage:
# Theoretical vs actual memory may differ # Always test with real data -
Consider time-memory trade-off:
# For one-time computation: accept high memory # For repeated computation: slice to fit memory
Next Steps
- Score Function Configuration - Tune slicing parameters
- GPU Optimization - GPU-specific slicing
- Complexity Metrics - Understand sc and tc