Unitary matrix operations without allocation

A unitary matrix features uniform eigenvalues and reversibility. It is widely used as an approach to ease the gradient exploding and vanishing problem and the memory wall problem. One of the simplest ways to parametrize a unitary matrix is representing a unitary matrix as a product of two-level unitary operations. A real unitary matrix of size $N$ can be parametrized compactly by $N(N-1)/2$ rotation operations

\[ {\rm ROT}(a!, b!, \theta) = \left(\begin{matrix} \cos(\theta) & - \sin(\theta)\\ \sin(\theta) & \cos(\theta) \end{matrix}\right) \left(\begin{matrix} a!\\ b! \end{matrix}\right),\]

where $\theta$ is the rotation angle, a! and b! are target registers.

using NiLang, NiLang.AD

@i function umm!(x!, θ)
    @safe @assert length(θ) ==
            length(x!)*(length(x!)-1)/2
    k ← 0
    for j=1:length(x!)
        for i=length(x!)-1:-1:j
            k += 1
            ROT(x![i], x![i+1], θ[k])
        end
    end

    k → length(θ)
end

Here, the ancilla k is deallocated manually by specifying its value, because we know the loop size is $N(N-1)/2$. We define the test functions in order to check gradients.

@i function isum(out!, x::AbstractArray)
    for i=1:length(x)
        out! += x[i]
    end
end

@i function test!(out!, x!::Vector, θ::Vector)
   umm!(x!, θ)
   isum(out!, x!)
end

Let's print the program output

out, x, θ = 0.0, randn(4), randn(6);
@instr Grad(test!)(Val(1), out, x, θ)
x
4-element Vector{NiLang.AD.GVar{Float64, Float64}}:
 GVar(-0.493191407680952, 0.664260852910633)
 GVar(-0.17812932998323106, 0.9637750125313596)
  GVar(0.15100498492863101, 1.5697273928237314)
  GVar(2.6142046106704937, -0.4072482740656205)

We can erease the gradient field by uncomputing the gradient function. If you want, you can differentiate it twice to obtain Hessians. However, we suggest using ForwardDifferentiation over our NiLang program, this is more efficient.

@instr (~Grad(test!))(Val(1), out, x, θ)
x
4-element Vector{Float64}:
 -0.4931914076809521
 -0.1781293299832309
  0.15100498492863113
  2.6142046106704933

In the above testing code, Grad(test) attaches a gradient field to each element of x. ~Grad(test) is the inverse program that erase the gradient fields. Notably, this reversible implementation costs zero memory allocation, although it changes the target variables inplace.


This page was generated using Literate.jl.