How to port NiLang to Zygote

This demo shows how to plug NiLang's gradient implementation into Zygote to speed up Zygote's gradient computation. A similar demo for ChainRules can be found in How to port NiLang to ChainRules.

using NiLang, NiLang.AD, Zygote

Let's start with a native Julia implementation of the norm2 function, which returns the sum of the squared elements of x.

function norm2(x::AbstractArray{T}) where T
    out = zero(T)
    for i=1:length(x)
        @inbounds out += x[i]^2
    end
    return out
end
norm2 (generic function with 1 method)

Zygote generates the correct dual function, i.e., the gradient, but it is much slower than the primal function norm2: for a 1000-element vector the median time is about 2.5 ms, with roughly 19,000 allocations.

using BenchmarkTools
x = randn(1000);
original_grad = norm2'(x)
@benchmark norm2'($x) seconds=1
BenchmarkTools.Trial: 296 samples with 1 evaluation.
 Range (min … max):  2.271 ms … 11.741 ms  ┊ GC (min … max):  0.00% … 54.74%
 Time  (median):     2.482 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   3.369 ms ±  2.243 ms  ┊ GC (mean ± σ):  22.78% ± 22.54%

  ▄█▅                                                         
  ███▅▄▁▁▁▁▁▁▁▁▁▁▁▄▅▅▁▁▁▁▁▁▁▆▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▆▇██▄▅▆▅▅ ▅
  2.27 ms      Histogram: log(frequency) by time     9.76 ms <

 Memory estimate: 8.36 MiB, allocs estimate: 19059.

For comparison, the primal function takes about 1.3 μs and does not allocate:

@benchmark norm2($x) seconds=1
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.300 μs …  3.780 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.310 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.317 μs ± 60.758 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                     █                                        
  ▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▂
  1.3 μs         Histogram: frequency by time        1.33 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Next, we write the reversible implementation in NiLang. The accumulator out becomes an explicit argument instead of a local variable:

@i function r_norm2(out::T, x::AbstractArray{T}) where T
    for i=1:length(x)
        @inbounds out += x[i]^2
    end
end
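
As a rough illustration of what "reversible" means here, the sketch below runs r_norm2 forward and then applies its inverse ~r_norm2 to recover the initial accumulator. It assumes that an @i function returns its updated arguments as a tuple; the vector v and the tolerance are arbitrary choices made for this example.

```julia
using NiLang

# Same reversible function as above, repeated so the snippet is self-contained.
@i function r_norm2(out::T, x::AbstractArray{T}) where T
    for i=1:length(x)
        @inbounds out += x[i]^2
    end
end

v = randn(1000)

# Forward pass: the accumulator starts at zero and comes back with the input.
out, v_fwd = r_norm2(0.0, v)
@assert out ≈ sum(abs2, v)

# Inverse pass: ~r_norm2 undoes the accumulation, restoring zero up to rounding.
out0, v_back = (~r_norm2)(out, v_fwd)
@assert isapprox(out0, 0.0; atol=1e-8)
@assert v_back == v
```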

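The gradient generated by NiLang is much faster: a median of about 41 μs with no allocations, versus about 2.5 ms for Zygote. In this benchmark it is still slower than the 1.3 μs primal function.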

@benchmark (~r_norm2)(GVar($(norm2(x)), 1.0), $(GVar(x))) seconds=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  40.800 μs …  65.501 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     41.401 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   41.501 μs ± 913.488 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▂▆█▃                                                      
  ▂▃▄▆████▆▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂ ▃
  40.8 μs         Histogram: frequency by time         46.8 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
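
The benchmarked expression can be unpacked step by step. The following is a sketch rather than a definitive recipe; it repeats r_norm2 for self-containment and uses only the calls that appear on this page (GVar, grad, and the inverse ~r_norm2). The output is wrapped in a GVar seeded with gradient 1.0, the input is wrapped with GVar, the inverse program is run, and grad reads the accumulated gradients back. The analytic gradient 2x serves as a check.

```julia
using NiLang, NiLang.AD

# Same reversible function as above, repeated so the snippet is self-contained.
@i function r_norm2(out::T, x::AbstractArray{T}) where T
    for i=1:length(x)
        @inbounds out += x[i]^2
    end
end

v = randn(1000)
y = sum(abs2, v)          # value of the primal function

out_g = GVar(y, 1.0)      # output with seed gradient 1.0
v_g = GVar(v)             # input wrapped elementwise in GVar

# Running the inverse program back-propagates the seed into the input gradients.
res = (~r_norm2)(out_g, v_g)
g = grad(res[2])          # second returned argument corresponds to x

@assert g ≈ 2 .* v
```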

To enjoy the speed of NiLang in Zygote, bind the adjoint rule so that Zygote calls the NiLang gradient for norm2:

Zygote.@adjoint function norm2(x::AbstractArray{T}) where T
    out = norm2(x)
    out, δy -> (grad((~r_norm2)(GVar(out, δy), GVar(x))[2]),)
end
@assert norm2'(x) ≈ original_grad
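
For reference, the contract that such an adjoint rule fulfils can be sketched in plain Julia without any AD package: return the primal value together with a closure that maps the output cotangent δy to a tuple of input cotangents. The name norm2_with_pullback is hypothetical and exists only for this illustration; the closed form 2x·δy follows from differentiating the sum of squares.

```julia
# Hypothetical helper: primal value plus a hand-written pullback for norm2.
function norm2_with_pullback(x::AbstractArray{T}) where T
    out = zero(T)
    for i in eachindex(x)
        @inbounds out += x[i]^2
    end
    # d(out)/d(x[i]) = 2x[i], so the input cotangent is 2x scaled by δy.
    pullback(δy) = (2 .* x .* δy,)
    return out, pullback
end

y, back = norm2_with_pullback([1.0, 2.0, 3.0])
@assert y == 14.0
@assert back(1.0)[1] == [2.0, 4.0, 6.0]
@assert back(0.5)[1] == [1.0, 2.0, 3.0]
```

The NiLang-backed rule returns the same kind of pair, but obtains the cotangent by running ~r_norm2 on GVar inputs instead of using a closed form.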

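With the custom adjoint in place, the Zygote gradient is roughly 50 times faster (a median of about 47 μs versus 2.5 ms):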

@benchmark norm2'(x) seconds=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  44.600 μs …  5.425 ms  ┊ GC (min … max): 0.00% … 97.57%
 Time  (median):     47.300 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   49.309 μs ± 75.434 μs  ┊ GC (mean ± σ):  2.12% ±  1.38%

       ▂ █▇▄█▂▅ ▁                                              
  ▁▂▂▄▅████████▇█▅▅▄▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁ ▂
  44.6 μs         Histogram: frequency by time        59.7 μs <

 Memory estimate: 23.69 KiB, allocs estimate: 2.

This page was generated using Literate.jl.