Gated Linear Unit

Note

  • Gated Linear Units [Dauphin et al., 2016] consist of the component-wise product of two linear projections, one of which is first passed through a sigmoid function. Variations on GLU are possible, using different nonlinear (or even linear) functions in place of sigmoid.
  • "Gated" means that one component controls how much of the signal is let through. Most often this is done by multiplying the signal element-wise by a value between 0 and 1, such as the output of a sigmoid.
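
The two bullets above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the names `linear`, `glu`, and `gate_fn`, the row-per-input weight layout, and the use of nested lists instead of tensors are all assumptions made for readability. It is also not hardened against overflow for large-magnitude inputs.

```python
import math

def sigmoid(z):
    """Logistic function; squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def linear(x, W, b):
    """Compute xW + b for a vector x, a matrix W (one row per input), and a bias b."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def glu(x, W, b, V, c, gate_fn=sigmoid):
    """GLU(x) = gate_fn(xW + b) * (xV + c), multiplied component-wise.

    gate_fn is a hypothetical parameter added here so that other
    nonlinear (or linear) functions can stand in for sigmoid.
    """
    gate = [gate_fn(g) for g in linear(x, W, b)]   # how much signal to let through
    value = linear(x, V, c)                        # the signal being gated
    return [g * v for g, v in zip(gate, value)]

x = [1.0, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
no_bias = [0.0, 0.0]

# A zero gate projection makes sigmoid output 0.5, so half of x passes through.
half_open = glu(x, zeros, no_bias, identity, no_bias)

# Replacing sigmoid with the identity gives a purely linear gate.
bilinear = glu(x, identity, no_bias, identity, no_bias, gate_fn=lambda z: z)
```

With the sigmoid gate, each output component is the value projection scaled by a factor strictly between 0 and 1; with the identity gate there is no such bound, which is why that variant no longer "gates" in the 0-to-1 sense.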

SwiGLU
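
SwiGLU is the GLU variant that uses Swish in place of sigmoid for the gate. A minimal sketch follows, assuming the bias-free form Swish(xW) * (xV); the helper names `swish`, `matvec`, and `swiglu` and the list-based layout are illustrative choices, not any library's API.

```python
import math

def swish(z, beta=1.0):
    """Swish activation: z * sigmoid(beta * z)."""
    return z / (1.0 + math.exp(-beta * z))

def matvec(x, W):
    """Compute xW for a vector x and a matrix W (one row per input)."""
    return [sum(x[i] * W[i][j] for i in range(len(x)))
            for j in range(len(W[0]))]

def swiglu(x, W, V, beta=1.0):
    """SwiGLU(x) = Swish(xW) * (xV), multiplied component-wise (no bias terms)."""
    gate = [swish(g, beta) for g in matvec(x, W)]
    value = matvec(x, V)
    return [g * v for g, v in zip(gate, value)]

identity = [[1.0, 0.0], [0.0, 1.0]]
out = swiglu([1.0, 2.0], identity, identity)
```

Unlike the sigmoid gate, the Swish gate is not confined to the range 0 to 1, so the output is not simply an attenuated copy of the value projection.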

Why does it work?

"We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence." [Shazeer, 2020]

Resources

