MoE (Mixture of Experts)

🔥 YouTube Video:  Research Paper Deep Dive - The Sparsely-Gated Mixture-of-Experts (MoE)

The Problem:

Neural networks already achieve impressive results on a wide range of tasks, such as image classification, machine translation, and protein folding prediction. These results come from inductive biases (such as convolutions or sequence attention), increasingly large datasets, and more specialized hardware. Behind them are very large models with a massive number of parameters.

Large model sizes appear to be necessary for strong generalization and robustness, so training large models while limiting resource requirements is becoming increasingly important.

There is a hidden problem underneath these eye-popping results: they consume significant computational resources. They require massive hardware, which brings logistics, cost, and power requirements, and on top of that it is hard to even move such models outside the lab.

The solution:

One promising approach is to use conditional computation:

  • Rather than activating the whole network for every single input, different parts of the model are activated for different inputs.
  • The most important thing to note here is that an MoE is used as a general-purpose neural network component.
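
To make conditional computation concrete, here is a minimal sketch of a top-k gated MoE layer in plain Python. It is an illustration under assumptions, not the paper's implementation: the names `softmax`, `make_expert`, `moe_forward`, and `gate_weights` are hypothetical, each expert is just a random linear map, and the noise term and load-balancing losses used in the Sparsely-Gated MoE paper are left out.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(dim, rng):
    # Hypothetical expert: a single random linear map dim -> dim.
    w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    def expert(x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    return expert

def moe_forward(x, experts, gate_weights, k=2):
    # One gate logit per expert (assumed linear gate, no noise).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Keep only the k experts with the largest logits.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the gate over the selected experts only.
    probs = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)  # only the selected experts are ever evaluated
        out = [o + p * yi for o, yi in zip(out, y)]
    return out, top

rng = random.Random(0)
dim, num_experts = 4, 8
experts = [make_expert(dim, rng) for _ in range(num_experts)]
gate_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]

out, used = moe_forward([1.0, 0.5, -0.5, 2.0], experts, gate_weights, k=2)
print("experts used for this input:", used)
```

With 8 experts and `k=2`, each input runs through only 2 expert functions; a different input may pick a different pair, which is the "different parts of the model for different inputs" idea.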

What everybody wants:

  • Scale up the model.
  • Add model capacity (scale up) without adding computational resources.
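
A rough back-of-the-envelope calculation shows why sparse activation gives capacity without a matching compute bill. The helper `moe_layer_cost` and all the numbers below (64 experts, 2 active per input, one million parameters per expert, a linear gate over a 512-dimensional input) are made-up assumptions for illustration, not figures from the paper.

```python
def moe_layer_cost(num_experts, k, params_per_expert, gate_params):
    # Total parameters stored in the layer (model capacity).
    total = num_experts * params_per_expert + gate_params
    # Parameters actually used for one input when only k experts run.
    active = k * params_per_expert + gate_params
    return total, active

# Hypothetical configuration: linear gate with 512 inputs and 64 outputs.
total, active = moe_layer_cost(
    num_experts=64,
    k=2,
    params_per_expert=1_000_000,
    gate_params=64 * 512,
)
print(f"total parameters:  {total:,}")
print(f"active per input:  {active:,}")
print(f"fraction active:   {active / total:.3f}")
```

Under these assumptions, adding more experts grows `total` linearly while `active` stays fixed, as long as `k` does not change.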

Research Papers & GitHub Source Code(s)

Resources