EMO: AllenAI pretrains a mixture of experts so modularity emerges from the data
AllenAI releases EMO, a 1B-active / 14B-total MoE whose experts self-organize into domain-level modules; activating just 12.5% of the experts retains near-full-model performance.