
[r/LocalLLaMA] Residual connections haven't changed for 10 years and Kimi just replaced them with attention

Impact: 8/10

Summary

Kimi has introduced "Attention Residuals" to replace traditional residual connections, which have remained essentially unchanged for a decade. Instead of summing previous layers' outputs with equal, fixed weight, the new method uses a softmax attention mechanism in which each layer selectively retrieves information from the outputs of earlier layers via a learned query vector. Early scaling-law experiments show the Block AttnRes variant reaching loss comparable to baseline models.
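The post summary doesn't include code, so here is a minimal PyTorch sketch of the idea as described: one learned query vector per layer, with the hidden states of earlier layers serving directly as keys and values, and the softmax-weighted mix added to the layer's output. The module name, shapes, and these design choices are illustrative assumptions, not Kimi's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionResidual(nn.Module):
    """Hypothetical sketch: replace the plain residual sum with an
    attention-weighted mix over all earlier layer outputs, where the
    mixing weights come from a learned query vector."""

    def __init__(self, d_model: int):
        super().__init__()
        # One learned query per layer (assumption); earlier hidden
        # states act as both keys and values (assumption).
        self.query = nn.Parameter(torch.randn(d_model) / d_model**0.5)

    def forward(self, layer_output: torch.Tensor,
                history: list[torch.Tensor]) -> torch.Tensor:
        # history: outputs of all previous layers, each (batch, seq, d_model)
        stack = torch.stack(history, dim=0)  # (L, batch, seq, d_model)
        # Score each previous layer's state against the learned query.
        scores = torch.einsum("lbsd,d->lbs", stack, self.query)
        scores = scores / stack.size(-1) ** 0.5
        # Softmax over the L previous layers, per (batch, seq) position;
        # a standard residual would instead weight them all equally.
        weights = F.softmax(scores, dim=0)
        residual = torch.einsum("lbs,lbsd->bsd", weights, stack)
        return layer_output + residual

# Toy usage: three earlier layer outputs, one current output.
blk = AttentionResidual(d_model=16)
hist = [torch.randn(2, 4, 16) for _ in range(3)]
out = blk(torch.randn(2, 4, 16), hist)
print(out.shape)  # torch.Size([2, 4, 16])
```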

Continue Reading

Explore related coverage about community news and adjacent AI developments:

- [r/ML] [D] MYTHOS-INVERSION STRUCTURAL AUDIT
- [r/LocalLLaMA] karpathy / autoresearch
- [r/ML] [R] Agentic AI and Occupational Displacement: A Multi-Regional Task Exposure Analysis (236 occupations, 5 US metros)
- [r/ML] Building behavioural response models of public figures using Brain scan data (Predict their next move using psychological modelling) [P]

