[r/ML] [R] A Gradient Descent Misalignment — Causes Normalisation To Emerge

Summary

A new paper accepted at ICLR's GRaM workshop suggests that while gradient descent takes the steepest descent step for parameters, it may systematically take a 'wrong step' in activation space. This misalignment is mathematically demonstrated for common neural network components like affine layers, convolutions, and attention. The research explores solutions and proposes this phenomenon could offer an alternative mechanistic explanation for why normalization techniques are beneficial in deep learning.

Continue Reading

Explore related coverage about community news and adjacent AI developments: [r/ML] [D] MYTHOS-INVERSION STRUCTURAL AUDIT, [r/LocalLLaMA] karpathy / autoresearch, [r/ML] [R] Agentic AI and Occupational Displacement: A Multi-Regional Task Exposure Analysis (236 occupations, 5 US metros), [r/ML] Building behavioural response models of public figures using Brain scan data (Predict their next move using psychological modelling) [P].

[r/ML] [R] A Gradient Descent Misalignment — Causes Normalisation To Emerge

Summary

Continue Reading

Related Articles

Comments