AI Dose

[r/LocalLLaMA] From FlashLM to State Flow Machine: stopped optimizing transformers, started replacing them. First result: 79% length retention vs transformers' 2%

Impact: 9/10

Summary

A researcher previously known for the FlashLM series has shifted from optimizing transformers to building new language-model architectures from scratch. Their latest design, the "State Flow Machine", uses neither attention nor convolution and reportedly achieves 79% length retention where comparable transformers manage 2%. If the numbers hold up, the approach points to a credible alternative to transformer-based models, particularly for handling long contexts efficiently.
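The post title names the mechanism but not its details, so the following is only a rough illustration of the general idea behind attention-free, convolution-free sequence models: a fixed-size recurrent state updated once per token, rather than attention over all previous tokens. The class name StateFlowBlock, the gated update rule, and all dimensions below are hypothetical sketches of that family of designs, not the author's actual architecture.

```python
import torch
import torch.nn as nn

class StateFlowBlock(nn.Module):
    """Hypothetical attention-free, convolution-free token mixer.

    A fixed-size recurrent state is updated once per token, so
    per-token compute and memory do not grow with sequence length.
    Illustrative sketch only; not the post author's design.
    """

    def __init__(self, d_model: int, d_state: int = 64):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_state)    # candidate state update
        self.gate_proj = nn.Linear(d_model, d_state)  # input-dependent decay
        self.out_proj = nn.Linear(d_state, d_model)   # read out the state

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, _ = x.shape
        state = x.new_zeros(batch, self.in_proj.out_features)
        outputs = []
        for t in range(seq_len):
            xt = x[:, t, :]
            gate = torch.sigmoid(self.gate_proj(xt))  # decay factor in (0, 1)
            update = torch.tanh(self.in_proj(xt))
            # Leaky-integrator update: blend old state with the new input.
            state = gate * state + (1.0 - gate) * update
            outputs.append(self.out_proj(state))
        return torch.stack(outputs, dim=1)  # (batch, seq_len, d_model)

block = StateFlowBlock(d_model=256)
y = block(torch.randn(2, 128, 256))  # any seq_len, same cost per token
```

Because the state is constant-sized, nothing in such a block is tied to a particular training context length, which is presumably what a "length retention" comparison against attention-based models is probing; the post itself does not define the metric.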

Continue Reading

Explore related coverage about community news and adjacent AI developments:

[r/ML] [D] MYTHOS-INVERSION STRUCTURAL AUDIT
[r/LocalLLaMA] karpathy / autoresearch
[r/ML] [R] Agentic AI and Occupational Displacement: A Multi-Regional Task Exposure Analysis (236 occupations, 5 US metros)
[r/ML] Building behavioural response models of public figures using Brain scan data (Predict their next move using psychological modelling) [P]
