Summary
A user investigated how the attention implementation (dense vs. sparse) affects DeepSeek V3.2's reasoning performance, particularly when the model is run with dense attention, as it is in llama.cpp. While earlier tests on smaller graphs showed no difference, new benchmarks using much larger lineage graphs revealed that dense attention slightly degrades the model's reasoning ability. This indicates that specific implementation choices can subtly affect the performance of locally run large language models.
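For readers unfamiliar with the distinction, the sketch below contrasts dense attention (every query scores every key) with a simple top-k sparse variant (each query attends only to its highest-scoring keys). This is a minimal NumPy illustration only: it is not DeepSeek V3.2's actual sparse-attention indexer or llama.cpp's implementation, and the function names and the `top_k` parameter are invented for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(q, k, v):
    # Dense attention: every query attends to every key (O(n^2) scores).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def topk_sparse_attention(q, k, v, top_k=4):
    # Toy sparse attention: each query keeps only its top_k highest-scoring
    # keys and masks out the rest before the softmax. The top-k selection
    # here is a stand-in for a learned token selector, not DeepSeek's method.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    kth = np.sort(scores, axis=-1)[:, -top_k][:, None]  # k-th largest score per query
    masked = np.where(scores >= kth, scores, -np.inf)
    return softmax(masked, axis=-1) @ v

# Tiny demo: with random inputs the two variants already produce
# slightly different outputs, which is the kind of gap that can
# surface in long-context reasoning benchmarks.
rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = rng.normal(size=(3, n, d))
print(np.abs(dense_attention(q, k, v) - topk_sparse_attention(q, k, v)).max())
```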
Continue Reading
Explore related coverage about community news and adjacent AI developments: [r/ML] [D] MYTHOS-INVERSION STRUCTURAL AUDIT, [r/LocalLLaMA] karpathy / autoresearch, [HN] Show HN: Ship of Theseus License, [r/ML] [R] Agentic AI and Occupational Displacement: A Multi-Regional Task Exposure Analysis (236 occupations, 5 US metros).