
[r/LocalLLaMA] Running DeepSeek V3.2 with dense attention (like in llama.cpp) makes it a bit dumber

Impact: 5/10

Summary

A user investigated how the attention implementation (dense vs. sparse) affects DeepSeek V3.2's reasoning performance, specifically when the model is run with dense attention as it is in llama.cpp. While earlier tests on smaller graphs showed no difference, new benchmarks using much larger lineage graphs revealed that dense attention slightly degrades the model's reasoning. This indicates that implementation choices can subtly affect the performance of locally run large language models.
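
To make the dense-vs-sparse distinction concrete, here is a minimal NumPy sketch contrasting standard dense attention with a top-k sparse approximation. The top-k selection over raw scores is an illustrative stand-in only: DeepSeek V3.2 actually selects tokens with a learned indexer, and none of the function names below come from llama.cpp or DeepSeek's code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(q, k, v):
    # Standard dense attention: every query attends to every key.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def topk_sparse_attention(q, k, v, top_k):
    # Illustrative sparse variant: each query keeps only its top_k
    # highest-scoring keys and masks the rest out before the softmax.
    # (A raw top-k over the score matrix is a stand-in here, not
    # DeepSeek's learned token-selection mechanism.)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    return softmax(masked) @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((16, 8))
v = rng.standard_normal((16, 8))

# The two outputs diverge once top_k < number of keys; the benchmark
# in the post is probing exactly this mismatch, i.e. running a model
# trained with sparse attention under a dense implementation.
print(np.abs(dense_attention(q, k, v) -
             topk_sparse_attention(q, k, v, top_k=4)).max())
```

The gap grows with context length, which is consistent with the post's finding that small graphs showed no difference while much larger lineage graphs did.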

Continue Reading

Explore related coverage of community news and adjacent AI developments: [r/ML] [D] MYTHOS-INVERSION STRUCTURAL AUDIT, [r/LocalLLaMA] karpathy / autoresearch, [r/ML] [R] Agentic AI and Occupational Displacement: A Multi-Regional Task Exposure Analysis (236 occupations, 5 US metros), [r/ML] Building behavioural response models of public figures using Brain scan data (Predict their next move using psychological modelling) [P].

