Summary
A significant performance bug has been reported in cuBLAS that causes a 60% performance loss for batched FP32 matrix multiplications on RTX GPUs, including the new RTX 5090. For these workloads the library dispatches inefficient kernels that use only about 40% of the available compute, and the problem persists with the latest CUDA and cuBLAS versions. Because batched matrix multiplication is a fundamental operation in many AI/ML workloads, the impact is broad.
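The "40% of available compute" figure is a utilization ratio: achieved throughput divided by the GPU's peak FP32 throughput. Its complement is the 60% loss. The minimal sketch below shows that arithmetic for a batched GEMM. It assumes the usual 2·m·n·k FLOP count per matrix product. The helper names, batch shape, measured runtime, and peak value are hypothetical illustrations, not numbers from the original report or real RTX 5090 specifications.

```python
def batched_gemm_flops(batch: int, m: int, n: int, k: int) -> int:
    """FLOPs for `batch` independent (m x k) @ (k x n) products.

    Each output element needs k multiplies and k adds, so one GEMM
    costs 2*m*n*k floating-point operations.
    """
    return 2 * batch * m * n * k


def achieved_tflops(flops: int, seconds: float) -> float:
    """Convert an operation count and a measured runtime into TFLOP/s."""
    return flops / seconds / 1e12


def utilization(achieved: float, peak: float) -> float:
    """Fraction of the device's peak throughput actually used."""
    return achieved / peak


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; substitute your own
    # measured runtime and your GPU's published FP32 peak.
    flops = batched_gemm_flops(batch=64, m=1024, n=1024, k=1024)
    measured_seconds = 0.0035  # hypothetical wall-clock time for the batch
    peak_tflops = 100.0        # hypothetical FP32 peak (TFLOP/s)

    tflops = achieved_tflops(flops, measured_seconds)
    util = utilization(tflops, peak_tflops)
    print(f"{tflops:.1f} TFLOP/s, {util:.0%} of peak, {1 - util:.0%} left unused")
```

A kernel that runs at 40% utilization leaves 60% of the hardware's throughput unused. That is why the two figures in the report describe the same shortfall.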
Editorial note
AI Dose summarizes public reporting and links to original sources when they are available. Review the Editorial Policy, Disclaimer, or Contact page if you need to flag a correction or understand how this site handles sources.