Summary
A significant performance bug has been discovered in cuBLAS that causes a roughly 60% performance loss for batched FP32 matrix multiplications on RTX GPUs, including the new RTX 5090. Even with the latest CUDA and cuBLAS versions, the library dispatches inefficient kernels that achieve only about 40% of the available compute throughput. Batched matrix multiplication is a fundamental operation in many AI/ML workloads, so the regression has broad impact.
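For context, batched FP32 GEMM of this kind is typically issued through cublasSgemmStridedBatched. Below is a minimal timing sketch, not taken from the report, that one could use to measure achieved throughput and compare it against a GPU's FP32 peak. The matrix shape (1024x1024x1024) and batch count (64) are illustrative assumptions; the report does not specify which problem sizes trigger the slow kernel path.

// Minimal sketch for timing batched FP32 GEMM with cuBLAS.
// Shape and batch count are assumptions, not values from the report.
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int m = 1024, n = 1024, k = 1024;  // assumed square shape
    const int batch = 64;                    // assumed batch count
    const float alpha = 1.0f, beta = 0.0f;

    // Allocate strided batches of A (m x k), B (k x n), C (m x n).
    float *dA, *dB, *dC;
    cudaMalloc(&dA, (size_t)m * k * batch * sizeof(float));
    cudaMalloc(&dB, (size_t)k * n * batch * sizeof(float));
    cudaMalloc(&dC, (size_t)m * n * batch * sizeof(float));

    cublasHandle_t handle;
    cublasCreate(&handle);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up call so one-time kernel selection is excluded from timing.
    cublasSgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                              m, n, k, &alpha,
                              dA, m, (long long)m * k,
                              dB, k, (long long)k * n, &beta,
                              dC, m, (long long)m * n, batch);

    const int iters = 20;
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) {
        cublasSgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                  m, n, k, &alpha,
                                  dA, m, (long long)m * k,
                                  dB, k, (long long)k * n, &beta,
                                  dC, m, (long long)m * n, batch);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // 2*m*n*k FLOPs per GEMM, times batch size and iteration count.
    double flops = 2.0 * m * n * k * (double)batch * iters;
    printf("avg %.3f ms/iter, %.2f TFLOP/s\n",
           ms / iters, flops / (ms * 1e-3) / 1e12);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}

Compiled with nvcc bench.cu -lcublas, the printed TFLOP/s can be divided by the card's theoretical FP32 peak to estimate the utilization figure cited above.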