LongCoT is a new, scalable benchmark for measuring the long-horizon Chain-of-Thought (CoT) reasoning capabilities of frontier language models. It comprises 2,500 expert-designed problems spanning chemistry, mathematics, computer science, chess, and logic. Because complex autonomous tasks require models to reason accurately over many sequential steps, LongCoT is designed to isolate this capability and assess it directly.
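
The source does not specify how the benchmark is distributed or scored. As a rough illustration only, the sketch below assumes a hypothetical JSONL release with `problem`, `answer`, and `domain` fields and uses simple exact-match scoring to report per-domain accuracy; none of these details are confirmed by the source.

```python
import json

def evaluate(model_answer_fn, path="longcot.jsonl"):
    """Score a model on benchmark records, grouped by domain.

    Assumes (hypothetically) that each JSONL record carries a problem
    statement, a reference answer, and a domain tag; exact-match
    scoring is a simplification of whatever grading the benchmark uses.
    """
    totals, correct = {}, {}
    with open(path) as f:
        for line in f:
            record = json.loads(line)               # hypothetical schema
            domain = record["domain"]               # e.g. "chess", "logic"
            prediction = model_answer_fn(record["problem"])
            totals[domain] = totals.get(domain, 0) + 1
            if prediction.strip() == record["answer"].strip():
                correct[domain] = correct.get(domain, 0) + 1
    # Per-domain accuracy over the five subject areas
    return {d: correct.get(d, 0) / totals[d] for d in totals}
```

In practice, `model_answer_fn` would wrap a call to the model under test, and long-form CoT outputs would likely need answer extraction before comparison; the exact grading protocol would follow whatever the benchmark's release specifies.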