In addition, they show a counter-intuitive scaling limit: their reasoning effort and hard work boosts with issue complexity nearly a point, then declines Irrespective of getting an enough token spending budget. By evaluating LRMs with their regular LLM counterparts underneath equal inference compute, we recognize a few overall performance https://www.youtube.com/watch?v=snr3is5MTiU