It's the whole thing where, if you ask an LLM to multiply two small numbers, someone has probably done that somewhere in the training data, so it "works," but it completely fails for larger numbers. "Reasoning" models can get around that by giving the model an escape hatch to eval, like the original chain-of-thought paper, but then why not just use eval directly?

But regardless, if you pick a task common enough that it has already been solved in the training corpus, then of course it "works," right?
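For illustration, here's a minimal sketch (in Python, with a made-up <calc> tag and a toy model_output string) of what that escape hatch looks like: the model only names the computation, and the harness does the arithmetic itself.

    # Sketch of the "escape hatch to eval" pattern: extract the expression the
    # model produced and compute it, rather than trusting the model's arithmetic.
    import ast
    import operator

    # Whitelisted operators so we aren't calling the builtin eval() on raw model text.
    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def safe_eval(expr: str):
        """Evaluate a plain arithmetic expression via the AST."""
        def walk(node):
            if isinstance(node, ast.Expression):
                return walk(node.body)
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                return OPS[type(node.op)](walk(node.left), walk(node.right))
            raise ValueError("unsupported expression")
        return walk(ast.parse(expr, mode="eval"))

    # Pretend the model answered with a tool-call style tag instead of doing
    # the multiplication "in its head".
    model_output = "To answer, I need <calc>123456789 * 987654321</calc>."

    start = model_output.index("<calc>") + len("<calc>")
    end = model_output.index("</calc>")
    print(safe_eval(model_output[start:end]))  # 121932631112635269

Which is the point: once you're routing the hard part to an evaluator anyway, the model is mostly deciding what to compute, not computing it.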