<p><span class="h-card" translate="no"><a href="https://mastodon.social/@ednl" class="u-url mention">@<span>ednl</span></a></span> I'd really love to know how that compares with what I believe are smaller models, like some of the Llama models (e.g., I have Llama 3.2 running on my machine). They answer small queries very quickly on a laptop, so it's hard to believe they have the same energy consumption. (Also, it sometimes takes work to make them give *brief* answers rather than needlessly elaborating, which I'd expect also makes a difference? Sketch below.)<br />CC <span class="h-card" translate="no"><a href="https://scholar.social/@wim_v12e" class="u-url mention">@<span>wim_v12e</span></a></span></p>
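<p>For the brevity point, here's a minimal sketch of what I mean, assuming the <code>ollama</code> Python client and a locally pulled <code>llama3.2</code> model (both are my setup, not anything from the thread): capping <code>num_predict</code> bounds how many tokens the model can generate, and since energy per answer scales with tokens generated, a hard cap should make a measurable difference.</p>
<pre><code># Minimal sketch: bound the response length of a local Llama 3.2 model.
# Assumes the `ollama` Python client and `ollama pull llama3.2` already done.
import ollama

response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "In one sentence: why is the sky blue?"}],
    options={"num_predict": 64},  # hard cap on generated tokens
)
print(response["message"]["content"])
</code></pre>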