<p><span class="h-card" translate="no"><a href="https://tech.lgbt/@snowfox" class="u-url mention">@<span>snowfox</span></a></span> <span class="h-card" translate="no"><a href="https://mastodon.social/@whitequark" class="u-url mention">@<span>whitequark</span></a></span> That gets you up to 128 entries, with cost increasing linearly in the number of entries.</p><p>(The idea is that indices that fall in the lower tables roll over into negatives, so the later VPSHUFBs return all 0 for them.)</p><p>You can do the same with a tree of VPBLENDVB, but this needs fewer temp regs and is somewhat easier to set up.</p><p>You can also go up to 256 entries by using VPSUBSB (so that you saturate at -128): do one 128-entry LUT for [0:128] and one for [128:256] with index^128 as the seed, then a final VPBLENDVB to combine them.</p>
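<p>A rough scalar sketch of the scheme described above, modelling one byte lane in Python rather than using real intrinsics. Several things here are assumptions, not something the post states: the helper names (<code>pshufb</code>, <code>subsb</code>, <code>delta_tables</code>, <code>lookup128</code>, <code>lookup256</code>) are hypothetical, the step between lookups is assumed to be a subtract of 16, and the 16-entry tables are assumed to be stored as XOR deltas against the previous chunk so that the contributions of earlier lookups cancel out. The saturating subtract is used for both sizes for simplicity; for indices below 128 a plain wrapping subtract would behave the same.</p>

```python
def pshufb(table, idx):
    """One byte lane of (V)PSHUFB: 0 if bit 7 of idx is set, else table[idx & 15]."""
    return 0 if idx & 0x80 else table[idx & 0x0F]


def subsb(a, b):
    """One byte lane of VPSUBSB: signed subtract, saturating to [-128, 127]."""
    sa = a - 256 if a & 0x80 else a
    sb = b - 256 if b & 0x80 else b
    return max(-128, min(127, sa - sb)) & 0xFF


def delta_tables(lut128):
    """Assumed setup step: 8 chunks of 16 bytes, each XORed with the previous chunk."""
    chunks = [lut128[i:i + 16] for i in range(0, 128, 16)]
    tables = [chunks[0]]
    for prev, cur in zip(chunks, chunks[1:]):
        tables.append([p ^ c for p, c in zip(prev, cur)])
    return tables


def lookup128(tables, idx):
    """Chain of 8 shuffles; once idx goes negative, the remaining shuffles add 0."""
    acc = 0
    for table in tables:
        acc ^= pshufb(table, idx)   # earlier chunks are cancelled by the XOR deltas
        idx = subsb(idx, 16)        # step to the next chunk, sticking at -128
    return acc


def lookup256(lo_tables, hi_tables, idx):
    """Two 128-entry lookups, combined by a select on bit 7 (models VPBLENDVB)."""
    lo = lookup128(lo_tables, idx)          # covers [0:128]
    hi = lookup128(hi_tables, idx ^ 0x80)   # covers [128:256], seeded with index^128
    return hi if idx & 0x80 else lo


if __name__ == "__main__":
    lut = [(i * 7 + 3) & 0xFF for i in range(256)]  # arbitrary demo table
    lo_t, hi_t = delta_tables(lut[:128]), delta_tables(lut[128:])
    assert all(lookup256(lo_t, hi_t, i) == lut[i] for i in range(256))
```

<p>In this model an out-of-range seed (bit 7 set) stays negative through the whole chain because of the saturation, so each 128-entry half returns 0 outside its own range.</p>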