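<p><span class="h-card" translate="no"><a href="https://tech.lgbt/@snowfox" class="u-url mention">@<span>snowfox</span></a></span> <span class="h-card" translate="no"><a href="https://mastodon.social/@whitequark" class="u-url mention">@<span>whitequark</span></a></span> Since PSHUFB returns 0 for any byte whose index has its MSB set, you can chain several of them with XORs to get larger LUTs (in 16-byte increments), as long as you're OK with the LUT itself being a bit funny. Specifically:</p><p> vpshufb out, lut0, index</p><p> vpsubb index, index, sixteen<br /> vpshufb temp, lut1, index<br /> vpxor out, out, temp</p><p> vpsubb index, index, sixteen<br /> vpshufb temp, lut2, index<br /> vpxor out, out, temp</p><p>init:<br /> lut0 = tab[0:16]<br /> lut1 = tab[16:32] ^ lut0<br /> lut2 = tab[32:48] ^ tab[16:32]</p><p>and so forth, i.e. lutk = tab[16k:16k+16] ^ tab[16(k-1):16k]. The reason: when the MSB is clear, PSHUFB only looks at the low 4 bits of the index, so every earlier shuffle in the chain also fires, and out ends up as lut0 ^ lut1 ^ ... ^ lutk. XORing each slice with the previous slice (not the previous LUT) makes that telescope down to tab[index]. Shuffles later in the chain see a wrapped-around index with the MSB set and contribute 0. This only holds for indices 0..127, i.e. up to 8 LUTs.</p>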
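<p>A scalar Python model of the trick, for checking the LUT construction (my own sketch, not from the thread): <code>pshufb</code> emulates the per-byte behaviour of one 16-byte PSHUFB lane, and <code>build_luts</code> / <code>chained_lookup</code> are hypothetical helper names. It assumes the table length is a multiple of 16 and at most 128, and that indices are in 0..127.</p>

```python
def pshufb(table, index):
    """Per-byte model of PSHUFB: 0 if the index byte's MSB is set,
    otherwise table[index & 0x0F] (bits 4-6 are ignored)."""
    return [0 if i & 0x80 else table[i & 0x0F] for i in index]


def build_luts(tab):
    """Chained LUTs: lut0 = tab[0:16], lutk = tab[16k:16k+16] ^ tab[16(k-1):16k]."""
    assert len(tab) % 16 == 0 and 0 < len(tab) <= 128
    slices = [tab[k:k + 16] for k in range(0, len(tab), 16)]
    luts = [slices[0]]
    for prev, cur in zip(slices, slices[1:]):
        # XOR with the previous *slice*, so the prefix XOR telescopes.
        luts.append([a ^ b for a, b in zip(cur, prev)])
    return luts


def chained_lookup(luts, index):
    """Emulate the vpshufb / vpsubb / vpxor chain from the post."""
    out = pshufb(luts[0], index)
    for lut in luts[1:]:
        index = [(i - 16) & 0xFF for i in index]  # vpsubb: byte-wise wraparound
        temp = pshufb(lut, index)
        out = [o ^ t for o, t in zip(out, temp)]  # vpxor
    return out


if __name__ == "__main__":
    tab = [(i * 37 + 11) & 0xFF for i in range(128)]
    print(chained_lookup(build_luts(tab), list(range(128))) == tab)
```

<p>The model works on arbitrary-length index lists rather than 16-byte vectors, which keeps the check simple; the per-byte semantics are what matter for the LUT layout.</p>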