Local Inference Hardware

Truly private AI. Can it pay for itself?

To run truly private AI on your own hardware, you need a suitable CPU or GPU. Small LLMs, or heavily quantised larger models, can run well on recent CPUs. But larger or less aggressively quantised models need serious GPU power, and the two games in town are Nvidia and Apple.
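For the local, quantised case, here is a minimal sketch using llama-cpp-python with a GGUF model; the model path, prompt and parameters are illustrative, and you would substitute whichever quantised model you have downloaded.

```python
from llama_cpp import Llama

# Illustrative path: any locally downloaded GGUF quantised model will do.
llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to the GPU (Metal or CUDA) if present; 0 = CPU only
)

out = llm(
    "Summarise this contract clause in plain English:\n...",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```

Everything stays on your machine; no tokens ever leave the box.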

Today, the most powerful Mac for LLM inference is the Mac Studio with the M2 Ultra and 192GB of unified memory. Released in June '23, the M2 Ultra is two M2 Max dies joined by a very high-bandwidth interconnect, and because the memory is unified, the GPU can address nearly all of that 192GB. Apple watchers are suggesting the M3 Ultra could arrive in June '24.
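To put that 192GB in context, here is a rough back-of-the-envelope estimate of weights-only memory, ignoring KV cache, activations and runtime overhead, which all add more:

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    # Weights-only estimate: parameter count x bytes per weight, in GiB.
    return params_billion * 1e9 * (bits_per_weight / 8) / 1024**3

for name, params in [("7B", 7), ("70B", 70), ("180B", 180)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weight_memory_gib(params, bits):.0f} GiB")
```

A 70B model in 16-bit weighs in around 130 GiB, which rules out any single consumer Nvidia card but fits comfortably in 192GB of unified memory; at 4-bit quantisation it drops to roughly 33 GiB.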

Given the pace of open-source LLM development and associated tooling, this may be worth waiting for if you are already in the Apple ecosystem and need strictly private inference at pace.

Seriously expensive, but if you have confidential workflows that could materially benefit from a fast, private assistant, this could pay for itself in relatively short order.
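As a purely hypothetical payback sketch, assuming a machine costing around 6,000 (in your currency of choice) and a private assistant saving a 100-per-hour professional a few hours a week:

```python
hardware_cost = 6000      # hypothetical purchase price
hourly_rate = 100         # hypothetical billable or opportunity-cost rate
hours_saved_per_week = 3  # hypothetical time saved by a fast, private assistant

weeks_to_break_even = hardware_cost / (hourly_rate * hours_saved_per_week)
print(f"Break-even in ~{weeks_to_break_even:.0f} weeks")  # ~20 weeks
```

Your numbers will differ, but for high-value confidential work the hardware cost amortises quickly.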