Local Inference Hardware
To run truly private AI on your own hardware, you need a suitable CPU or GPU. Small LLMs, or heavily quantised larger models, can run well on recent CPUs, but larger or less aggressively quantised models need serious GPU power, and the two games in town are Nvidia and Apple.
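To make the CPU case concrete, here is a minimal sketch of fully local inference with a quantised model using the llama-cpp-python bindings. The model path and prompt are placeholders (any GGUF-format model you have downloaded locally will do), and the thread and layer counts are assumptions to tune for your machine.

```python
# Minimal local inference sketch with a quantised GGUF model.
# The model file below is a hypothetical local path -- substitute your own.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,        # context window
    n_threads=8,       # CPU threads; tune to your core count
    n_gpu_layers=0,    # 0 = pure CPU; -1 offloads all layers to Metal/CUDA if available
)

result = llm(
    "Summarise the key risks of sending confidential data to a hosted LLM.",
    max_tokens=256,
)
print(result["choices"][0]["text"])
```

Nothing leaves the machine: the weights, the prompt, and the completion all stay on local disk and RAM, which is the whole point of the exercise.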
Today, the most powerful Mac for LLM inference is the Mac Studio with the M2 Ultra and 192GB of unified memory. Released in June '23, the M2 Ultra is two M2 Max dies joined by Apple's very high-bandwidth UltraFusion interconnect. Apple watchers are suggesting the M3 Ultra could be released in June '24.
Given the pace of open-source LLM development and associated tooling, this may be worth waiting for if you are already in the Apple ecosystem and need strictly private inference at pace.
Seriously expensive, but if you have confidential workflows that could materially benefit from a fast, private assistant, this could pay for itself in relatively short order.
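As a rough sanity check on why that much unified memory matters, a model's weights alone need roughly parameter count × bits per weight ÷ 8 bytes, before the KV cache and runtime overhead. A back-of-the-envelope sketch (the quantisation bit-widths are approximations, not exact format sizes):

```python
# Approximate memory needed for model weights at different quantisation levels.
# Excludes KV cache and runtime overhead, so treat the figures as lower bounds.
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (7, 13, 70):
    for label, bits in (("FP16", 16), ("Q8", 8), ("Q4", 4.5)):
        print(f"{params}B @ {label}: ~{weight_memory_gb(params, bits):.0f} GB")
```

A 70B model at FP16 is around 140GB of weights, which fits comfortably only in something like the 192GB Mac Studio, while a 4-bit quantised 70B model drops to roughly 40GB and comes within reach of far more modest hardware.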