
There's an old joke in technology circles: "The cloud is just someone else's computer." The joke was aimed at marketing hype, but it describes something literal. When you type something into ChatGPT, your reasoning patterns travel to OpenAI's data centers. When you ask Claude to review a contract, the contract's contents leave your machine. For casual use, this is unremarkable. For serious use — the kind where AI becomes your cognitive infrastructure — you've built a dependency on someone else's server in a way that raises questions most people haven't examined.

The sovereign AI movement isn't about paranoia. It's about the same principle that drove companies off public cloud during 2022–2023 when the math on data egress and vendor lock-in stopped working: when the calculus changes, ownership beats renting.

Three Convergences That Made Local AI Viable

First, Apple Silicon unified memory. The M-series chips use unified memory architecture — RAM and GPU share the same physical memory pool. A MacBook Pro with 48GB of unified memory can load and run a quantized 70-billion parameter model that would require a dedicated GPU server to run on conventional hardware. The memory bandwidth on M2 Ultra and M3 Ultra exceeds most GPU configurations available to individual users.
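The unified-memory claim comes down to arithmetic: a model's weight footprint is roughly parameter count times bits per weight. A back-of-envelope sketch (the 4.5 bits/weight effective rate for a Q4_K_M-style quantization is an assumption — the real format mixes 4- and 6-bit blocks plus per-block scales, so the true figure lands between 4 and 5 bits):

```python
# Back-of-envelope weight-memory estimate for a quantized model.
# bits_per_weight is an effective rate: ~4.5 assumed for Q4_K_M-style
# quantization, 16 for full fp16. KV cache and runtime overhead are
# extra, so real RAM needs run a few GB above this.

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed just to hold the weights, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

print(round(weight_memory_gb(70e9, 4.5), 1))   # 70B quantized: ~39.4 GB
print(round(weight_memory_gb(70e9, 16.0), 1))  # 70B at fp16: 140.0 GB
```

This is why a 48GB unified-memory Mac can hold a quantized 70B model that would need multiple dedicated GPUs at full precision.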

Second, quantization matured. In 2023, running a model at reduced precision meant noticeable quality loss that made the models impractical for real work. By late 2025, 4-bit GGUF quantization had reached the point where the quality gap between quantized models and their full-precision counterparts was difficult to detect on most practical tasks. The research infrastructure for this — llama.cpp, the GGUF format, continuous benchmark tracking — is open-source and actively maintained.
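The core idea behind 4-bit quantization is simple, even though production formats are not. A minimal sketch of block-wise symmetric quantization — the textbook version, not GGUF's actual Q4_K_M scheme, which adds super-blocks, 6-bit scales, and mixed precision:

```python
# Simplified block-wise 4-bit quantization: each block of weights
# shares one float scale; each weight is stored as a 4-bit integer
# in [-8, 7]. Dequantization multiplies back by the scale.

def quantize_block(weights):
    """Map floats to 4-bit ints with one shared scale per block."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_block(q, scale):
    return [v * scale for v in q]

block = [0.12, -0.4, 0.33, 0.02, -0.07, 0.25, -0.31, 0.18]
q, s = quantize_block(block)
restored = dequantize_block(q, s)
# Each restored weight lands within half a quantization step of the original.
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(block, restored))
```

Storing one integer per weight plus one scale per block is where the roughly 4x size reduction over fp16 comes from; the engineering effort in llama.cpp goes into keeping that rounding error from compounding across billions of weights.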

Third, the deployment tools got usable. Ollama reduced downloading and serving a model to a single terminal command. Open WebUI provided a ChatGPT-style browser interface connecting to locally-served models. LM Studio made model management accessible without terminal familiarity. The 2023 experience of running local models required significant technical overhead; the 2026 experience requires a download and a launch command.
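Once Ollama is serving a model, talking to it is an ordinary HTTP call to localhost. A sketch using Ollama's REST API (the default port 11434 and the `/api/generate` payload shape follow Ollama's documentation, but verify against your installed version; the model name is illustrative):

```python
# Querying a locally served model over Ollama's REST API.
# No data leaves the machine: the request goes to localhost only.
import json
import urllib.request

def build_request(model: str, prompt: str):
    """Assemble the POST body for a single non-streaming completion."""
    url = "http://localhost:11434/api/generate"
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return url, body.encode("utf-8")

def generate(model: str, prompt: str) -> str:
    url, data = build_request(model, prompt)
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:  # requires `ollama serve` running
        return json.loads(resp.read())["response"]

# Example (needs Ollama running and the model pulled):
# print(generate("llama3.3:70b", "Summarize attorney-client privilege."))
```

The same endpoint is what Open WebUI talks to behind its ChatGPT-style interface — the browser UI is a convenience layer over this local API.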

Apple Silicon unified memory architecture enables running quantized large language models on personal hardware. The M3 Ultra's 192GB unified memory is comparable to dedicated AI inference hardware that cost $50,000+ in 2022.

What Your AI Conversations Actually Reveal

The precision matters here. When you use a cloud AI for business work, the risk isn't that a human at OpenAI or Anthropic is reading your conversations — corporate privacy policies prohibit this, and the operational logistics of monitoring millions of conversations make routine human review implausible. The actual risks are more structural:

Training data usage. Most enterprise-tier subscriptions explicitly exclude customer data from training; consumer-tier terms have varied and changed over time. The relevant question is which tier you're on and whether you've read the current terms.

Legal process. Cloud-stored data is subject to subpoena, national security requests, and international data transfer regulations in ways that locally-processed data is not. This matters for attorneys under privilege obligations, healthcare providers under HIPAA, financial professionals with client confidentiality requirements.

Behavioral inference. The patterns in how you use an AI — what problems you bring to it, what reasoning you expose, what decisions you're working through — are potentially more revealing than any single conversation. This is not a theoretical concern for competitive intelligence contexts.

Vendor dependency. When your reasoning process runs on a vendor's infrastructure, your capability is constrained by their uptime, their pricing decisions, their terms changes, and their continued existence. OpenAI raised prices in 2024. API access has been throttled during high-demand periods. The infrastructure dependency is real.

The Specific Use Cases Worth Running Locally

A blanket claim that "AI should run locally" is wrong — it ignores real capability differences. Frontier models running on OpenAI and Anthropic's infrastructure have capabilities that current local models can't match for complex reasoning, long-context analysis, and multimodal tasks. For many use cases, cloud is the right answer.

The correct framing is use-case specific: what categories of work have privacy requirements that outweigh the capability premium from cloud models?

Legal review of privileged documents. HR records processing. Medical records analysis where the cloud provider isn't covered by a HIPAA business associate agreement. Financial analysis containing proprietary strategy. Personal journaling and psychological reflection. Competitive intelligence research. These are categories where the information sensitivity justifies accepting a capability discount.

The sovereign AI decision isn't binary — it's use-case specific. Frontier cloud models have genuine capability advantages. Local models have genuine privacy and autonomy advantages. The skill is knowing which matters more for which task.

The Practical Stack

For someone starting in 2026: Ollama for model serving, Open WebUI for the interface, a 70B-parameter Qwen or Llama model in Q4_K_M quantization for general-purpose work. On an M-series Mac with 48GB+ unified memory, this runs at usable inference speeds for most tasks. The quality is roughly comparable to what GPT-4 delivered in 2023 — behind current frontier models, but capable of sophisticated analysis, writing, and reasoning.
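"Usable inference speeds" can be estimated before buying anything. Local decoding is typically memory-bandwidth-bound: each generated token must stream every active weight through the processor once, so tokens per second is capped at roughly bandwidth divided by model size. A sketch (the 800 GB/s and 400 GB/s figures are published Apple specs for M2 Ultra-class and M3 Max-class memory systems; the ~39 GB model size assumes a Q4_K_M 70B, and real speeds land below this ceiling once KV-cache reads and attention overhead are counted):

```python
# Bandwidth-bound decode-speed ceiling for local LLM inference:
# tokens/s <= memory bandwidth / bytes of weights read per token.
# Actual throughput comes in below this upper bound.

def max_tokens_per_second(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

# A ~39 GB quantized 70B model on different memory systems:
print(round(max_tokens_per_second(800, 39), 1))  # ~20.5 tok/s ceiling
print(round(max_tokens_per_second(400, 39), 1))  # ~10.3 tok/s ceiling
```

The arithmetic also explains why quantization matters twice over: a smaller model not only fits in memory, it decodes proportionally faster on the same bandwidth.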

The infrastructure is stable, free, and runs on hardware you already own or can purchase for consumer laptop prices. The question of whether to use it isn't primarily technical. It's about understanding what you're optimizing for: capability ceiling or information autonomy. In 2026, both are legitimate and available.


Sources: Apple Silicon unified memory architecture (Apple technical documentation); llama.cpp project documentation; Ollama documentation (ollama.ai); Open WebUI project documentation; OpenAI Terms of Service and Enterprise Privacy commitments (2025–2026); 37signals "Why we're leaving the cloud" (DHH, October 2022 — canonical example of the cloud exit calculus)

