← Back to Blog
Deep Dives·Jan 2, 2025·7 min read

Local LLMs vs Cloud APIs: When Privacy Actually Matters

Running AI locally isn't just for paranoid nerds. Here's when it makes sense, what it costs, and how to actually do it.

AIPrivacyLLMsOllama
JV

Jose Viscasillas

January 2, 2025 · 7 min read

Local LLMs vs Cloud APIs: When Privacy Actually Matters

Every time I paste code into ChatGPT, a small voice in my head asks: "Is this going to end up in someone's training data?"

The answer is probably no (if you've opted out). But "probably" isn't good enough when you're working with client code, proprietary algorithms, or that embarrassing regex you wrote at 2 AM.

So I started running LLMs locally. Here's the honest breakdown of when it makes sense, when it doesn't, and what it actually costs.

The Privacy Argument (And Why Most People Get It Wrong)

The discourse around AI privacy is polarized. One camp says "OpenAI sees everything you type." The other says "Who cares, my code isn't special."

Neither is completely right.

Here's what actually happens with cloud AI services:

OpenAI/Anthropic (as of early 2025):

  • API calls: Not used for training by default
  • ChatGPT/Claude chat: Used for training unless you opt out
  • Enterprise tiers: Contractual guarantees against training

The real risks:

  1. Data breaches (they happen to everyone)
  2. Subpoenas and legal requests
  3. Employee access to conversation logs
  4. Policy changes you won't notice

For most personal projects? Probably fine. For client work with NDAs? For healthcare/finance/legal? For your company's secret sauce? That "probably" starts to matter.

What You Can Actually Run Locally

Let me save you some time. Here's what works in 2025:

The Sweet Spot: 7B-13B Parameter Models

These run on consumer hardware and are actually useful:

  • Llama 3.1 8B - Meta's latest, great all-rounder
  • Mistral 7B - Excellent code understanding
  • CodeLlama 13B - Specifically tuned for programming
  • Phi-3 Mini - Microsoft's efficient small model
bash
# With Ollama (easiest option)
ollama pull llama3.1
ollama run llama3.1 "Explain this regex: ^(?=.*[A-Z])(?=.*[0-9]).{8,}$"

The Reality Check: What You Need

Here's the hardware truth:

Model SizeVRAM NeededActual Performance
7B8GBUsable, some waiting
13B16GBGood, noticeable latency
34B24GB+Great, needs beefy GPU
70B48GB+You need multiple GPUs

Running on CPU is possible but painful. A 7B model on M1 Mac is tolerable. On Intel without GPU? Make coffee between responses.

The Tools That Actually Work

Ollama (My Daily Driver)

Dead simple. Install, pull model, run. Works on Mac, Linux, Windows.

bash
# Install (Mac)
brew install ollama

# Start server
ollama serve

# In another terminal
ollama run codellama "Write a TypeScript function to debounce any function"

The API is OpenAI-compatible, so you can swap it into existing tools:

typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama' // Required but ignored
});

const response = await client.chat.completions.create({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Hello!' }]
});

llama.cpp (For The Tinkerers)

More control, more complexity. You're compiling from source, quantizing models, tuning parameters.

Worth it if you need:

  • Maximum performance
  • Custom quantization
  • Specific hardware optimization

Not worth it if you just want answers.

LM Studio (GUI Option)

If terminals scare you (no judgment), LM Studio provides a ChatGPT-like interface for local models. Download from their site, point at a model, chat away.

The Cost Analysis Nobody Does

Let's do actual math instead of vibes.

Cloud API Costs

Based on my usage patterns (roughly 50,000 tokens/day):

ServiceModelDaily CostMonthly
OpenAIGPT-4 Turbo$1.50$45
AnthropicClaude 3 Sonnet$0.90$27
OpenAIGPT-3.5 Turbo$0.10$3

Local Running Costs

Hardware depreciation (assuming 3-year lifespan):

  • RTX 4080 ($1,200): ~$1.10/day
  • Mac M2 Pro ($2,000): ~$1.80/day
  • Electricity: ~$0.20/day for GPU under load

Daily cost: ~$1.30-2.00

The Surprising Conclusion

Local is more expensive for most usage patterns. But:

  1. You already own the hardware (sunk cost)
  2. No per-token anxiety
  3. Works offline
  4. Unlimited usage at fixed cost

If you're hitting 100k+ tokens daily, local wins. If you use AI occasionally, cloud is cheaper.

When Local Actually Makes Sense

1. You're Under NDA

Client code, proprietary algorithms, business logic. Local means no third party ever sees it.

bash
# Review confidential code locally
cat sensitive_algorithm.py | ollama run codellama "Review this for bugs"

2. You Need Offline Access

Flights, spotty internet, coffee shops with terrible WiFi. Local models don't care about your connection.

3. You're Building AI Features

If you're shipping AI features, paying per-token gets expensive fast. Local models for development, cloud for production (or local for production if you've got the infrastructure).

4. You're Learning/Experimenting

No rate limits, no costs, no judgment. Run the same prompt 500 times fine-tuning your approach.

5. Compliance Requirements

HIPAA, GDPR, SOC 2—sometimes the data simply cannot leave your infrastructure. Local is the only option.

When Cloud Is Just Better

Be honest with yourself:

  • Quality matters more than privacy: GPT-4 and Claude 3 Opus are still better than local models for complex reasoning
  • You need it occasionally: $20/month for ChatGPT Plus is fine
  • You don't want to maintain infrastructure: Cloud just works
  • You need latest capabilities: Vision, function calling, etc. come to cloud first

My Actual Setup

Here's what I actually use day-to-day:

For sensitive work:

bash
# Ollama running llama3.1 8B
alias ask='function _ask(){ echo "$1" | ollama run llama3.1; }; _ask'

# Usage
ask "Refactor this authentication logic: [paste code]"

For quality-critical work:

  • Claude API (via Cursor/Continue)
  • Opted out of training, API-only

For casual exploration:

  • ChatGPT Plus
  • Don't paste anything confidential

The Hybrid Approach (What Actually Works)

Here's my recommendation:

  1. Install Ollama with llama3.1 and codellama
  2. Use local for anything with confidential code
  3. Use cloud for general questions, learning, complex reasoning
  4. Route automatically based on content type

You can even automate this:

typescript
async function smartComplete(prompt: string, isConfidential: boolean) {
  if (isConfidential) {
    return await ollamaComplete(prompt, 'codellama');
  }
  return await openaiComplete(prompt, 'gpt-4-turbo');
}

Performance Reality Check

Local models are good. They're not GPT-4 good. Here's my honest assessment:

TaskLocal (7B-13B)Cloud (GPT-4)
Code completion85%95%
Explaining code80%95%
Complex refactoring60%90%
Architecture advice40%80%
Bug identification70%85%

For "write me a function," local is fine. For "design a system," you want the big models.

The Bottom Line

Running LLMs locally isn't paranoid—it's practical for specific use cases. But it's not a replacement for cloud AI. It's a complement.

Run local when:

  • Working with confidential code
  • You need offline access
  • Building AI-powered features
  • Experimenting without limits

Use cloud when:

  • Quality is priority
  • You need latest features
  • Infrastructure isn't your thing
  • Usage is low-to-moderate

The "privacy vs capability" tradeoff is real, but it's not binary. Use both. Be thoughtful about what goes where.

And maybe stop pasting your company's authentication code into ChatGPT at 2 AM. Get Ollama running and sleep better.

---

The best setup is the one that matches your actual needs. For most developers, that's a hybrid—local for sensitive work, cloud for everything else.

JV

Written by Jose Viscasillas

Senior Software Engineer building video platforms at ON24. 21 years of coding experience. I write about React, TypeScript, AI, and developer tools.

Recommended Reads

📬

Subscribe to the Newsletter

New articles delivered to your inbox. No spam, unsubscribe anytime.

Join 500+ developers getting weekly insights on React, TypeScript, and building products.