Foundry Lab · Microsoft AI on Your Hardware

Local AI,
Zero Cloud,
Total Control

Run state-of-the-art AI models on your own hardware. No API keys, no billing meters, no data leaving your machine. One command. Permanent savings.

Foundry Local at a glance
$0 per token / month
<50ms first-token latency
100% data stays local
1 cmd to run any model
# Install & run — that's it
$ foundry model run phi-4-mini
✓ Server ready on localhost:5272

The Full Story

A Complete Story in 7 Chapters

From AI's origins to running your own models — with interactive visualizations at every step. Each section builds on the last, culminating in a live cost comparison.

Why Foundry Local

Cloud AI vs Local AI — Three Decisive Differences

Foundry Local doesn't wrap cloud AI — it replaces it at the infrastructure level. Same application code, same API format, different endpoint: your own machine.

🔒
Data Never Leaves Your Hardware

Every token processed on your device. Nothing sent to external servers. Full compliance for GDPR, HIPAA, SOC 2, and regulated industries — architecturally guaranteed, not policy-controlled.

⚡
Sub-50ms First-Token Latency

No network round trip. No shared API queue. No Tuesday-afternoon slowdowns. First token in milliseconds — real-time AI that actually feels real-time to your users.

💰
Zero Token Cost, Forever

Run 1 million tokens or 1 billion — the cost is identical. Hardware is a one-time investment. No surprise bills, no rate limits, no pricing changes on the provider's schedule.

Head-to-Head

Cloud API vs Foundry Local

An honest comparison across the dimensions that matter most to production AI deployments.

| Feature | ☁ Cloud API | 💻 Foundry Local |
|---|---|---|
| Data privacy | Leaves your network per request | 100% on-device, always |
| First-token latency | 300–800ms (network + queue) | <50ms (no network hop) |
| Monthly cost (1,000 users) | $4,500–$45,000 / month | $0 / month (hardware already paid) |
| Compliance (GDPR / HIPAA) | Requires DPA, legal review | Architecturally guaranteed |
| Rate limits | TPM / RPM caps apply | No limits — you own it |
| Works offline | No | Yes, fully air-gapped |
| Migration effort | N/A | Change one URL in your code |

FAQ

Frequently Asked Questions

Everything you need to know before running AI on your own hardware.

Do I need a GPU to run Foundry Local?

No. Foundry Local runs on CPU, GPU, NPU, and Apple Silicon — it auto-detects the best available hardware. A Copilot+ PC NPU delivers excellent performance for most models. GPU and CPU work fine too.

What models are available?

The Azure AI Foundry catalog includes Phi-4 Mini (3.8B), Phi-4 (14B), Llama 3.2 3B, Llama 3.1 8B, Mistral 7B, DeepSeek-R1, and more — all quantized and hardware-optimized.

Is it really OpenAI-compatible?

Yes. Foundry Local exposes an OpenAI-compatible REST API on localhost:5272. Every SDK, every framework, every tool that talks to OpenAI also talks to Foundry Local — change one URL.
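Because it is a plain HTTP API, even the Python standard library can talk to it, with no SDK required. A minimal sketch, assuming the default port 5272 from above and a phi-4-mini model already loaded (your port and model alias may differ):

```python
import json
import urllib.request

# Foundry Local serves an OpenAI-compatible chat completions
# endpoint on localhost (port 5272 here, per the docs above).
url = "http://localhost:5272/v1/chat/completions"
payload = {
    "model": "phi-4-mini",
    "messages": [{"role": "user", "content": "Say hello in five words."}],
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
        print(body["choices"][0]["message"]["content"])
except OSError:
    print("Foundry Local service not reachable on port 5272")
```

Swapping in the official OpenAI SDK works the same way: point its base URL at localhost instead of the cloud endpoint and leave the rest of your code unchanged.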

What OS is required?

Windows (x64 and ARM64, including Copilot+ PCs) and macOS. Minimum requirements: 8 GB RAM and 3 GB of free disk space. An internet connection is needed only for first-time model downloads.

Can I use it in a CI/CD pipeline or server?

Yes. Foundry Local runs as a background service. Use foundry service start to launch it, then call localhost:5272 from any process on the machine.
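In a pipeline that setup step can be scripted. A small sketch using only the command named above (the guard simply skips the step on machines without the CLI installed):

```python
import shutil
import subprocess

def foundry_available() -> bool:
    """True if the Foundry Local CLI is on PATH."""
    return shutil.which("foundry") is not None

# In CI: launch the background service once, then any test in the
# job can call http://localhost:5272 like any other local dependency.
if foundry_available():
    subprocess.run(["foundry", "service", "start"], check=True)
else:
    print("foundry CLI not found; skipping local model startup")
```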

How much can I actually save vs GPT-4o?

At 1,000 users making 10 requests/day of 3,000 tokens each, GPT-4o costs roughly $4,500/month (about $54,000/year). Foundry Local costs $0/month in token fees. Use the calculator to model your exact scenario.
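The arithmetic behind that estimate is easy to check. The blended GPT-4o rate below ($5 per million tokens, input and output combined) is an assumption chosen to match the figure above; substitute current list prices to model your own workload:

```python
# Rough monthly cost check for the scenario above.
# rate_per_million is an ASSUMED blended GPT-4o rate (USD per 1M
# tokens, input + output combined) -- not an official price.
users = 1_000
requests_per_day = 10
tokens_per_request = 3_000
days_per_month = 30
rate_per_million = 5.00

tokens_per_month = users * requests_per_day * tokens_per_request * days_per_month
cloud_cost = tokens_per_month / 1_000_000 * rate_per_million
print(f"{tokens_per_month:,} tokens/month -> ${cloud_cost:,.0f}/month cloud")
print("Foundry Local: $0/month in token fees")
```

At 900 million tokens a month, every dollar of the per-token rate adds $900 to the monthly bill, which is why heavy workloads tip quickly toward local hardware.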

Get Started Today

Ready to run AI locally?

Start with the AI evolution story — it sets the stage for everything that follows. Or jump straight to the cost calculator to see your savings.