💻 Chapter 05 · CLI Reference

Foundry Local CLI

Complete command reference for the foundry CLI — manage models, control the service, and maintain your local cache.

Installation

Foundry Local runs on Windows (x64 / ARM64) and macOS. Minimum requirements: 8 GB RAM, 3 GB free disk space, and an internet connection for first-time model downloads.

# Windows — via winget
winget install Microsoft.FoundryLocal

# macOS — via Homebrew
brew tap microsoft/foundrylocal
brew install foundrylocal

Verify Installation

foundry --version
Foundry Local 1.x.x

Model Commands

All model commands follow the pattern foundry model <command> [model-alias]. Model aliases are short, memorable names like phi-4-mini.

foundry model run

Downloads the model if not already cached, loads it into the service, and starts an interactive chat session in the terminal.

foundry model run phi-4-mini
✓ Downloading phi-4-mini (INT4, 2.5GB)...
✓ Optimizing for your hardware (NPU detected)...
✓ Server ready on localhost:5272

You > Hello, what can you do?
Phi-4 Mini > I can help with writing, coding, analysis...
💡 Tip: The model stays loaded in the background service after foundry model run, so your applications can call localhost:5272 without restarting the model each time.
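Because applications depend on the service already listening, a quick reachability check before the first request can save a confusing connection error. The sketch below uses only the Python standard library; the helper name is_port_open is hypothetical, and the default port 5272 is taken from the examples in this chapter and may differ on your machine.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 5272, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Assumption: 5272 is the port shown in this chapter's examples; pass a
    different port if your service reports another one.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing is listening.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Endpoint reachable on localhost:5272")
    else:
        print("Nothing is listening on localhost:5272; try: foundry model run phi-4-mini")
```

This only confirms that a TCP listener exists. It does not verify that a particular model is loaded.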

foundry model list

Lists all available models in the catalog — both downloaded and available for download.

foundry model list

ALIAS                   SIZE    STATUS      HARDWARE
phi-4-mini              2.5 GB  downloaded  NPU/CPU
phi-4                   8.5 GB  available   GPU/CPU
llama-3.2-3b            2.0 GB  available   CPU/GPU
llama-3.1-8b            5.0 GB  available   GPU
mistral-7b              4.5 GB  available   GPU
deepseek-r1-7b          5.0 GB  available   GPU
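If you want to script against this listing, the columns can be split on runs of two or more spaces. The following is a minimal sketch that assumes the output looks exactly like the sample above; the real CLI output may use different columns or spacing, and the function names are hypothetical.

```python
import re

# Sample text copied from the listing above (shortened). The real output
# format is an assumption and may differ between versions.
SAMPLE = """\
ALIAS                   SIZE    STATUS      HARDWARE
phi-4-mini              2.5 GB  downloaded  NPU/CPU
phi-4                   8.5 GB  available   GPU/CPU
llama-3.1-8b            5.0 GB  available   GPU
"""


def parse_model_list(text: str) -> list[dict[str, str]]:
    """Turn the column layout into one dict per model, keyed by lowercased header."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    # Columns are separated by two or more spaces, so "2.5 GB" stays intact.
    keys = [key.lower() for key in re.split(r"\s{2,}", lines[0])]
    models = []
    for line in lines[1:]:
        values = re.split(r"\s{2,}", line)
        if len(values) != len(keys):
            continue  # skip lines that do not match the header shape
        models.append(dict(zip(keys, values)))
    return models


def downloaded_aliases(models: list[dict[str, str]]) -> list[str]:
    """Return aliases whose status column reads 'downloaded'."""
    return [m["alias"] for m in models if m.get("status") == "downloaded"]


if __name__ == "__main__":
    print(downloaded_aliases(parse_model_list(SAMPLE)))
```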

foundry model download

Downloads a model to the local cache without starting the service. Useful for pre-caching models before going offline.

foundry model download phi-4-mini
Downloading phi-4-mini... [████████████████████] 100%
✓ Saved to ~/.foundry/models/phi-4-mini
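To pre-cache several models before going offline, the download command can be driven from a short script. This is a sketch, not an official tool: precache is a hypothetical helper, and it assumes the CLI exits with a non-zero status when a download fails. The run parameter exists only so the function can be exercised without the CLI installed.

```python
import subprocess
from typing import Callable, Iterable


def precache(aliases: Iterable[str], run: Callable = subprocess.run) -> list[str]:
    """Download each alias in turn and return the aliases that failed.

    Assumption: `foundry model download <alias>` returns a non-zero exit
    code on failure. If the foundry executable is not on PATH,
    subprocess.run raises FileNotFoundError.
    """
    failed = []
    for alias in aliases:
        result = run(["foundry", "model", "download", alias])
        if result.returncode != 0:
            failed.append(alias)
    return failed
```

For example, precache(["phi-4-mini", "llama-3.2-3b"]) would fetch both models from the listing above and return an empty list if both downloads succeed.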

foundry model info

Shows detailed information about a specific model: full name, quantization type, hardware compatibility, license, and source.

foundry model info phi-4-mini

Alias:          phi-4-mini
Full name:      phi-4-mini-instruct-cuda-int4-rtn-block-32-acc-level-4
Parameters:     3.8B
Quantization:   INT4 (RTN, block-32)
Size:           2.5 GB
Hardware:       NPU, NVIDIA CUDA, CPU
License:        MIT
Source:         Azure AI Foundry catalog

foundry model load

Loads an already-downloaded model into the inference service without starting an interactive session.

foundry model load phi-4-mini
✓ phi-4-mini loaded · API available at localhost:5272

foundry model unload

Removes a model from memory, freeing hardware resources for other workloads.

foundry model unload phi-4-mini
✓ phi-4-mini unloaded · 2.5 GB freed
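When a script needs a model only for a bounded piece of work, pairing load and unload in a context manager ensures the memory is released even if the work raises an error. The sketch below is one possible wrapper around the two commands above; loaded_model is a hypothetical name, and the run parameter is there so the logic can be tested without the CLI.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def loaded_model(alias: str, run: Callable = subprocess.run) -> Iterator[str]:
    """Load a cached model on entry and unload it on exit.

    Assumption: both commands signal failure through a non-zero exit code,
    which check=True turns into subprocess.CalledProcessError.
    """
    run(["foundry", "model", "load", alias], check=True)
    try:
        yield alias
    finally:
        # Runs even if the body raises, so hardware resources are freed.
        run(["foundry", "model", "unload", alias], check=True)
```

Usage would look like `with loaded_model("phi-4-mini") as name: ...`, with API calls placed inside the block.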

Service Commands

Manage the Foundry Local background service directly.

# Start the service (without loading a model)
foundry service start

# Check service status
foundry service status
✓ Running · localhost:5272 · phi-4-mini loaded

# Stop the service
foundry service stop

# Show service logs
foundry service logs

Using the API

Once a model is loaded, call it using any OpenAI-compatible client. The endpoint is http://localhost:5272/v1.

# Python — openai SDK
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5272/v1",
    api_key="not-needed"   # any string works
)

response = client.chat.completions.create(
    model="phi-4-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

# curl: test from any terminal
curl http://localhost:5272/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'
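The curl call above can also be reproduced in Python without installing the openai package. The sketch below uses only the standard library; the function names are hypothetical, and the response shape (choices[0].message.content) is assumed to follow the OpenAI chat completions format, as the SDK example above implies.

```python
import json
import urllib.request

# Endpoint taken from this chapter; adjust if your service uses another port.
BASE_URL = "http://localhost:5272/v1"


def build_chat_request(prompt: str, model: str = "phi-4-mini",
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body: dict) -> str:
    """Pull the assistant text out of an OpenAI-style response body."""
    return body["choices"][0]["message"]["content"]


def chat(prompt: str, **kwargs) -> str:
    """Send the request and return the reply; needs a loaded model."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs), timeout=60) as resp:
        return extract_reply(json.load(resp))
```

Calling chat("What is 2+2?") requires the service to be running with the model loaded; otherwise urlopen raises urllib.error.URLError.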

Quick Reference

Command                            What it does
foundry model run <alias>          Download (if needed) + load + interactive chat
foundry model list                 Show all catalog models and download status
foundry model download <alias>     Download model to local cache only
foundry model info <alias>         Show model details, hardware, license
foundry model load <alias>         Load cached model into service
foundry model unload <alias>       Remove model from memory
foundry service start/stop/status  Control the background service
foundry service logs               View service output and errors