Ollama API Reference
Ollama provides a comprehensive HTTP API that allows you to integrate LLMs into your applications. This API reference documents the available endpoints, parameters, and response formats.
API Base URL
The Ollama API is available at:
http://localhost:11434/api
Authentication
By default, the Ollama API does not require authentication when accessed locally. For remote access, you should set up appropriate authentication mechanisms (e.g., reverse proxy with authentication, firewall rules, etc.).
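If you do place Ollama behind a reverse proxy that enforces, say, bearer-token authentication, the token travels as an ordinary HTTP header. The sketch below is hypothetical: the proxy URL and the OLLAMA_PROXY_TOKEN environment variable are assumptions for illustration, not part of Ollama itself.

import os
import requests

# Hypothetical setup: a reverse proxy in front of Ollama that checks an
# Authorization header. Ollama itself does not validate this token.
PROXY_URL = "https://ollama.example.com/api"        # assumed proxy address
TOKEN = os.environ.get("OLLAMA_PROXY_TOKEN", "")    # assumed env variable

def list_models_via_proxy():
    headers = {"Authorization": f"Bearer {TOKEN}"}
    response = requests.get(f"{PROXY_URL}/tags", headers=headers)
    response.raise_for_status()
    return response.json()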
Common API Patterns
- All API requests use HTTP POST with JSON request bodies (except for the GET /api/tags endpoint)
- All API responses are in JSON format
- Many endpoints support streaming responses (enabled with the stream: true parameter)
- Error responses include an error field with a message
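As a minimal illustration of these patterns, the helper below POSTs a JSON body and surfaces the error field as an exception. The helper name and error handling are illustrative, not part of the API.

import requests

BASE_URL = "http://localhost:11434/api"

def post_json(endpoint, payload):
    # POST a JSON body to an Ollama endpoint and return the parsed response.
    # Illustrative helper: raises if the response carries an "error" field,
    # following the error convention described above.
    response = requests.post(f"{BASE_URL}/{endpoint}", json=payload)
    data = response.json()
    if "error" in data:
        raise RuntimeError(data["error"])
    return data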
Chat Endpoint
POST /api/chat
Generate the next message in a chat with a model. The endpoint is stateless, so the full conversation history is sent in the messages array on every request.
Request
{
"model": "llama2",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
],
"stream": true,
"options": {
"temperature": 0.7,
"top_p": 0.9,
"top_k": 40
}
}
Parameters:
- model (string, required): Name of the model to use
- messages (array, required): List of messages in the conversation
  - role (string): Either "user" or "assistant"
  - content (string): The message content
- stream (boolean, optional): Whether to stream the response (default: false)
- options (object, optional): Additional parameters for generation
  - temperature (float): Controls randomness (0.0-1.0)
  - top_p (float): Controls diversity via nucleus sampling (0.0-1.0)
  - top_k (integer): Controls diversity by limiting sampling to the top K tokens
  - num_predict (integer): Maximum number of tokens to generate
  - system (string): System prompt to include
Response (non-streaming)
{
"model": "llama2",
"message": {
"role": "assistant",
"content": "I'm doing well, thank you for asking! How can I help you today?"
},
"done": true
}
Response (streaming)
When streaming is enabled, the API sends a series of JSON objects, one per line:
{"model":"llama2","message":{"role":"assistant","content":"I"}}
{"message":{"content":"'m"}}
{"message":{"content":" doing"}}
{"message":{"content":" well"}}
{"message":{"content":","}}
...
{"done":true}
Generate Endpoint
POST /api/generate
Generate a completion for a prompt.
Request
{
"model": "llama2",
"prompt": "What is the capital of France?",
"stream": true,
"options": {
"temperature": 0.7,
"num_predict": 100
}
}
Parameters:
- model (string, required): Name of the model to use
- prompt (string, required): The prompt to generate a response for
- system (string, optional): System prompt to include
- template (string, optional): Custom prompt template
- context (array of integers, optional): Previous context to include
- stream (boolean, optional): Whether to stream the response (default: false)
- raw (boolean, optional): Whether to return raw, unprocessed model output
- options (object, optional): Additional generation parameters; see the options listed under the chat endpoint
Response (non-streaming)
{
"model": "llama2",
"response": "The capital of France is Paris.",
"done": true
}
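The context parameter lets a follow-up generation continue from an earlier one. The sketch below assumes the non-streaming response also carries a context array of token IDs (not shown in the example above) and passes it back on the next request.

import requests

def generate_with_context():
    url = "http://localhost:11434/api/generate"

    first = requests.post(url, json={
        "model": "llama2",
        "prompt": "What is the capital of France?",
        "stream": False
    }).json()

    # Assumes the final response includes a "context" array of token IDs;
    # passing it back lets the model continue from the previous exchange.
    follow_up = requests.post(url, json={
        "model": "llama2",
        "prompt": "What is its population?",
        "context": first.get("context", []),
        "stream": False
    }).json()
    return follow_up["response"]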
Embeddings Endpoint
POST /api/embeddings
Generate embeddings for a given input.
Request
{
"model": "llama2",
"prompt": "The quick brown fox jumps over the lazy dog"
}
Parameters:
- model (string, required): Name of the model to use
- prompt (string, required): The text to generate an embedding for
Response
{
"embedding": [0.5, 0.2, 0.9, ...]
}
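A common use of embeddings is similarity search. The snippet below requests embeddings for two texts and compares them with cosine similarity; the helper functions and similarity math are illustrative, not part of the API.

import math
import requests

def embed(text, model="llama2"):
    url = "http://localhost:11434/api/embeddings"
    return requests.post(url, json={"model": model, "prompt": text}).json()["embedding"]

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

score = cosine_similarity(embed("The quick brown fox"), embed("A fast auburn fox"))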
Model Management Endpoints
GET /api/tags
List the models available locally.
Response
{
"models": [
{
"name": "llama2",
"model": "llama2:7b",
"modified_at": "2023-08-02T15:27:20.162385Z",
"size": 3791730596,
"digest": "..."
},
...
]
}
POST /api/pull
Download a model from the Ollama library.
Request
{
"name": "llama2",
"insecure": false
}
Parameters:
- name (string, required): Name of the model to pull (e.g., "llama2:7b")
- insecure (boolean, optional): Allow insecure connections (default: false)
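Pulling a large model can take a while, and the endpoint streams progress as newline-delimited JSON objects. The field names used in this sketch (status, completed, total) reflect typical Ollama progress messages and should be treated as assumptions.

import json
import requests

def pull_model(name="llama2"):
    url = "http://localhost:11434/api/pull"
    with requests.post(url, json={"name": name}, stream=True) as response:
        for line in response.iter_lines():
            if not line:
                continue
            update = json.loads(line)
            # "status", "completed", and "total" are assumed progress fields.
            status = update.get("status", "")
            if "completed" in update and "total" in update:
                print(f"{status}: {update['completed']}/{update['total']}")
            else:
                print(status)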
Example API Usage
Python Example
import requests
import json

# Chat example (non-streaming)
def chat_with_model():
    url = "http://localhost:11434/api/chat"
    payload = {
        "model": "llama2",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "stream": False  # request a single JSON object rather than a stream
    }
    response = requests.post(url, json=payload)
    return response.json()

# Streaming chat example
def stream_chat():
    url = "http://localhost:11434/api/chat"
    payload = {
        "model": "llama2",
        "messages": [
            {"role": "user", "content": "Write a short poem about programming."}
        ],
        "stream": True
    }
    response = requests.post(url, json=payload, stream=True)
    # Each streamed line is a standalone JSON object; print the content
    # fragments as they arrive.
    for line in response.iter_lines():
        if line:
            json_response = json.loads(line)
            if 'message' in json_response and 'content' in json_response['message']:
                print(json_response['message']['content'], end='', flush=True)

# List models
def list_models():
    url = "http://localhost:11434/api/tags"
    response = requests.get(url)
    return response.json()
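Putting the helpers together, a small driver (illustrative only) might look like:

if __name__ == "__main__":
    print("Installed models:", [m["name"] for m in list_models().get("models", [])])
    print(chat_with_model()["message"]["content"])
    stream_chat()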