OpenAI-Compatible API
Writingmate exposes an OpenAI-compatible API under /api/openai/v1, so you can connect your Writingmate workspace to external SDKs, coding tools, and CLI workflows.
That includes:
- OpenAI SDKs
- OpenCode
- Aider
- Simon Willison’s llm CLI
- any other tool that accepts a custom OpenAI base URL and Bearer API key

Before you start
- Open Profile Settings in Writingmate.
- Go to API Keys.
- Create a Writingmate Developer Key.
- Copy the key when it is shown. The full value is displayed only once.
Developer keys look like:
wm_v2.<key-id>.<secret>.<signature>
Base URL
https://writingmate.ai/api/openai/v1
Authentication
Use your Writingmate developer key as a Bearer token:
Authorization: Bearer YOUR_WRITINGMATE_DEVELOPER_KEY
Supported endpoints
Writingmate currently supports these OpenAI-compatible endpoints:
- GET /models
- GET /models/{id}
- POST /chat/completions
- POST /completions
- POST /responses
- POST /images/generations
- POST /videos
- GET /videos/{id}
- POST /audio/transcriptions
OpenAPI spec
A full OpenAPI 3.1 spec is published at:
https://writingmate.ai/api/openai/v1/openapi.yaml
Point any OpenAPI tool at that URL — Postman, Insomnia, Swagger UI, Redoc, Stoplight, Scalar, or an SDK generator. For a quick interactive reference you can open it at https://scalar.com/reference?url=https://writingmate.ai/api/openai/v1/openapi.yaml.
Model names
Use Writingmate model slugs exactly as returned by /models, for example:
- google/gemini-2.5-flash
- google/gemini-2.5-pro
- openai/gpt-5-mini
- anthropic/claude-sonnet-4.5
Fetch the live list from:
curl https://writingmate.ai/api/openai/v1/models \
-H "Authorization: Bearer YOUR_WRITINGMATE_DEVELOPER_KEY"
Production-validated example
The examples below use google/gemini-2.5-flash because that model was validated end to end on production writingmate.ai during launch verification.
How message counting works
The OpenAI-compatible API uses the same counting logic as Writingmate chat.
Each request is charged in messages, not raw tokens:
- 1 message = 16,000 tokens
- every request costs at least 1 message
- cached prompt tokens are discounted by 50% before counting
Exact formula
cached_discount = floor(cached_prompt_tokens * 0.5)
effective_input = max(0, prompt_tokens - cached_discount)
counted_messages = max(1, ceil((effective_input + completion_tokens) / 16000))
What gets counted
For text-generation endpoints, the server counts:
- prompt/input tokens
- completion/output tokens
- cached prompt tokens, discounted by 50%
Example
If a request returns:
prompt_tokens = 10
completion_tokens = 5
cached_tokens = 0
then:
effective_input = 10
effective_total = 10 + 5 = 15
counted_messages = max(1, ceil(15 / 16000)) = 1
If a request returns:
prompt_tokens = 19000
completion_tokens = 3000
cached_tokens = 4000
then:
cached_discount = floor(4000 * 0.5) = 2000
effective_input = 19000 - 2000 = 17000
effective_total = 17000 + 3000 = 20000
counted_messages = ceil(20000 / 16000) = 2
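The formula above can be expressed as a small helper; applied to the two worked examples it reproduces the counts shown:

```python
import math

def counted_messages(prompt_tokens: int, completion_tokens: int, cached_tokens: int) -> int:
    """Message counting: 1 message = 16,000 tokens, cached prompt
    tokens discounted by 50%, and every request costs at least 1 message."""
    cached_discount = math.floor(cached_tokens * 0.5)
    effective_input = max(0, prompt_tokens - cached_discount)
    return max(1, math.ceil((effective_input + completion_tokens) / 16000))

print(counted_messages(10, 5, 0))           # → 1
print(counted_messages(19000, 3000, 4000))  # → 2
```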

Limits and plan behavior
The OpenAI-compatible API follows the same access rules as the app:
- your current workspace plan controls which models are available
- daily limits and AppSumo pool limits still apply
- usage is recorded in the same daily_message_count logic used by Writingmate chat
- if you configured your own OpenRouter key, the same BYOK behavior still applies
Workspace selection
By default, the API uses your current workspace.
If your client supports custom headers, you can target a specific workspace with:
x-writingmate-workspace: WORKSPACE_ID
The workspace must belong to the authenticated user.
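As a sketch using Python's standard library, the header travels alongside the usual Authorization header on any request (YOUR_WRITINGMATE_DEVELOPER_KEY and YOUR_WORKSPACE_ID are placeholders to substitute):

```python
import json
import urllib.request

API_KEY = "YOUR_WRITINGMATE_DEVELOPER_KEY"   # placeholder
WORKSPACE_ID = "YOUR_WORKSPACE_ID"           # placeholder

req = urllib.request.Request(
    "https://writingmate.ai/api/openai/v1/chat/completions",
    data=json.dumps({
        "model": "google/gemini-2.5-flash",
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "x-writingmate-workspace": WORKSPACE_ID,  # target a specific workspace
    },
)
# urllib.request.urlopen(req) would send the request; omitted here.
```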
cURL examples
Chat Completions
curl https://writingmate.ai/api/openai/v1/chat/completions \
-H "Authorization: Bearer YOUR_WRITINGMATE_DEVELOPER_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemini-2.5-flash",
"messages": [
{ "role": "system", "content": "Be concise." },
{ "role": "user", "content": "Summarize Writingmate in one sentence." }
]
}'
Completions
curl https://writingmate.ai/api/openai/v1/completions \
-H "Authorization: Bearer YOUR_WRITINGMATE_DEVELOPER_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemini-2.5-flash",
"prompt": "Reply with exactly: HELLO_FROM_WRITINGMATE"
}'
Responses
curl https://writingmate.ai/api/openai/v1/responses \
-H "Authorization: Bearer YOUR_WRITINGMATE_DEVELOPER_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemini-2.5-flash",
"input": "Reply with exactly: HELLO_FROM_RESPONSES"
}'
Image generation
curl https://writingmate.ai/api/openai/v1/images/generations \
-H "Authorization: Bearer YOUR_WRITINGMATE_DEVELOPER_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "dall-e-3",
"prompt": "A serene mountain lake at sunrise, watercolor style",
"size": "1024x1024"
}'
Response:
{
"created": 1736985600,
"data": [
{ "url": "https://<storage>/generated_images/<user>/<file>.png" }
]
}
Supported providers: OpenAI (dall-e-3, gpt-image-1, gpt-image-1.5), Replicate (black-forest-labs/flux-dev, black-forest-labs/flux-schnell, stability-ai/stable-diffusion-3, recraft-ai/recraft-v3), Google (google/imagen-4-fast, google/imagen-4), and any image-capable OpenRouter model listed in GET /models.
Pass "response_format": "b64_json" to get the image inline as base64 instead of a hosted URL.
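With b64_json, the image arrives inline and can be decoded with the standard library. The payload below is a stand-in for a real response (actual payloads carry real PNG bytes):

```python
import base64

# Stand-in for a real /images/generations response with
# "response_format": "b64_json".
resp = {
    "created": 1736985600,
    "data": [{"b64_json": base64.b64encode(b"fake-png-bytes").decode()}],
}

# Decode the inline image and write it to disk.
image_bytes = base64.b64decode(resp["data"][0]["b64_json"])
with open("generated.png", "wb") as f:
    f.write(image_bytes)
```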
Video generation
curl https://writingmate.ai/api/openai/v1/videos \
-H "Authorization: Bearer YOUR_WRITINGMATE_DEVELOPER_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "sora-2",
"prompt": "A golden retriever surfing a small wave at sunset",
"size": "1280x720",
"seconds": "8"
}'
Response:
{
"id": "video_abc123",
"object": "video.generation",
"status": "queued",
"model": "sora-2",
"created": 1736985600
}
Only sora-2 is exposed through the API today. Video generation is asynchronous: poll GET /videos/{id} until status becomes completed or failed. Each call deducts the value of the seconds parameter from the workspace video pool.
curl https://writingmate.ai/api/openai/v1/videos/video_abc123 \
-H "Authorization: Bearer YOUR_WRITINGMATE_DEVELOPER_KEY"
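A polling loop can be sketched with the standard library, assuming the queued / completed / failed status values from the response above (YOUR_WRITINGMATE_DEVELOPER_KEY is a placeholder):

```python
import json
import time
import urllib.request

API_KEY = "YOUR_WRITINGMATE_DEVELOPER_KEY"  # placeholder
BASE = "https://writingmate.ai/api/openai/v1"

def poll_video(video_id: str, interval: float = 5.0) -> dict:
    """Poll GET /videos/{id} until the job reaches a terminal status."""
    while True:
        req = urllib.request.Request(
            f"{BASE}/videos/{video_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
        )
        with urllib.request.urlopen(req) as r:
            job = json.load(r)
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval)
```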
OpenAI SDK examples
JavaScript / TypeScript
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.WRITINGMATE_API_KEY,
baseURL: "https://writingmate.ai/api/openai/v1",
});
const response = await client.responses.create({
model: "google/gemini-2.5-flash",
input: "Say hello from Writingmate.",
});
console.log(response.output_text);
Python
from openai import OpenAI
client = OpenAI(
api_key="YOUR_WRITINGMATE_DEVELOPER_KEY",
base_url="https://writingmate.ai/api/openai/v1",
)
response = client.chat.completions.create(
model="google/gemini-2.5-flash",
messages=[
{"role": "user", "content": "Reply with exactly: HELLO_FROM_PYTHON"}
],
)
print(response.choices[0].message.content)
OpenCode
OpenCode supports custom providers through config.json.
Use a provider like this:
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"writingmate": {
"npm": "@ai-sdk/openai-compatible",
"name": "Writingmate",
"options": {
"baseURL": "https://writingmate.ai/api/openai/v1",
"apiKey": "{env:OPENAI_API_KEY}"
},
"models": {
"google/gemini-2.5-flash": {},
"openai/gpt-5-mini": {}
}
}
}
}
Environment:
export OPENAI_API_KEY=YOUR_WRITINGMATE_DEVELOPER_KEY

Aider
Aider can target any OpenAI-compatible endpoint.
export OPENAI_API_KEY=YOUR_WRITINGMATE_DEVELOPER_KEY
export OPENAI_API_BASE=https://writingmate.ai/api/openai/v1
aider --model openai/google/gemini-2.5-flash
Important: Aider uses the openai/ prefix for OpenAI-compatible endpoints, so the Writingmate model slug becomes:
openai/google/gemini-2.5-flash
llm CLI
Simon Willison’s llm CLI can add OpenAI-compatible models using extra-openai-models.yaml.
First, store the key:
llm keys set openai
Then add a model definition to extra-openai-models.yaml:
- model_id: writingmate-gemini-flash
model_name: google/gemini-2.5-flash
api_base: "https://writingmate.ai/api/openai/v1"
api_key_name: openai
Then run:
llm -m writingmate-gemini-flash "Reply with exactly: HELLO_FROM_LLM"
Tool calling
/chat/completions and /responses support OpenAI-style function tools for single-step tool calling.
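A tool definition follows the standard OpenAI function-tool schema. The get_weather tool below is a hypothetical example; the commented request shows how it would be passed via the Python client from the SDK examples above:

```python
# Hypothetical tool in the OpenAI function-tool schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# With the Python client from the SDK examples:
# response = client.chat.completions.create(
#     model="google/gemini-2.5-flash",
#     messages=[{"role": "user", "content": "Weather in Paris?"}],
#     tools=tools,
# )
# response.choices[0].message.tool_calls then holds any function calls.
```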
Notes
- This API is OpenAI-compatible, not byte-for-byte identical to every OpenAI feature.
- Use /models to discover the live list of currently supported models.
- For transcription-specific details, see the dedicated transcription documentation.