Message Limits

Explain Like I'm Five

Think of messages like cups of water from a big jug. Each day, you get a jug with a certain number of cups.

When you chat with the AI, you pour some water. A short question? Just a tiny splash. A long conversation with a big document? That's like pouring several cups at once.

The longer your chat gets, the more water each new message needs, because the AI has to remember everything you've already said.

Starting a new chat is like starting over with a fresh cup: you pay only for what is in that request, not for everything from the old conversation.


What is a "message"?

One counted message equals 16,000 tokens.

That is roughly:

  • about 12,000 words
  • about 24 pages of plain text

How usage is calculated

For every request, Writingmate counts:

  1. your prompt/input tokens
  2. the model’s response/output tokens
  3. cached prompt tokens, discounted by 50%

This applies to:

  • normal Writingmate chat
  • OpenAI-compatible API requests made with Writingmate Developer Keys

Exact counting formula

cached_discount = floor(cached_prompt_tokens * 0.5)
effective_input = max(0, prompt_tokens - cached_discount)
counted_messages = max(1, ceil((effective_input + completion_tokens) / 16000))
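
The formula above can be sketched as a small Python function (the name `counted_messages` and the constant are taken directly from the formula; this is an illustrative sketch, not Writingmate's actual implementation):

```python
import math

TOKENS_PER_MESSAGE = 16_000  # one counted message


def counted_messages(prompt_tokens: int, completion_tokens: int,
                     cached_prompt_tokens: int = 0) -> int:
    """Return how many messages a request counts as, per the formula above."""
    # Cached prompt tokens are discounted by 50%.
    cached_discount = math.floor(cached_prompt_tokens * 0.5)
    # The discount can never push the input below zero.
    effective_input = max(0, prompt_tokens - cached_discount)
    # Every request consumes at least 1 message.
    return max(1, math.ceil((effective_input + completion_tokens) / TOKENS_PER_MESSAGE))
```

For example, `counted_messages(10, 5)` gives 1, and `counted_messages(19_000, 3_000, 4_000)` gives 2, matching the worked examples below the minimum usage rule.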

Minimum usage rule

Every interaction consumes at least 1 message, even if it is tiny.

Examples:

  • 16,000 effective tokens or fewer = 1 message
  • 25,000 effective tokens = 2 messages
  • 48,000 effective tokens = 3 messages

Worked examples

Small request

If a request returns:

  • prompt_tokens = 10
  • completion_tokens = 5
  • cached_tokens = 0

then:

effective_total = 10 + 5 = 15
counted_messages = 1

Larger request with cache discount

If a request returns:

  • prompt_tokens = 19000
  • completion_tokens = 3000
  • cached_tokens = 4000

then:

cached_discount = floor(4000 * 0.5) = 2000
effective_input = 19000 - 2000 = 17000
effective_total = 17000 + 3000 = 20000
counted_messages = ceil(20000 / 16000) = 2
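
The arithmetic in this larger example can be checked step by step (plain Python, nothing Writingmate-specific):

```python
import math

# Usage values reported for the request
prompt_tokens, completion_tokens, cached_tokens = 19_000, 3_000, 4_000

cached_discount = math.floor(cached_tokens * 0.5)          # 50% cache discount
effective_input = max(0, prompt_tokens - cached_discount)  # discounted input
effective_total = effective_input + completion_tokens      # input + output
messages = max(1, math.ceil(effective_total / 16_000))     # at least 1 message
print(messages)
```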

[Diagram: token-to-message counting for Writingmate and the OpenAI-compatible API]

The same counting rule is used in Writingmate chat and the OpenAI-compatible API.

How this affects plans

Your plan controls:

  • which model categories you can access
  • how many counted messages you can use each day or month
  • whether AppSumo pool limits apply

Text endpoints of the OpenAI-compatible API (/chat/completions, /completions, /responses, /audio/transcriptions) do not use a separate quota. They use the same message counter as Writingmate chat.

Image and video billing

The image and video endpoints of the OpenAI-compatible API are not counted in messages. They draw from dedicated pools:

  • POST /images/generations consumes 1 image credit per successful generation from your workspace's image pool.
  • POST /videos/generations consumes the requested seconds value from your workspace's video-seconds pool and is additionally capped by your plan's monthly video count.

On AppSumo plans the image and video pools are the lifetime credits bundled with your tier. On other plans they follow the plan's per-period caps shown in Settings.

Tips for using messages efficiently

  • Start new chats often when history is no longer useful
  • Limit large pasted context to only the relevant sections
  • Use lower-cost models for repetitive or high-volume tasks
  • Watch cached-token behavior in long iterative workflows

Unlimited or BYOK behavior

If you connect your own OpenRouter API key, Writingmate can use that provider key path for supported requests. In that case, provider billing and limit behavior may differ from standard Writingmate-managed usage.