Timeouts, tokens and context cost

How model limits work: what tokens are, why a response can be cut off or time out, how a large context affects cost and speed, and what to do about it.

1. Tokens and context in plain words

Models count and bill work in “tokens” (roughly, pieces of words; text in Russian usually needs more tokens than the same text in English). Every request is made up of the system instruction, the project context (description, tone of voice, audience, taboos, marketing and extra context, and the text of context files) and the material or topic itself; the model's answer is counted on top of that. The larger the input context and the longer the answer, the more tokens are used, and the more expensive and slower the request becomes.
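
The arithmetic behind this can be sketched in Python. It is a rough model under stated assumptions: the characters-per-token ratios (4 for English, 2.5 for Russian) and the per-million-token prices are placeholders chosen for illustration, not the service's tokenizer or any provider's rates, and `estimate_tokens`/`estimate_request` are hypothetical helpers.

```python
# Rough sketch of how one generation request adds up in tokens and cost.
# ASSUMPTIONS: the characters-per-token ratios and the prices are placeholders,
# not the service's tokenizer or any provider's real rates.

CHARS_PER_TOKEN = {"en": 4.0, "ru": 2.5}  # hypothetical heuristics


def estimate_tokens(text: str, lang: str = "en") -> int:
    """Crude token estimate from character count (real tokenizers differ)."""
    if not text:
        return 0
    return max(1, round(len(text) / CHARS_PER_TOKEN[lang]))


def estimate_request(parts: list[str], answer_tokens: int, lang: str,
                     price_in_per_m: float, price_out_per_m: float) -> tuple[int, float]:
    """Return (input_tokens, cost) for one request.

    parts: system instruction, project context (description, tone of voice,
    audience, taboos, marketing/extra context, file text) and the topic.
    Prices are per million tokens; input and answer are priced separately.
    """
    input_tokens = sum(estimate_tokens(p, lang) for p in parts)
    cost = (input_tokens * price_in_per_m + answer_tokens * price_out_per_m) / 1_000_000
    return input_tokens, cost


# Usage: instruction, project context and topic, plus a 500-token answer.
tokens, cost = estimate_request(
    ["s" * 400, "c" * 8_000, "t" * 600],
    answer_tokens=500, lang="en", price_in_per_m=1.0, price_out_per_m=2.0)
print(tokens)  # 2250
```

Doubling the project context in this sketch roughly doubles the input side of the bill, which is the effect described above.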

2. Request timeout

If the model doesn't reply within the allotted time, the request is aborted and the “Model requests” log shows “Error: The operation was aborted due to timeout”. This message is not the model's answer: it comes from a safeguard that stops requests from hanging indefinitely. The default limit is 45 seconds; an administrator can change it with the LLM_TIMEOUT_MS environment variable (value in milliseconds). Common causes: a slow or “reasoning” model, a very large prompt, an overloaded provider, or network problems reaching the API.
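
A timeout guard of this kind can be sketched with Python's `asyncio.wait_for`, which cancels the pending call once the limit passes. The sketch reads LLM_TIMEOUT_MS as milliseconds with a 45-second fallback, as described above; `fake_model_call` and `guarded_call` are hypothetical names, and the simulated sleep stands in for a real provider request. It is not the service's own code.

```python
import asyncio
import os

DEFAULT_TIMEOUT_MS = 45_000  # the default limit stated in this article


def llm_timeout_seconds() -> float:
    """Read LLM_TIMEOUT_MS (milliseconds) from the environment; fall back to 45 s."""
    raw = os.environ.get("LLM_TIMEOUT_MS", "")
    try:
        ms = int(raw)
    except ValueError:
        ms = DEFAULT_TIMEOUT_MS  # unset or not a number
    return ms / 1000


async def fake_model_call(prompt: str, latency_s: float) -> str:
    """Hypothetical stand-in for a provider request; sleeps to simulate latency."""
    await asyncio.sleep(latency_s)
    return f"answer to: {prompt}"


async def guarded_call(prompt: str, latency_s: float, timeout_s: float) -> tuple[bool, str]:
    """Return (ok, text). On timeout the pending call is cancelled and text is
    an error line like the one in the request log, not a model answer."""
    try:
        answer = await asyncio.wait_for(fake_model_call(prompt, latency_s), timeout=timeout_s)
        return True, answer
    except asyncio.TimeoutError:
        return False, "Error: The operation was aborted due to timeout"


# Usage: a "model" that needs 1 s, guarded by a 0.1 s limit, is aborted.
print(asyncio.run(guarded_call("post about tokens", latency_s=1.0, timeout_s=0.1)))
```

Raising the limit only changes how long the guard waits; it does not make a slow model or an overloaded provider any faster.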

3. Answer length limit (cut off mid-sentence)

The model's answer is capped at a maximum number of tokens (max_tokens). If the answer doesn't fit, it is cut off, and in the log the text ends mid-word or mid-sentence. For ordinary posts the cap is sufficient; for long structured answers (e.g. “Improve project” or website import) the service requests a higher cap. If you still see truncation, reduce the context or pick a model with a larger answer limit.
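
Most provider APIs report why an answer stopped, which is how a client can tell a truncated answer from a complete one. The sketch below assumes either an OpenAI-style chat completion (`finish_reason == "length"`) or an Anthropic Messages response (`stop_reason == "max_tokens"`); the doubling retry policy in `next_max_tokens` is a hypothetical example, not how the service chooses its higher cap.

```python
from typing import Optional


def is_truncated(response: dict) -> bool:
    """True if the provider stopped because the answer hit its token cap."""
    choices = response.get("choices")
    if choices:  # OpenAI-style chat completion
        return choices[0].get("finish_reason") == "length"
    return response.get("stop_reason") == "max_tokens"  # Anthropic Messages API


def next_max_tokens(current: int, model_limit: int) -> Optional[int]:
    """Propose a larger cap for a retry, never above the model's own answer limit.

    Returns None when the cap is already at the limit: the remaining options
    are a shorter context or answer, or a model with a larger answer limit.
    """
    if current >= model_limit:
        return None
    return min(current * 2, model_limit)


# Usage: a truncated OpenAI-style response and a hypothetical 4096-token model limit.
resp = {"choices": [{"finish_reason": "length", "message": {"content": "The post beg"}}]}
if is_truncated(resp):
    print(next_max_tokens(1000, model_limit=4096))  # 2000
```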

4. Large context = more cost and slower

Note: everything in the project context is sent to the model on EVERY generation. Long “Extra context” and “Marketing context” fields, as well as uploaded files (.md/.pdf, up to ~20,000 characters per file), enlarge each request. This raises token spend (money) and response time, and brings requests closer to the timeout. A few large files in one project can noticeably raise the cost of every routine generation.
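
Because the context is resent on every run, its cost grows linearly with the number of generations. A back-of-the-envelope sketch, assuming each file contributes at most ~20,000 characters and a crude 4 characters per token (real tokenizers, and Russian text, will differ); `context_chars` and `resent_context_tokens` are hypothetical helpers:

```python
FILE_CHAR_CAP = 20_000   # approximate per-file size from this article (assumed to be a hard cut)
CHARS_PER_TOKEN = 4.0    # crude heuristic for English; real tokenizers differ


def context_chars(extra: str, marketing: str, files: list[str]) -> int:
    """Characters of project context attached to every generation."""
    return len(extra) + len(marketing) + sum(min(len(f), FILE_CHAR_CAP) for f in files)


def resent_context_tokens(chars_per_request: int, generations: int) -> int:
    """The context is resent each time, so its token cost grows linearly with generations."""
    return round(chars_per_request / CHARS_PER_TOKEN) * generations


# Usage: three full files plus 3,000 characters of extra and marketing context.
chars = context_chars("e" * 2_000, "m" * 1_000, ["f" * 30_000] * 3)
print(chars, resent_context_tokens(chars, generations=100))  # 63000 1575000
```

Under these assumptions, 100 generations resend roughly 1.6 million input tokens of context alone, before the topic and the answers are counted.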

5. How to resolve it

1) Reduce the context: remove unnecessary or bulky context files and trim “Extra”/“Marketing context” to the essentials.
2) Pick a faster, lighter model (e.g. a lite mode) in project settings.
3) If answers are genuinely long, ask an administrator to raise LLM_TIMEOUT_MS.
4) Make sure a working key for the selected model is set (Profile → keys) and that the API is reachable.
5) Retry later if the provider was overloaded.
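
Step 1 amounts to fitting the context into a budget, essentials first. A minimal sketch of that idea, assuming you rank the context parts yourself and measure the budget in characters; `trim_context` is a hypothetical helper, and in the service the trimming is done by editing the context fields and files by hand.

```python
def trim_context(parts: list[tuple[str, str]], budget_chars: int) -> list[tuple[str, str]]:
    """Keep (label, text) parts, most important first, within a character budget.

    The part that crosses the budget is cut short; later parts are dropped.
    The ordering and the budget are the reader's choice (hypothetical here).
    """
    kept: list[tuple[str, str]] = []
    remaining = budget_chars
    for label, text in parts:
        if remaining <= 0:
            break
        piece = text[:remaining]  # naive cut; a real trim would stop at a sentence boundary
        kept.append((label, piece))
        remaining -= len(piece)
    return kept


# Usage: essentials first, the bulky file last.
parts = [("tone of voice", "t" * 300), ("extra context", "x" * 1_500), ("context file", "f" * 20_000)]
for label, text in trim_context(parts, budget_chars=1_000):
    print(label, len(text))
# tone of voice 300
# extra context 700
```

Ordering matters more than the exact budget: whatever is ranked last is the first thing to go.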

6. Controlling token spend

To keep spend predictable, set “Generation limits” (drafts per 24 hours and per 7 days) in project settings; generation pauses when a limit is reached. Watch the “Model requests” section: it shows every actual call with its request and response, which makes it easy to gauge volume and spot bloated prompts. Keep the context concise: prompt quality matters more than length.
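
The two generation limits behave like a pair of counters over recent drafts. A sketch, assuming rolling windows (the last 24 hours and the last 7 days) rather than calendar days, which the article does not specify; `GenerationLimits` is a hypothetical class, not the service's implementation.

```python
from collections import deque
from datetime import datetime, timedelta


class GenerationLimits:
    """Two rolling-window caps: drafts per 24 hours and drafts per 7 days.

    Assumes try_generate is called with non-decreasing timestamps.
    """

    def __init__(self, per_day: int, per_week: int):
        self.per_day = per_day
        self.per_week = per_week
        self._stamps: deque[datetime] = deque()  # times of accepted drafts, oldest first

    def _prune(self, now: datetime) -> None:
        """Forget drafts that have left the 7-day window."""
        week_ago = now - timedelta(days=7)
        while self._stamps and self._stamps[0] <= week_ago:
            self._stamps.popleft()

    def try_generate(self, now: datetime) -> bool:
        """Record a draft and return True, or return False (generation pauses)
        if either the daily or the weekly limit has been reached."""
        self._prune(now)
        day_ago = now - timedelta(hours=24)
        last_day = sum(1 for t in self._stamps if t > day_ago)
        if last_day >= self.per_day or len(self._stamps) >= self.per_week:
            return False
        self._stamps.append(now)
        return True


# Usage: 2 drafts per day, 3 per week.
t0 = datetime(2024, 1, 1, 9, 0)
limits = GenerationLimits(per_day=2, per_week=3)
print([limits.try_generate(t0 + timedelta(hours=h)) for h in (0, 1, 2, 25, 26)])
# [True, True, False, True, False]
```

The third request is refused by the daily cap; the fifth passes the daily check but hits the weekly cap.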
