A coding agent does not spend most of its time writing the final patch.
It spends time finding the right file, opening code, following symbols, rerunning searches, reading test output, and carrying all of that material into later turns. The cost of a task is therefore shaped by the path to the answer, not just the answer itself.
Atelier saves money by shortening that path and reducing what enters the model’s context along the way.
It does not use a cheaper model behind your back. It improves the tools around the model you chose.
Every unnecessary result stays in the conversation
In a typical agent loop, a tool result becomes part of the conversation. The next model call includes that result along with earlier messages and newer work.
A 2,000-token file read is not necessarily paid for only once. It occupies space in the active context until the session is compacted or cleared. The same is true for a broad search result, a long build log, or a file that gets opened twice.
This makes early context decisions important. Saving tokens before the agent has found the answer also reduces the history carried through the rest of the task.
Atelier attacks that cost in four places:
- discovery,
- source reading,
- tool output,
- session lifecycle.
1. Find the relevant code with fewer calls
Without code intelligence, repository exploration becomes a sequence of guesses: search for a term, open one match, find another symbol, search again, and continue until the call path is clear.
Atelier’s search and explore tools combine work that would otherwise require several turns.
Ranked search returns relevant snippets within a fixed token budget. Symbol mode locates definitions without returning entire source bodies. Repository-map mode expands around a small set of seed files.
explore can group source with callers, callees, and usages in one response. When the symbol is already known, it can request only one relation rather than paying for the broader conceptual view.
The saving is not “search is magically free.” The saving is avoided interaction: fewer separate searches, fewer speculative file reads, and fewer model turns spent assembling a call graph manually.
2. Read the shape before reading the body
A full file is often the wrong unit of context.
In mcp_server.py, Atelier’s read tool defaults to an outline for code files over 200 lines. The outline contains the file’s structure and line locations. The agent can then request the exact range it needs.
For example, investigating one handler in a 1,000-line module can become:
read module outline
read lines 420-510
instead of:
read all 1,000 lines
For supported languages, source projection can reduce a full view further by removing comments and blank-line noise while retaining a mapping to the original source. When the agent requests a line range, Atelier returns an exact slice without the outline and projection metadata that would be redundant.
Batch reads also matter. If three independent files are required, the tool can read them in one call. That avoids additional agent turns whose prompts would each include the conversation history again.
The implementation tracks estimated tokens saved by outlines and projections, but it does not credit the same full-file baseline repeatedly. Once a file’s avoided full read has been counted in a session, later reads do not claim the saving again.
3. Put a budget on search and command output
Search tools can produce more text than the model can use.
Atelier’s grep and search surfaces both expose output budgets. The default search budget is 2,000 tokens. Broad matches are ranked, summarized, or returned as file pointers rather than as an unbounded wall of source.
Large command output gets a different treatment. Shell, SQL, and web results have strict inline limits. When a result is too large, Atelier stores the complete output and returns a head-and-tail summary with a retrieval reference.
This is cheaper than inserting a 100,000-character log into the conversation, but it does not destroy the evidence. If the failure is in the omitted middle, the agent can retrieve the relevant slice without rerunning the command.
The final JSON-RPC writer also enforces a hard frame limit. That guard is primarily about reliability, but it prevents an oversized result from killing the MCP connection and forcing the session to restart.
4. Do not send the same content twice
Atelier keeps a within-session registry for read-style results.
If a tool produces content that is byte-identical to something already returned in the current context epoch, the dispatcher can replace it with a small pointer. For file reads, it can also return a delta when the resource has changed rather than repeating the entire body.
The agent can explicitly force a fresh full result when necessary. Edits and compaction reset the relevant state so a pointer cannot silently refer to stale context forever.
This mechanism addresses a common agent behavior: reopening a file “to be safe” even though the exact content is already in the prompt.
A repeated read is not made faster by prompt caching if the model still has to process and reason over the duplicate result. The cheapest duplicate is the one not emitted.
5. Compact at a useful boundary
Eventually, a long session fills its context window. Compacting too early loses useful detail; compacting too late makes every subsequent turn expensive and risks exhausting the window.
Atelier tracks context use in its run ledger. The current implementation begins advising at 60 percent utilization, considers compaction at 80 percent, and prepares a handover at 95 percent.
Automatic compaction is gated by more than a percentage. The runtime looks for enough turns or unusually high utilization and checks for a structured task boundary, such as a passing test or successful command. A model merely saying “done” in free text does not count.
The compacted state preserves recent turns, active errors, recently touched files, pinned memory, and active playbooks. At the handover threshold, Atelier writes a packet that a fresh session can read.
The goal is not the smallest possible context. It is the smallest context that still lets the agent continue correctly.
Savings accounting should be conservative
Token-savings claims are easy to inflate, so the adapter contains explicit corrections.
An errored read receives no savings credit because it did not provide usable context. An outline and a later range from the same file cannot both claim that they independently avoided the entire full file. Code-intelligence credit is deferred: Atelier records the files surfaced by search or exploration, waits through an observation window, and cancels the credit if the agent reads those files anyway.
Accounting runs outside the model-facing result. It should measure behavior without adding more prose for the model to process.
This is a better way to evaluate context tooling: count what was actually avoided, not what could theoretically have been avoided.
Where the saving comes from
Atelier’s cost reduction is the sum of several ordinary decisions:
| Source of waste | Atelier’s response |
|---|---|
| Repeated discovery calls | Ranked search and grouped code intelligence |
| Reading whole files | Outlines, exact ranges, and source projection |
| Separate calls for independent files | Batched reads |
| Unbounded search results | Per-call token budgets |
| Huge logs in context | Recoverable spill summaries |
| Duplicate reads | Content pointers and deltas |
| Overgrown sessions | Boundary-aware compaction and handover |
| Inflated savings reports | Per-file and per-session credit deduplication |
The exact saving depends on the repository and the task. A small change in a tiny project may need only one search and one edit. A cross-cutting change in a large monorepo has much more redundant discovery and context to remove.
That is why the meaningful comparison uses the same model, the same task, and the same repository. Atelier does not make tokens cheaper. It helps the agent need fewer of them to reach a verified result.
Evidence and methodology
Atelier publishes its benchmark setup, raw reports, and implementation so the savings claims can be checked rather than taken on trust. Results vary by repository size, task, model, and host; the benchmark documentation separates gross context savings from net end-to-end cost.