On Tuesday I gave a talk on Context Engineering to our internal AI group, alongside talks by excellent colleagues. The very next day, Anthropic released a blog post on Effective harnesses for long-running agents and updated their prompting best practices guide.
Note to Anthropic: please can you release relevant material the day before I do a talk rather than afterwards!
I talked about managing context size, about how LLM accuracy declines as the context window fills past 50%, and about my mental model of LLMs as amnesiac pedants. All of this drives Claude Code features such as its heavy use of todo lists and its habit of summarising context into markdown files.
In the new posts, Anthropic suggest a step beyond this, starting with:
Use a different prompt for the very first context window: Use the first context window to set up a framework (write tests, create setup scripts), then use future context windows to iterate on a todo-list
This is fascinating. It goes a step beyond asking Claude to plan things out first, because the emphasis is on creating durable knowledge via tools. They then go further:
Prefer starting afresh over compacting
So it makes using a coding agent even more like managing a team, just a very forgetful one, and with modern development best practice embedded via code.
The full method is:
- Have Claude create scripts to run tools (linters, tests) so that this knowledge is preserved
- Give Claude ways to check correctness (tests, Playwright etc.)
- Use JSON to track status
- Use text/markdown for progress notes
- Use git to commit work at the end of context windows
- Clear context and re-start
Since Claude is good at reading code, it can infer the working methods from the scripts that are created.
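To make the loop above a bit more concrete, here is a minimal sketch of the bookkeeping around each context window. It is my own illustration, not Anthropic's harness: the file names `status.json` and `PROGRESS.md`, the two prompt strings and the function names are all assumptions, and the step that actually runs the agent is left out.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical file names and prompts; the posts do not prescribe these.
STATUS_FILE = "status.json"
NOTES_FILE = "PROGRESS.md"
FIRST_WINDOW_PROMPT = (
    "Set up the project: write tests and setup scripts, then create status.json."
)
LATER_WINDOW_PROMPT = (
    "Read status.json and PROGRESS.md, pick the next unfinished item and continue."
)


def choose_prompt(workdir: Path) -> str:
    """Use the setup prompt for the first context window, the iterate prompt after."""
    if (workdir / STATUS_FILE).exists():
        return LATER_WINDOW_PROMPT
    return FIRST_WINDOW_PROMPT


def append_note(workdir: Path, note: str) -> None:
    """Append a timestamped line to the progress notes, never rewriting old ones."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    with open(workdir / NOTES_FILE, "a", encoding="utf-8") as fh:
        fh.write(f"- {stamp} UTC: {note}\n")


def commit_work(workdir: Path, message: str) -> None:
    """Commit everything at the end of a context window.

    Assumes workdir is already a git repository. `git commit` exits non-zero
    when there is nothing to commit, so check=True raises in that case.
    """
    subprocess.run(["git", "add", "-A"], cwd=workdir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=workdir, check=True)
```

A session would then be: pick the prompt with `choose_prompt`, let the agent work, call `append_note` and `commit_work`, clear the context and go round again.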
The JSON bit is a wonderful aside: the Anthropic team ran experiments and found that the model is less likely to rewrite JSON files than markdown files. It’s another emergent behaviour (and I guess this is a side effect of it thinking of JSON as ‘code’).
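As an illustration of what status tracking in JSON might look like, here is a small sketch that flips one flag and leaves every other entry alone. The schema (an `items` list with `id` and `done` fields) and the `mark_done` helper are hypothetical, made up for this example rather than taken from the Anthropic posts.

```python
import json
from pathlib import Path


def mark_done(status_path: Path, item_id: str) -> dict:
    """Set done=True on a single item and write the file back."""
    data = json.loads(status_path.read_text(encoding="utf-8"))
    for item in data["items"]:
        if item["id"] == item_id:
            item["done"] = True
            break
    else:
        # The loop finished without a match, so the id is not in the file.
        raise KeyError(f"unknown item: {item_id}")
    status_path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return data
```

The point of the structure is that an update is a one-field change, which is much harder to get creatively wrong than re-drafting a prose checklist.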
The new 4.5 models are able to track how much of their own context window remains, so Anthropic also suggest telling Claude explicitly to manage context, via text like this in prompts:
This is a very long task, so it may be beneficial to plan out your work clearly. It’s encouraged to spend your entire output context working on the task - just make sure you don’t run out of context with significant uncommitted work. Continue working systematically until you have completed this task
Honestly, this whole business of forgetting what you are working on after a while is something I can relate to after a long day!
–
Cover image by Steve Cadman