Bringing Engineering Discipline to Prompts—Part 2


The following is Part 2 of 3 from Addy Osmani’s original post “Context Engineering: Bringing Engineering Discipline to Prompts.” Part 1 can be found here.

Great context engineering strikes a balance—include everything the model truly needs but avoid irrelevant or excessive detail that could distract it (and drive up cost).

As Andrej Karpathy described, context engineering is a delicate mix of science and art.

The “science” part involves following certain principles and techniques to systematically improve performance. For example, if you’re doing code generation, it’s almost scientific that you should include relevant code and error messages; if you’re doing question-answering, it’s logical to retrieve supporting documents and provide them to the model. There are established methods like few-shot prompting, retrieval-augmented generation (RAG), and chain-of-thought prompting that we know (from research and trial) can boost results. There’s also a science to respecting the model’s constraints—every model has a context length limit, and overstuffing that window can not only increase latency/cost but potentially degrade the quality if the important pieces get lost in the noise.

Karpathy summed it up well: “Too little or of the wrong form and the LLM doesn’t have the right context for optimal performance. Too much or too irrelevant and the LLM costs might go up and performance might come down.”

So the science is in techniques for selecting, pruning, and formatting context optimally. For instance, using embeddings to find the most relevant docs to include (so you’re not inserting unrelated text) or compressing long histories into summaries. Researchers have even catalogued failure modes of long contexts—things like context poisoning (where an earlier hallucination in the context leads to further errors) or context distraction (where too much extraneous detail causes the model to lose focus). Knowing these pitfalls, a good engineer will curate the context carefully.
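To make the pruning side of that concrete, here is a minimal sketch of token budgeting in Python, assuming the tiktoken tokenizer and an arbitrary 8,000-token budget (real limits vary by model). The idea is simply to measure what you are about to send and drop the lowest-priority pieces until it fits:

```python
import tiktoken

# Assumed number for illustration; real context limits depend on the model you call.
MAX_CONTEXT_TOKENS = 8000

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(parts: list[str]) -> int:
    """Measure how many tokens the assembled prompt pieces will consume."""
    return sum(len(enc.encode(part)) for part in parts)

def trim_to_budget(parts: list[str], budget: int = MAX_CONTEXT_TOKENS) -> list[str]:
    """Drop the lowest-priority pieces (assumed to be last) until the prompt fits."""
    kept = list(parts)
    while kept and count_tokens(kept) > budget:
        kept.pop()  # discard the least important piece of context
    return kept
```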

Then there’s the “art” side—the intuition and creativity born of experience.

This is about understanding LLMs’ quirks and subtle behaviors. Think of it like a seasoned programmer who “just knows” how to structure code for readability: An experienced context engineer develops a feel for how to structure a prompt for a given model. For example, you might sense that one model tends to do better if you first outline a solution approach before diving into specifics, so you include an initial step like “Let’s think step by step…” in the prompt. Or you notice that the model often misunderstands a particular term in your domain, so you preemptively clarify it in the context. These aren’t in a manual—you learn them by observing model outputs and iterating. This is where prompt-crafting (in the old sense) still matters, but now it’s in service of the larger context. It’s similar to software design patterns: There’s science in understanding common solutions but art in knowing when and how to apply them.

Let’s explore a few common strategies and patterns context engineers use to craft effective contexts:

Retrieval of relevant knowledge: One of the most powerful techniques is retrieval-augmented generation. If the model needs facts or domain-specific data that isn’t guaranteed to be in its training memory, have your system fetch that info and include it. For example, if you’re building a documentation assistant, you might vector-search your documentation and insert the top matching passages into the prompt before asking the question. This way, the model’s answer will be grounded in real data you provided rather than in its sometimes outdated internal knowledge. Key skills here include designing good search queries or embedding spaces to get the right snippet and formatting the inserted text clearly (with citations or quotes) so the model knows to use it. When LLMs “hallucinate” facts, it’s often because we failed to provide the actual fact—retrieval is the antidote to that.
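A minimal sketch of that retrieval step might look like the following, with a hypothetical `embed()` function standing in for whatever embedding model you use, and the retrieved passages clearly delimited so the model knows to ground its answer in them:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: call whatever embedding model you use and return its vector."""
    raise NotImplementedError

def top_k_passages(query: str, passages: list[str], k: int = 3) -> list[str]:
    """Rank documentation passages by cosine similarity to the query embedding."""
    q = embed(query)
    scored = []
    for passage in passages:
        v = embed(passage)
        score = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        scored.append((score, passage))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:k]]

def build_prompt(question: str, passages: list[str]) -> str:
    """Insert the retrieved passages, clearly delimited, ahead of the user's question."""
    context = "\n\n".join(f"[Doc {i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the documentation below, and cite the [Doc N] you relied on.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The [Doc N] labels double as citation handles, which makes it easier to check whether an answer is actually grounded in the retrieved text.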

Few-shot examples and role instructions: This hearkens back to classic prompt engineering. If you want the model to output something in a particular style or format, show it examples. For instance, to get structured JSON output, you might include a couple of example inputs and outputs in JSON in the prompt, then ask for a new one. Few-shot context effectively teaches the model by example. Likewise, setting a system role or persona can guide tone and behavior (“You are an expert Python developer helping a user…”). These techniques are staples because they work: They bias the model toward the patterns you want. In the context-engineering mindset, prompt wording and examples are just one part of the context, but they remain crucial. In fact, you could say prompt engineering (crafting instructions and examples) is now a subset of context engineering—it’s one tool in the toolkit. We still care a lot about phrasing and demonstrative examples, but we’re also doing all these other things around them.
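Here’s a sketch of what that looks like in practice, using the common chat-message convention (the examples themselves are invented for illustration): a system role sets the persona, two input/output pairs demonstrate the JSON format, and the real request comes last.

```python
# Few-shot prompting for structured JSON output, expressed as chat messages.
# The exact API call depends on your provider; the shape of the message list is the point.
messages = [
    {"role": "system",
     "content": "You are an expert Python developer helping a user. Always reply with valid JSON."},
    # Example 1: demonstrate the input/output pattern we want the model to imitate.
    {"role": "user", "content": "Summarize: 'The build failed because of a missing import.'"},
    {"role": "assistant", "content": '{"summary": "Build failure", "cause": "missing import"}'},
    # Example 2: a second demonstration reinforces the format.
    {"role": "user", "content": "Summarize: 'Tests pass locally but time out in CI.'"},
    {"role": "assistant", "content": '{"summary": "CI timeout", "cause": "slow CI environment"}'},
    # The real request comes last; the model continues the established pattern.
    {"role": "user", "content": "Summarize: 'The cache returns stale entries after every deploy.'"},
]
```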

Managing state and memory: Many applications involve multiple turns of interaction or long-running sessions. The context window isn’t infinite, so a major part of context engineering is deciding how to handle conversation history or intermediate results. A common technique is summary compression—after every few interactions, summarize them and use the summary going forward instead of the full text. For example, Anthropic’s Claude assistant automatically does this when conversations get lengthy, to avoid context overflow. (You’ll see it produce a “[Summary of previous discussion]” that condenses earlier turns.) Another tactic is to explicitly write important facts to an external store (a file, database, etc.) and then later retrieve them when needed rather than carrying them in every prompt. This is like an external memory. Some advanced agent frameworks even let the LLM generate “notes to self” that get stored and can be recalled in future steps. The art here is figuring out what to keep, when to summarize, and how to resurface past info at the right moment. Done well, it lets an AI maintain coherence over very long tasks—something that pure prompting would struggle with.
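A minimal sketch of summary compression, assuming a hypothetical `llm()` helper that stands in for your model call: once the history passes a threshold, older turns are replaced by a single summary message and only the recent turns are carried forward.

```python
def llm(prompt: str) -> str:
    """Placeholder for a call to whatever model/API you use."""
    raise NotImplementedError

def compress_history(history: list[dict], keep_last: int = 4, max_turns: int = 12) -> list[dict]:
    """Once the conversation gets long, replace older turns with a single summary message."""
    if len(history) <= max_turns:
        return history
    older, recent = history[:-keep_last], history[-keep_last:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in older)
    summary = llm(
        "Summarize the key facts, decisions, and open questions from this conversation:\n\n"
        + transcript
    )
    return [{"role": "system",
             "content": "[Summary of previous discussion]\n" + summary}] + recent
```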

Tool use and environmental context: Modern AI agents can use tools (e.g., calling APIs, running code, web browsing) as part of their operations. When they do, each tool’s output becomes new context for the next model call. Context engineering in this scenario means instructing the model when and how to use tools and then feeding the results back in. For example, an agent might have a rule: “If the user asks a math question, call the calculator tool.” After using it, the result (say 42) is inserted into the prompt: “Tool output: 42.” This requires formatting the tool output clearly and maybe adding a follow-up instruction like “Given this result, now answer the user’s question.” A lot of work in agent frameworks (LangChain, etc.) is essentially context engineering around tool use—giving the model a list of available tools, along with syntactic guidelines for invoking them, and templating how to incorporate results. The key is that you, the engineer, orchestrate this dialogue between the model and the external world.
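Here is a simplified sketch of that loop using the calculator example, again with a hypothetical `llm()` placeholder; the `CALC:` protocol is invented for illustration, but it shows the essential move of formatting the tool output and feeding it back with a follow-up instruction.

```python
import re

def llm(prompt: str) -> str:
    """Placeholder for a call to whatever model/API you use."""
    raise NotImplementedError

def calculator(expression: str) -> str:
    """A trivial 'tool': evaluate a basic arithmetic expression."""
    if not re.fullmatch(r"[0-9+\-*/(). ]+", expression):
        raise ValueError("unsupported expression")
    return str(eval(expression))  # acceptable here because the input is tightly restricted

def answer_with_tools(question: str) -> str:
    """Orchestrate one tool call and feed its output back into the next prompt."""
    # Step 1: tell the model what tool exists and how to ask for it.
    plan = llm(
        "You can use a calculator tool. If the question needs arithmetic, reply with\n"
        "CALC: <expression> and nothing else. Otherwise answer directly.\n\n"
        f"Question: {question}"
    )
    if plan.startswith("CALC:"):
        result = calculator(plan.removeprefix("CALC:").strip())
        # Step 2: insert the tool output into the prompt as fresh context.
        return llm(
            f"Question: {question}\nTool output: {result}\n"
            "Given this result, now answer the user's question."
        )
    return plan
```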

Information formatting and packaging: We’ve touched on this, but it deserves emphasis. Often you have more info than fits or is useful to include fully. So you compress or format it. If your model is writing code and you have a large codebase, you might include just function signatures or docstrings rather than entire files, to give it context. If the user query is verbose, you might highlight the main question at the end to focus the model. Use headings, code blocks, tables—whatever structure best communicates the data. For example, rather than “User data: [massive JSON]… Now answer question.” you might extract the few fields needed and present “User’s Name: X, Account Created: Y, Last Login: Z.” This is easier for the model to parse and also uses fewer tokens. In short, think like a UX designer, but your “user” is the LLM—design the prompt for its consumption.
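As a small sketch of that packaging step, a helper like the hypothetical one below pulls out only the fields the question needs and renders them as compact labeled lines rather than raw JSON:

```python
def pack_user_context(user_record: dict, fields: list[str]) -> str:
    """Render only the needed fields as short labeled lines instead of dumping raw JSON."""
    lines = []
    for field in fields:
        label = field.replace("_", " ").title()  # e.g. "last_login" -> "Last Login"
        lines.append(f"{label}: {user_record.get(field, 'unknown')}")
    return "\n".join(lines)

# pack_user_context(record, ["name", "account_created", "last_login"])
# -> "Name: X\nAccount Created: Y\nLast Login: Z"
```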

The impact of these techniques is huge. When you see an impressive LLM demo solving a complex task (say, debugging code or planning a multistep process), you can bet it wasn’t just a single clever prompt behind the scenes. There was a pipeline of context assembly enabling it.

For instance, an AI pair programmer might implement a workflow like:

  1. Search the codebase for relevant code.
  2. Include those code snippets in the prompt with the user’s request.
  3. If the model proposes a fix, run tests in the background.
  4. If tests fail, feed the failure output back into the prompt for the model to refine its solution.
  5. Loop until tests pass.

Each step has carefully engineered context: The search results, the test outputs, etc., are each fed into the model in a controlled way. It’s a far cry from “just prompt an LLM to fix my bug” and hoping for the best.
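A stripped-down sketch of that loop, with `search_codebase`, `run_tests`, and `llm` as hypothetical placeholders for your code search, test runner, and model call, might look like this:

```python
def llm(prompt: str) -> str: ...                      # placeholder: your model call
def search_codebase(query: str) -> list[str]: ...     # placeholder: your code search
def run_tests(patch: str) -> tuple[bool, str]: ...    # placeholder: apply patch, return (passed, output)

def fix_bug(user_request: str, max_iterations: int = 3) -> str:
    """Assemble context, propose a fix, and loop on test feedback until tests pass."""
    snippets = search_codebase(user_request)           # step 1: retrieve relevant code
    feedback = ""
    for _ in range(max_iterations):
        prompt = (                                      # step 2: code + request (+ test output)
            "Relevant code:\n" + "\n\n".join(snippets)
            + f"\n\nUser request: {user_request}\n{feedback}"
            + "Propose a fix as a unified diff."
        )
        patch = llm(prompt)
        passed, test_output = run_tests(patch)          # steps 3-4: run tests on the proposal
        if passed:
            return patch                                 # step 5: stop once tests pass
        feedback = f"\nPrevious attempt failed these tests:\n{test_output}\n\n"
    raise RuntimeError("no passing fix found within the iteration budget")
```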

The Challenge of Context Rot

As we get better at assembling rich context, we run into a new problem: Context can actually poison itself over time. This phenomenon, aptly termed “context rot” by developer Workaccount2 on Hacker News, describes how context quality degrades as conversations grow longer and accumulate distractions, dead-ends, and low-quality information.

The pattern is frustratingly common: You start a session with a well-crafted context and clear instructions. The AI performs beautifully at first. But as the conversation continues—especially if there are false starts, debugging attempts, or exploratory rabbit holes—the context window fills with increasingly noisy information. The model’s responses gradually become less accurate and more confused, or it starts hallucinating.


Why does this happen? Context windows aren’t just storage—they’re the model’s working memory. When that memory gets cluttered with failed attempts, contradictory information, or tangential discussions, it’s like trying to work at a desk covered in old drafts and unrelated papers. The model struggles to identify what’s currently relevant versus what’s historical noise. Earlier mistakes in the conversation can compound, creating a feedback loop where the model references its own poor outputs and spirals further off track.

This is especially problematic in iterative workflows—exactly the kind of complex tasks where context engineering shines. Debugging sessions, code refactoring, document editing, or research projects naturally involve false starts and course corrections. But each failed attempt leaves traces in the context that can interfere with subsequent reasoning.

Practical strategies for managing context rot include:

  • Context pruning and refresh: Workaccount2’s solution is “I work around it by regularly making summaries of instances, and then spinning up a new instance with fresh context and feed in the summary of the previous instance.” This approach preserves the essential state while discarding the noise. You’re essentially doing garbage collection for your context. A minimal sketch of this pattern follows the list.
  • Structured context boundaries: Use clear markers to separate different phases of work. For example, explicitly mark sections as “Previous attempts (for reference only)” versus “Current working context.” This helps the model understand what to prioritize.
  • Progressive context refinement: After significant progress, consciously rebuild the context from scratch. Extract the key decisions, successful approaches, and current state, then start fresh. It’s like refactoring code—occasionally you need to clean up the accumulated cruft.
  • Checkpoint summaries: At regular intervals, have the model summarize what’s been accomplished and what the current state is. Use these summaries as seeds for fresh context when starting new sessions.
  • Context windowing: For very long tasks, break them into phases with natural boundaries where you can reset context. Each phase gets a clean start with only the essential carry-over from the previous phase.
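Here is a minimal sketch of the pruning-and-refresh pattern (checkpoint, then reseed a fresh session), again with a hypothetical `llm()` placeholder standing in for your model call:

```python
def llm(prompt: str) -> str:
    """Placeholder for a call to whatever model/API you use."""
    raise NotImplementedError

def checkpoint(history: list[dict]) -> str:
    """Ask the model for a checkpoint summary: decisions, current state, next steps."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return llm(
        "Summarize this working session so a fresh session can pick it up. Include key "
        "decisions, approaches that worked, the current state, and next steps. "
        "Omit failed attempts and dead ends.\n\n" + transcript
    )

def fresh_session(history: list[dict], system_prompt: str) -> list[dict]:
    """'Garbage collect' the context: seed a new session with only the checkpoint summary."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "[Checkpoint from previous session]\n" + checkpoint(history)},
    ]
```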

This challenge also highlights why “just dump everything into the context” isn’t a viable long-term strategy. Like good software architecture, good context engineering requires intentional information management—deciding not just what to include but also when to exclude, summarize, or refresh.


AI tools are quickly moving beyond chat UX to sophisticated agent interactions. Our upcoming AI Codecon event, Coding for the Agentic World, will highlight how developers are already using agents to build innovative and effective AI-powered experiences. We hope you’ll join us on September 9 to explore the tools, workflows, and architectures defining the next era of programming. It’s free to attend. Register now to save your seat.


