The expanding toolkit

TL;DR

Lucas from Anthropic discusses the evolution of the Claude model from a simple input-output system to an expanding toolkit that simplifies agent building. The talk highlights how tasks that previously required extensive scaffolding, such as tool use, context management, code execution, and computer use, are now streamlined and integrated into the model itself. The focus shifts from compensating for model unreliability to leveraging the model's capabilities to connect it with unique user data and tools.

Model as an expanding toolkit, not just input-output.
Scaffolding that was previously required now ships with the model.
Focus on connecting the model to unique user data and tools.

Introduction [0:15]

Lucas introduces the concept of Claude as an expanding toolkit, moving beyond a simple input-output model. The core idea is that the scaffolding developers had to build around the model is now integrated into the model itself. This shift simplifies development, allowing users to focus on outcomes rather than managing retries and wrappers. The presentation will cover tool use, context management, code execution, and computer use, with practical tips for each.

Tool Use: Routing and Retries [3:04]

Previously, trusting the model with a full tool set was problematic due to context window limitations, so developers built routers out of string matching and heuristics. These routers were brittle and prone to failure when new tools were added. Retry decorators were also needed because tool calls failed often. Now, the model can intelligently select the right tool itself, eliminating the need for pre-filtering and manual routing. Claude can also recover from tool errors and retry automatically.
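For context, the kind of scaffolding described above might have looked roughly like the following. This is a minimal sketch, not code from the talk: the keyword table, the tool names, and the `route` and `retry` helpers are all hypothetical.

```python
import functools
import time

# Hypothetical keyword table: every new tool needs a new hand-written entry.
KEYWORD_ROUTES = {
    "weather": "get_weather",
    "search": "web_search",
    "email": "send_email",
}


def route(user_message):
    """Return the first tool whose keyword appears in the message, else None."""
    text = user_message.lower()
    for keyword, tool_name in KEYWORD_ROUTES.items():
        if keyword in text:
            return tool_name
    return None


def retry(times=3, delay=0.0):
    """Re-run a flaky tool call for up to `times` attempts before giving up."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(times):
                try:
                    return func(*args, **kwargs)
                except Exception as error:
                    last_error = error
                    time.sleep(delay)
            raise last_error
        return wrapper
    return decorator
```

The brittleness shows up quickly: "Will it rain tomorrow?" clearly wants the weather tool, but it contains no matching keyword, so `route` returns `None`.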

Tool Use: Practical Tips [4:45]

When providing a tool to Claude, include a description of the output schema. This tells Claude what to expect back from the tool, so it can use the results more efficiently. Pre- and post-tool-use hooks can also be defined in Claude settings to programmatically block certain tool calls or to analyze and log outputs.
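One way to follow the output-schema tip is to fold the schema into the tool's description text. The sketch below assumes the `name` / `description` / `input_schema` shape used for tool definitions in the Anthropic Messages API. The `make_tool` helper, the weather tool, and its output fields are hypothetical and only illustrate the idea.

```python
import json


def make_tool(name, description, input_schema, output_schema):
    """Build a tool definition whose description also documents the output shape."""
    output_doc = json.dumps(output_schema, indent=2)
    return {
        "name": name,
        "description": (
            f"{description}\n\nReturns JSON matching this schema:\n{output_doc}"
        ),
        "input_schema": input_schema,
    }


# Hypothetical weather tool, used only for illustration.
weather_tool = make_tool(
    name="get_weather",
    description="Get the current weather for a city.",
    input_schema={
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
    output_schema={
        "type": "object",
        "properties": {
            "temperature_c": {"type": "number"},
            "conditions": {"type": "string"},
        },
    },
)
```

The resulting dictionary could then be passed in the list of tools sent with a request, so the model sees the expected return fields before it ever calls the tool.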

Context Management [6:45]

Managing long-running agents used to require building custom memory systems with chunking, RAG (Retrieval-Augmented Generation), and summarization techniques to extend the model's context window. Now, with a 1-million-token context window and flat pricing, window pressure is reduced. Server-side compaction and context editing further simplify context management, making it feel closer to an infinite context window.

Context Management: Practical Tips [8:23]

It's recommended to clear tool results after each turn to save on context. Pruning stale tool outputs like screenshots, search results, and file reads can significantly reduce token usage while preserving the decisions informed by those results. In Claude Code, the /context command provides a live, colored grid breakdown of what's filling the context window, along with optimization suggestions.
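To make the pruning idea concrete, here is a client-side sketch that blanks out all but the most recent tool results in a message history. It assumes messages are dictionaries whose `content` may be a list of blocks, with tool outputs carried in blocks of type `tool_result`. The `clear_old_tool_results` function and its placeholder text are hypothetical, and this is not the server-side context editing feature mentioned in the talk.

```python
import copy


def clear_old_tool_results(messages, keep_last=1, placeholder="[tool result cleared]"):
    """Return a copy of `messages` with all but the newest tool results blanked."""
    pruned = copy.deepcopy(messages)
    # Collect every tool_result block in conversation order.
    results = [
        block
        for message in pruned
        if isinstance(message.get("content"), list)
        for block in message["content"]
        if isinstance(block, dict) and block.get("type") == "tool_result"
    ]
    # Everything except the last `keep_last` results is considered stale.
    stale = results[:-keep_last] if keep_last > 0 else results
    for block in stale:
        block["content"] = placeholder
    return pruned
```

The assistant's own messages are left untouched, so the decisions it made after reading a file or a search result stay in the history even once the raw output is gone.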

Code Execution [9:58]

The write, run, and fix loop for code execution used to be the developer's responsibility, involving VM providers, sandboxes, and manual parsing of feedback and tracebacks. Now, Claude offers a code execution tool with a hosted sandbox, allowing the entire loop to occur within a single API turn. Claude has its own computer for stateless compute and data analysis without disrupting the local file system. It can intelligently switch between its sandbox and the user's local bash when necessary.
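For comparison, a developer-owned version of the write, run, and fix loop might be sketched as follows. This is an illustration under stated assumptions: `generate` is a hypothetical stand-in for a model call that receives the previous error output (or `None` on the first attempt), and a plain subprocess is used in place of a real sandbox, so it provides no isolation.

```python
import subprocess
import sys


def run_snippet(code, timeout=10):
    """Run `code` in a fresh Python process; return (succeeded, combined output)."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0, result.stdout + result.stderr


def write_run_fix(generate, max_attempts=3):
    """Ask `generate` for code, run it, and feed failures back until it passes."""
    feedback = None
    for _ in range(max_attempts):
        code = generate(feedback)  # hypothetical stand-in for a model call
        ok, output = run_snippet(code)
        if ok:
            return code, output
        feedback = output  # the traceback becomes input to the next attempt
    raise RuntimeError("no passing attempt within max_attempts")
```

Every piece here (process management, timeouts, traceback capture, the retry budget) is what the hosted code execution tool is described as absorbing into a single API turn.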

Code Execution: Practical Tips [11:12]

Use /schedule in Claude Code to set up cron-triggered autonomous runs, so Claude can iterate on its own work on a timer without manual intervention.

Computer Use [12:35]

Previously, using Claude to drive a laptop required significant glue code to downscale screenshots and scale coordinates back up so that clicks landed reliably. Opus 4.7 now supports native-resolution screenshots up to 1440p with one-to-one pixel coordinates, eliminating the need for scaling math. Claude's performance on the OSWorld evaluation has also improved significantly, nearing 80%, indicating its growing capability in completing complex tasks on professional and consumer software.
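The scaling math that used to sit between the screenshot and the click can be sketched like this. The function names and the example resolutions are hypothetical; the point is only to show the round trip from real screen size to a downscaled image and back.

```python
def downscale_size(screen_size, max_width):
    """Shrink (width, height) to fit max_width while keeping the aspect ratio."""
    width, height = screen_size
    if width <= max_width:
        return width, height
    scale = max_width / width
    return max_width, round(height * scale)


def to_screen_coords(point, model_size, screen_size):
    """Map a click from the downscaled screenshot back to real screen pixels."""
    x, y = point
    scale_x = screen_size[0] / model_size[0]
    scale_y = screen_size[1] / model_size[1]
    return round(x * scale_x), round(y * scale_y)


# Example: a 2560x1440 screen sent to the model as a 1280x720 image.
screen = (2560, 1440)
sent = downscale_size(screen, max_width=1280)
click = to_screen_coords((100, 50), sent, screen)
```

With one-to-one pixel coordinates, the image size and the screen size are the same, so both scale factors are 1 and this mapping step drops out entirely.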

Computer Use: Practical Tips [14:49]

Experiment with different resolutions and image formats (JPEG, PNG, WebP) to find what works best for specific use cases and UIs. Claude Code can leverage your Chrome browser session via the Claude in Chrome extension, allowing the agent's harness to navigate the web, including local development environments.

Demo: Agentic Coding Loop with Claude Code [16:03]

A pre-recorded demo showcases an agentic coding loop with Claude Code, using computer use and the Claude in Chrome extension. Claude identifies and fixes bugs in a project management dashboard by testing the live board, making code changes, and retesting the flow. This includes actions like typing, dragging cards, and diagnosing drag-and-drop issues.

Conclusion [19:33]

Code written to compensate for model unreliability has a short lifespan. Anthropic will continue to enhance Claude's reliability and capabilities through its expanding toolkit. Focus on code that connects the model to unique user data, custom tools, and specific context, as this is more valuable than compensating for model shortcomings. The future involves every agent and piece of software having a front door for agents, with the interesting work being what you put on the other side that nobody else can.