Computer Use Agent
An AI agent that controls a computer by viewing the screen, moving the mouse, clicking elements, and typing keystrokes, operating software the way a human user would. Because it works through the graphical interface, a computer use agent can interact with any application that has one.
Computer use agents greatly expand the range of software an agent can operate. Instead of requiring an API integration for every application, these agents interact with software through the same visual interface humans use. The agent sees a screenshot of the screen, decides what action to take (click a button, type in a field, scroll), executes the action, and observes the new screen state. This loop makes it possible to automate desktop or web applications that offer no API, without building custom integrations.
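The observe-decide-act loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any provider's API: `FakeScreen` is a hypothetical stand-in for a real display (its "screenshot" is a dict of visible state rather than pixels), and `decide` is a hypothetical hard-coded policy standing in for the model's decision step.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    kind: str                      # hypothetical action vocabulary: "click", "type", or "done"
    target: Optional[str] = None   # element to act on
    text: str = ""                 # text to type, if any


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a real display: a form with one field and a submit button."""
    fields: dict = field(default_factory=lambda: {"name": ""})
    focused: Optional[str] = None
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent captures pixels; here the "screenshot" is the visible state as a dict.
        return {"fields": dict(self.fields), "focused": self.focused, "submitted": self.submitted}

    def execute(self, action: Action) -> None:
        # Apply an action and change the screen state, as a real UI would.
        if action.kind == "click" and action.target in self.fields:
            self.focused = action.target
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True
        elif action.kind == "type" and self.focused is not None:
            self.fields[self.focused] += action.text


def decide(observation: dict, goal_name: str) -> Action:
    """Hypothetical policy standing in for the model: fill the name field, then submit."""
    if observation["submitted"]:
        return Action("done")
    if observation["fields"]["name"] != goal_name:
        if observation["focused"] != "name":
            return Action("click", "name")
        return Action("type", "name", goal_name)
    return Action("click", "submit")


def run_agent(screen: FakeScreen, goal_name: str, max_steps: int = 10) -> list:
    trace = []
    for _ in range(max_steps):                   # cap steps so a confused agent cannot loop forever
        observation = screen.screenshot()        # observe the current screen
        action = decide(observation, goal_name)  # decide on the next action
        trace.append(action.kind)
        if action.kind == "done":
            break
        screen.execute(action)                   # act, then loop back to observe the new state
    return trace


screen = FakeScreen()
print(run_agent(screen, "Ada"))  # ['click', 'type', 'click', 'done']
```

The key design point is that the agent never assumes an action succeeded: every step begins with a fresh observation, which is what lets a real agent recover when the UI responds unexpectedly.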
For growth and operations teams, computer use agents can automate workflows across applications that lack APIs or integrations. Tasks like data entry between legacy systems, navigating complex admin interfaces, or performing multi-application workflows become automatable. Anthropic's computer use capability and similar offerings from other providers are making this increasingly accessible. The tradeoffs are speed (visual interaction is slower than API calls), reliability (UI changes can break workflows), and cost (screenshot processing is token-intensive). Use computer use agents for low-frequency, high-value tasks where building API integrations is not justified.
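To make the cost tradeoff concrete, here is a rough estimator. It assumes the approximation of about (width × height) / 750 tokens per image that Anthropic's vision documentation gives; other providers count differently, and providers may downscale large screenshots, so treat the numbers as order-of-magnitude only. It also assumes, as a labeled option, that every earlier screenshot stays in the conversation, so step k resends k images.

```python
def image_tokens(width_px: int, height_px: int, divisor: int = 750) -> int:
    # Rough per-screenshot estimate; assumes no provider-side downscaling.
    return round(width_px * height_px / divisor)


def workflow_image_tokens(steps: int, width_px: int, height_px: int,
                          keep_history: bool = True) -> int:
    per_image = image_tokens(width_px, height_px)
    if keep_history:
        # Assumption: all prior screenshots are resent each step, so the total
        # is per_image * (1 + 2 + ... + steps).
        return per_image * steps * (steps + 1) // 2
    # Only the latest screenshot is sent each step.
    return per_image * steps


# An assumed 1024x768 screen and a 20-step task:
print(image_tokens(1024, 768))                                   # 1049
print(workflow_image_tokens(20, 1024, 768, keep_history=False))  # 20980
print(workflow_image_tokens(20, 1024, 768))                      # 220290
```

The quadratic growth when history is kept is one reason implementations often drop or summarize older screenshots.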
Related Terms
Model Context Protocol (MCP)
An open standard that defines how AI models connect to external tools, data sources, and services through a unified interface. MCP enables agents to dynamically discover and invoke capabilities without hardcoded integrations.
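MCP messages are JSON-RPC 2.0 requests; discovery uses the `tools/list` method and invocation uses `tools/call` with a tool `name` and its `arguments`. The sketch below only builds those request messages and omits transport, initialization, and responses; the tool name `search_docs` and its arguments are hypothetical.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched to them


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    msg = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


# Discover which tools a server offers...
list_tools = jsonrpc_request("tools/list")
# ...then invoke one by name (hypothetical tool and arguments).
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "search_docs", "arguments": {"query": "refund policy"}},
)
print(list_tools)
print(call_tool)
```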
Tool Use
The ability of an AI model to invoke external functions, APIs, or services during a conversation to perform actions beyond text generation. Tool use transforms language models from passive responders into active problem solvers.
Function Calling
A model capability where the AI generates structured JSON arguments for predefined functions rather than free-form text. Function calling provides a reliable bridge between natural language understanding and programmatic execution.
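On the application side, function calling reduces to parsing the model's structured output and dispatching it to real code. The sketch below assumes a simplified output shape, `{"name": ..., "arguments": {...}}`; real providers each define their own response format. The `convert_currency` function and its schema are hypothetical.

```python
import json


def convert_currency(amount: float, rate: float) -> float:
    # Hypothetical function the model is allowed to call.
    return round(amount * rate, 2)


# Hypothetical registry: each callable is paired with a JSON-Schema-style parameter description
# (the description is what the model would be shown).
TOOLS = {
    "convert_currency": {
        "fn": convert_currency,
        "parameters": {
            "type": "object",
            "properties": {"amount": {"type": "number"}, "rate": {"type": "number"}},
            "required": ["amount", "rate"],
        },
    }
}


def dispatch(model_output: str):
    """Parse the model's JSON, check it against the registry, and call the function."""
    call = json.loads(model_output)
    tool = TOOLS.get(call["name"])
    if tool is None:
        raise ValueError(f"unknown function: {call['name']}")
    args = call["arguments"]
    missing = [p for p in tool["parameters"]["required"] if p not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return tool["fn"](**args)


print(dispatch('{"name": "convert_currency", "arguments": {"amount": 10, "rate": 1.5}}'))  # 15.0
```

Validating the name and required arguments before executing anything is what makes this bridge reliable: a malformed call fails loudly instead of running with bad inputs.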
Agentic Workflow
A multi-step process where an AI agent autonomously plans, executes, and iterates on tasks using tools, reasoning, and feedback loops. Agentic workflows go beyond single-turn interactions to accomplish complex goals.
ReAct Pattern
An agent architecture that interleaves Reasoning and Acting steps, where the model thinks about what to do next, takes an action, observes the result, and repeats. ReAct combines chain-of-thought reasoning with tool use in a unified loop.
Chain of Thought
A prompting technique that instructs the model to break down complex problems into sequential reasoning steps before producing a final answer. Chain of thought often improves accuracy substantially on math, logic, and multi-step tasks.