danielhuber.dev@proton.me Sunday, April 5, 2026

The Agentic Web

How AI agents are becoming first-class participants on the internet — browsing autonomously, transacting on behalf of users, and communicating with other agents through emerging protocols and standards.


February 24, 2026

The web was built for humans clicking links. Every page, every form, every checkout flow assumes a person on the other end — reading, deciding, acting. That assumption is breaking. AI agents now browse websites, fill forms, compare prices, book flights, and make purchases. They do this not by consuming APIs but by looking at the same screens humans see and interacting with them directly. The shift is fundamental: the internet is acquiring a second class of user, and its infrastructure is not ready.

The agentic web is the term for what comes next — a phase of the internet where autonomous AI agents operate alongside humans as first-class participants. Not just answering questions, but taking actions. Not just retrieving information, but negotiating, transacting, and collaborating with other agents. IEEE Spectrum describes it as “a machine-native network in which autonomous AI agents are first-class citizens.” A July 2025 arXiv paper (“Agentic Web: Weaving the Next Web with AI Agents”) formalizes it as a three-dimensional shift across intelligence, interaction, and economics.

From Search to Delegation

The core behavioral shift: users stop asking “How do I do this?” and start saying “Have this done for me.” The agent handles the browsing, the comparison, the form-filling, the purchasing — the user sets the goal and reviews the outcome.

The Evolution in Context

Web Eras

| Era         | Primary Focus               | Technology                         | User Role   |
|-------------|-----------------------------|------------------------------------|-------------|
| Web 1.0     | Static content              | HTML, HTTP                         | Reader      |
| Web 2.0     | User-generated content      | Social platforms, REST APIs        | Participant |
| Web 3.0     | Ownership, decentralization | Blockchain, smart contracts        | Asset owner |
| Agentic Web | Autonomous action           | LLMs, AI agents, interop protocols | Delegator   |

The agentic web differs from Web3 by prioritizing experience and automation over infrastructure and ownership. It doesn’t require new blockchains or token economies — it requires new protocols, trust frameworks, and identity systems for machines acting on behalf of people.

How Agents See the Web

Two fundamental approaches let agents interact with websites: GUI-based computer use and DOM/accessibility-tree parsing. Both are production-ready, and the best systems combine them.

Computer Use (Vision-Based)

The agent receives a screenshot, reasons about what it sees, and emits mouse/keyboard actions. This works on any application — no API integration required.

Computer Use Implementations

| Platform     | Company   | Approach                                                   | Status                          |
|--------------|-----------|------------------------------------------------------------|---------------------------------|
| Computer Use | Anthropic | Claude sees screenshots, emits mouse/keyboard commands     | Public beta (Oct 2024)          |
| CUA          | OpenAI    | GPT-4o vision + reinforcement learning for GUI interaction | Powers Operator / ChatGPT Agent |
| Nova Act     | Amazon    | Custom Nova model for browser automation, 90% reliability  | Generally available             |

DOM and Accessibility Tree Parsing

Instead of looking at pixels, these agents parse the page structure — extracting interactive elements, text content, and semantic meaning from the DOM or accessibility tree. This is faster and cheaper than vision but requires browser integration.

Playwright MCP (Microsoft) is the canonical example: an MCP server that gives LLMs browser control through the accessibility tree rather than screenshots. It’s faster, more deterministic, and doesn’t require vision models.
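The structural approach is easy to illustrate without a browser at all. The sketch below parses static HTML with Python's stdlib parser and extracts the elements an agent could act on; a real system like Playwright MCP works from the live accessibility tree instead, but the output — a compact list of actionable elements rather than pixels — has the same shape.

```python
# Minimal illustration of DOM parsing for agents: extract interactive
# elements instead of rendering pixels. Real systems use the live
# accessibility tree; this uses stdlib html.parser on static HTML.
from html.parser import HTMLParser

INTERACTIVE = {"a", "button", "input", "select", "textarea"}

class InteractiveElementExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE:
            # Record the tag plus its attributes (name, href, etc.)
            self.elements.append({"tag": tag, **dict(attrs)})

page = """
<form action="/search">
  <input name="q" placeholder="Search flights">
  <button type="submit">Go</button>
</form>
<a href="/deals">Today's deals</a>
"""

extractor = InteractiveElementExtractor()
extractor.feed(page)
for el in extractor.elements:
    print(el)
```

Three structured entries replace a full screenshot — which is why this path is cheaper and more deterministic than vision, at the cost of requiring access to page internals.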

Hybrid Approaches

The highest-performing frameworks combine both. Browser Use (78,000+ GitHub stars) pairs DOM extraction with visual recognition for elements that resist structural parsing. Magnitude takes the opposite stance — pure vision, no DOM parsing at all — and achieves the current state of the art at 93.9% on the WebVoyager benchmark.

Web Agent Framework Comparison

| Framework    | Approach            | Key Feature                                              | WebVoyager Score |
|--------------|---------------------|----------------------------------------------------------|------------------|
| Magnitude    | Pure vision         | No DOM parsing — uses Claude Sonnet 4 vision only        | 93.9% (SOTA)     |
| Browser Use  | Hybrid DOM + vision | LLM + visual recognition for real-time browser control   | 89.1%            |
| Stagehand v3 | Atomic primitives   | act, extract, observe with self-healing and caching      | —                |
| Magentic-One | Multi-agent         | Orchestrator + WebSurfer + FileSurfer + Coder + Terminal | —                |

Cloud Browser Infrastructure

Agents need browsers to run in. Running headless Chrome locally works for development, but production systems need managed infrastructure — stealth mode to avoid bot detection, CAPTCHA handling, session persistence, and autoscaling.

Cloud Browser Architecture
┌──────────────────────────────────────────────────────────┐
│                      AI APPLICATION                      │
│      (LangChain agent, custom code, Operator, etc.)      │
└─────────────────────┬────────────────────────────────────┘
                      │  API calls
                      ▼
┌──────────────────────────────────────────────────────────┐
│                  CLOUD BROWSER SERVICE                   │
│                                                          │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                │
│  │ Session 1│  │ Session 2│  │ Session 3│  ... (auto)    │
│  │ Chromium │  │ Chromium │  │ Chromium │                │
│  └──────────┘  └──────────┘  └──────────┘                │
│                                                          │
│  Features:                                               │
│  ├── Stealth mode (evade bot detection)                  │
│  ├── CAPTCHA solving                                     │
│  ├── Session recording & replay                          │
│  ├── Proxy rotation                                      │
│  └── Auto-scaling                                        │
└──────────────────────────────────────────────────────────┘

Browserbase is the market leader — $40M Series B at $300M valuation, 50M+ sessions processed in 2025. They also build Stagehand, the developer framework, and Open Operator, a template for building custom web agents. Steel is the open-source alternative: self-hosted, sub-second session startup, sessions up to 24 hours. AWS entered with Bedrock AgentCore Browser, a managed cloud browser service for Nova Act agents.

Consumer Web Agents

Every major AI company now ships a product that browses the web for users:

Consumer Agentic Browsers (as of early 2026)

| Product         | Company         | Model             | Access                                                |
|-----------------|-----------------|-------------------|-------------------------------------------------------|
| ChatGPT Agent   | OpenAI          | CUA (GPT-4o + RL) | Integrated into ChatGPT (July 2025)                   |
| Project Mariner | Google DeepMind | Gemini            | Chrome extension, AI Ultra plan ($249.99/mo)          |
| Auto Browse     | Google Chrome   | Gemini 3          | Premium subscribers, via Gemini side panel (Jan 2026) |
| Comet           | Perplexity      | Proprietary       | Free to all users (Oct 2025)                          |

These products converge on the same UX pattern: the user states a goal in natural language, the agent takes over the browser, and the user watches or reviews the result. The hard part isn’t the browsing — it’s trust. Users must trust the agent with passwords, payment info, and personal decisions.

The Protocol Stack

For agents to operate across the web at scale, they need standardized ways to discover each other, communicate, authenticate, and transact. Four major protocols are emerging, each solving a different layer of the problem.

Agentic Web Protocol Layers
┌──────────────────────────────────────────────────────────┐
│                   APPLICATION LAYER                       │
│  agents.json — Workflow contracts for multi-step tasks    │
│  llms.txt — Machine-readable site documentation          │
├──────────────────────────────────────────────────────────┤
│                 AGENT-TO-AGENT LAYER                     │
│  A2A — Task-based agent communication (Google)           │
│  ACP — RESTful agent interoperability (BeeAI)            │
│  ANP — Decentralized agent network (DID-based)           │
├──────────────────────────────────────────────────────────┤
│                 AGENT-TO-TOOLS LAYER                     │
│  MCP — Connect agents to tools and data (Anthropic)      │
├──────────────────────────────────────────────────────────┤
│                   IDENTITY LAYER                         │
│  Web Bot Auth — Cryptographic agent identity (RFC 9421)  │
│  A2A Agent Cards — /.well-known/agent-card.json          │
│  Agent Registry — Decentralized identity discovery       │
└──────────────────────────────────────────────────────────┘

MCP: Agent-to-Tools

The Model Context Protocol (Anthropic, now under Linux Foundation’s AAIF) standardizes how agents connect to external tools and data sources. For the agentic web specifically, the Playwright MCP server lets any MCP-compatible agent control a browser. The protocol’s latest spec (November 2025) supports parallel tool calls, concurrent execution, and OAuth-based authorization.

MCP is the plumbing layer — it doesn’t handle agent-to-agent communication, but it’s the standard way agents get things done.
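On the wire, that plumbing is JSON-RPC 2.0: tool invocation goes through the `tools/call` method. The sketch below builds such a request; the tool name and arguments are illustrative (a browser-navigation tool of the kind a Playwright MCP server exposes), not a guaranteed schema.

```python
# What an MCP tool call looks like on the wire: MCP is JSON-RPC 2.0,
# and tool invocation uses the "tools/call" method. Tool name and
# arguments below are illustrative.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "browser_navigate",                  # tool exposed by the server
        "arguments": {"url": "https://example.com"}, # tool-specific parameters
    },
}
print(json.dumps(request, indent=2))
```

The server replies with a matching JSON-RPC response containing the tool's result, which the agent feeds back into its reasoning loop.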

A2A: Agent-to-Agent

Google’s Agent2Agent Protocol (April 2025, now Linux Foundation) handles communication between agents. Over 150 organizations support it — Atlassian, Salesforce, SAP, ServiceNow, PayPal, and more.

Core concepts:

  • Agent Cards: JSON metadata at /.well-known/agent-card.json describing capabilities, skills, endpoints, and auth requirements
  • Tasks: The unit of work — agents send tasks to each other with defined lifecycles
  • Messages: Context, replies, artifacts, or user instructions exchanged during task execution

Transport: HTTP, SSE, JSON-RPC 2.0. Version 0.3 (July 2025) added gRPC support and signed security cards for cryptographic identity.
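A hypothetical Agent Card makes these concepts concrete. The agent, URL, and skill below are invented, and the field names approximate the A2A schema rather than reproducing it exactly:

```json
{
  "name": "flight-booking-agent",
  "description": "Searches and books flights on behalf of a user",
  "url": "https://agents.example.com/a2a",
  "version": "1.0.0",
  "capabilities": { "streaming": true },
  "skills": [
    {
      "id": "search-flights",
      "name": "Search flights",
      "description": "Find flights matching dates, route, and budget"
    }
  ],
  "securitySchemes": { "bearer": { "type": "http", "scheme": "bearer" } }
}
```

Served at /.well-known/agent-card.json, a card like this is how other agents discover what this one can do and how to authenticate to it.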

MCP and A2A are complementary: MCP connects an agent to tools; A2A connects an agent to other agents.

ANP: Decentralized Agent Network

The Agent Network Protocol positions itself as “HTTP of the Agentic Web era.” It uses W3C Decentralized Identifiers (DIDs) for authentication, end-to-end encryption, and a three-layer architecture: identity, meta-protocol negotiation, and application protocols. No central authority — agents have equal status and discover each other through a decentralized network.

ACP: RESTful Interoperability

The Agent Communication Protocol (BeeAI, Linux Foundation) takes a simpler approach: a standardized REST API for agent interoperability. Framework-agnostic — works with LangChain, CrewAI, or custom code. SDKs available for Python and TypeScript.

Discovery and Machine-Readable Standards

For agents to navigate the web effectively, websites need to declare what they offer in machine-readable formats. Several standards are emerging:

Agent Discovery Standards

| Standard        | Location                     | Purpose                                                                    |
|-----------------|------------------------------|----------------------------------------------------------------------------|
| A2A Agent Cards | /.well-known/agent-card.json | Describe agent capabilities, skills, auth, and endpoints                   |
| llms.txt        | /llms.txt                    | Structured markdown summary of site content for LLMs                       |
| agents.json     | /agents.json                 | Workflow contracts for multi-step API interactions (built on OpenAPI)      |
| Agent Registry  | Cloudflare format            | Decentralized discovery and cryptographic verification of agent identities |
| robots.txt      | /robots.txt                  | Legacy — binary allow/block, increasingly inadequate for agents            |

The llms.txt standard is particularly interesting: a structured markdown file at the site root that gives LLMs a compact overview of what the site contains, with links to full documentation. Two variants exist — llms.txt for a compact overview with links, and llms-full.txt for complete content embedded directly.
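A minimal llms.txt for a hypothetical site might look like this (the store, URLs, and docs are invented; the shape — an H1 title, a blockquote summary, then sections of annotated links — follows the llms.txt proposal):

```markdown
# Example Store

> Online retailer for outdoor gear. The docs below cover the product
> catalog, checkout API, and return policy.

## Docs

- [Product catalog](https://example.com/docs/catalog.md): All products with prices and stock
- [Checkout API](https://example.com/docs/checkout.md): How to create and pay for an order

## Optional

- [Company history](https://example.com/about.md): Background on the brand
```

An agent can fetch this one file and know where to look next, instead of crawling the whole site.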

Cloudflare’s Agent Registry adds cryptographic teeth: agents sign HTTP requests using published Ed25519 keys, and websites validate those signatures against public keys. This integrates with Web Bot Auth (based on IETF RFC 9421 for HTTP Message Signatures) — time-based, non-replayable verification of agent identity.
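The sign-and-verify flow can be sketched in a few lines. Two loud caveats: the signature base below is a simplified version of RFC 9421's covered-components format, and HMAC-SHA256 stands in for Ed25519, which Python's stdlib does not provide — the structure of the flow, not the primitive, is the point.

```python
# Sketch of Web Bot Auth-style request signing. The signature base
# loosely follows RFC 9421's covered-components format; HMAC-SHA256
# stands in for Ed25519 (not available in Python's stdlib).
import hashlib, hmac, time

def signature_base(method: str, authority: str, path: str, created: int) -> bytes:
    # Each covered component on its own line, as in RFC 9421's base string.
    lines = [
        f'"@method": {method}',
        f'"@authority": {authority}',
        f'"@path": {path}',
        f'"@signature-params": created={created}',
    ]
    return "\n".join(lines).encode()

def sign_request(key: bytes, method: str, authority: str, path: str) -> dict:
    created = int(time.time())
    base = signature_base(method, authority, path, created)
    sig = hmac.new(key, base, hashlib.sha256).hexdigest()
    return {"created": created, "signature": sig}

def verify_request(key: bytes, method: str, authority: str, path: str,
                   headers: dict, max_age: int = 300) -> bool:
    # Reject stale signatures: the timestamp is what makes this non-replayable.
    if time.time() - headers["created"] > max_age:
        return False
    base = signature_base(method, authority, path, headers["created"])
    expected = hmac.new(key, base, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, headers["signature"])

key = b"shared-demo-key"
headers = sign_request(key, "GET", "shop.example.com", "/products")
print(verify_request(key, "GET", "shop.example.com", "/products", headers))  # True
```

Because the method, host, and path are inside the signed base, a captured signature cannot be replayed against a different endpoint — and the `created` timestamp bounds how long any capture stays useful.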

Agentic Commerce

Commerce is the agentic web’s killer application. Adobe Analytics reported a 4,700% year-over-year increase in AI agent traffic to US retail sites in July 2025. By Q3 2025, 38% of consumers were using AI for shopping.

The challenge: today’s payment infrastructure assumes a human in the loop. Credit card flows, 3D Secure, CAPTCHA challenges — all designed for a person at a keyboard. Agents need new payment primitives.

Agentic Commerce Flow
┌──────────┐     goal      ┌──────────────┐
│   USER   │──────────────▶│  AI AGENT    │
│          │               │              │
│ "Find me │               │ 1. Browse    │
│  a deal  │               │    retailers │
│  on X"   │               │ 2. Compare   │
│          │               │    prices    │
└──────────┘               │ 3. Select    │
     ▲                     │    best deal │
     │                     └──────┬───────┘
     │  confirmation              │
     │  request                   │  Shared Payment Token
     │                            ▼
     │                     ┌──────────────┐
     │                     │   STRIPE /   │
     └─────────────────────│   PAYMENT    │
                           │   NETWORK    │
                           └──────────────┘

Key Payment Infrastructure

Stripe’s Agentic Commerce Suite introduces Shared Payment Tokens (SPTs) — a new payment primitive where agents can initiate payments using the buyer’s saved method without exposing credentials. SPTs are scoped to a specific seller, bounded by time and amount. Integration spans major e-commerce platforms: Wix, WooCommerce, BigCommerce, Squarespace, and commercetools.
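The scoping logic is the interesting part, and it can be modeled in a few lines. The class and fields below are invented for illustration — Stripe's actual SPT API differs — but they capture the three bounds the text describes: one seller, a spend ceiling, an expiry.

```python
# Illustrative model of a scoped payment token: bound to one seller,
# a spend ceiling, and an expiry, so a compromised agent can only do
# limited damage. Names and fields are invented; not Stripe's API.
from dataclasses import dataclass
import time

@dataclass
class ScopedPaymentToken:
    seller_id: str
    max_amount_cents: int
    expires_at: float            # unix timestamp

    def authorize(self, seller_id: str, amount_cents: int) -> bool:
        if time.time() > self.expires_at:
            return False         # token expired: time bound
        if seller_id != self.seller_id:
            return False         # wrong seller: token is seller-scoped
        return amount_cents <= self.max_amount_cents  # amount bound

token = ScopedPaymentToken("seller_123", 15_000, time.time() + 3600)
print(token.authorize("seller_123", 9_900))   # True: within scope
print(token.authorize("seller_456", 9_900))   # False: different seller
print(token.authorize("seller_123", 20_000))  # False: over the cap
```

Each bound limits a different failure mode: the seller scope stops token reuse elsewhere, the amount cap limits loss per transaction, and the expiry limits the window of exposure.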

Visa’s Trusted Agent Protocol (developed with Cloudflare) integrates Web Bot Auth for cryptographic agent verification within payment networks. Mastercard’s Agent Pay provides an agentic token framework for trusted AI transactions. Both partner with Cloudflare for the underlying identity verification.

Security: The Hard Problems

Web agents face a unique threat model. They consume untrusted content (the open web) and take consequential actions (purchases, form submissions, data entry). Every website an agent visits is a potential attack surface.

Prompt Injection on the Open Web

OWASP 2025 ranks prompt injection as the #1 critical vulnerability for LLM applications, affecting 73% of production deployments. For web agents, the problem is acute: any website can embed hidden instructions in its HTML that hijack agent behavior.

OpenAI stated in December 2025 that prompt injection “may never be fully solved.” The UK National Cyber Security Centre confirmed the same assessment. The WASP benchmark (2025) specifically evaluates web agent resilience against prompt injection attacks.

Attack Vectors Specific to Web Agents

Web Agent Security Threats

| Attack                        | Description                                                                    | Severity |
|-------------------------------|--------------------------------------------------------------------------------|----------|
| Hidden instruction injection  | Malicious instructions in invisible HTML elements that redirect agent behavior | Critical |
| Confused deputy               | Attackers trick a trusted agent into performing actions on their behalf        | Critical |
| Zero-interaction exfiltration | Data theft without any user interaction — the agent leaks data by visiting a crafted page | High |
| Fake storefront phishing      | AI browsers purchasing from fraudulent storefronts that look legitimate to the agent | High |
| Agent impersonation           | Malicious agents pretending to be legitimate ones to gain trust                | High     |
| Cascade failures              | Agent-to-agent interactions amplifying errors or attacks across systems        | Medium   |

Emerging Defenses

The security response centers on cryptographic identity:

  • Web Bot Auth (RFC 9421): Agents sign HTTP requests with published keys — websites verify the signature
  • A2A Signed Security Cards: Cryptographic agent identity baked into the A2A protocol (v0.3)
  • Agent Registry: Decentralized discovery of verified agent identities with Ed25519 keys
  • Scoped Payment Tokens: Time-limited, amount-bounded, seller-specific — limiting blast radius of compromised agents
  • AGNTCY SLIM: Quantum-safe messaging for agent-to-agent communication

Defense in Depth for Web Agents

No single mechanism solves web agent security. The emerging best practice stacks: cryptographic identity verification (Web Bot Auth), scoped permissions (SPTs), human-in-the-loop for high-stakes actions, and sandboxed execution environments (cloud browsers). Assume prompt injection will happen — design systems that limit its impact.
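The stacking can be sketched as a policy gate that every consequential action must pass. Everything here is illustrative — the function, thresholds, and action names are invented — but the ordering mirrors the layers above: identity first, scope second, human escalation for high stakes.

```python
# Sketch of defense in depth for agent actions: identity check, then
# scoped permissions, then human-in-the-loop above a stakes threshold.
# All names and thresholds are illustrative.
def gate_action(action: dict, verified_identity: bool,
                allowed_actions: set[str], human_approve=None) -> str:
    if not verified_identity:
        return "deny"                    # layer 1: cryptographic identity
    if action["kind"] not in allowed_actions:
        return "deny"                    # layer 2: scoped permissions
    if action.get("amount_cents", 0) > 10_000:
        # layer 3: high-stakes actions need explicit human approval
        approved = human_approve(action) if human_approve else False
        return "allow" if approved else "escalate"
    return "allow"

print(gate_action({"kind": "purchase", "amount_cents": 4_500},
                  True, {"purchase"}))                              # allow
print(gate_action({"kind": "purchase", "amount_cents": 50_000},
                  True, {"purchase"}))                              # escalate
print(gate_action({"kind": "delete_account"}, True, {"purchase"}))  # deny
```

The point of the ordering is containment: even if prompt injection steers the agent's intent, the injected action still has to clear permission scoping and, for anything expensive, a human.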

Open Challenges

The agentic web is early. Current systems work well enough for demos and specific use cases, but fundamental problems remain unsolved.

Reliability. The best web agents score 87–94% on benchmarks. That means 6–13% of tasks fail. For booking a flight or making a purchase, even 1% failure isn’t acceptable. Agents are, as one researcher put it, “slow and prone to mistakes.”

Protocol fragmentation. MCP, A2A, ACP, ANP — four protocols from four organizations, all under active development, none universally adopted. The web needs convergence, not more options.

Bot detection paradox. CAPTCHAs, Cloudflare challenges, and rate limiters were built to block bots. Legitimate AI agents are technically bots. The infrastructure designed to protect the web now blocks the agents trying to use it productively.

Accountability gaps. When an agent makes a bad purchase, sends the wrong message, or leaks data — who is responsible? The user who delegated? The developer who built the agent? The platform that hosted it? No legal framework addresses this clearly.

Cost. Vision-model-based agents like Magnitude (using Claude Sonnet 4) are expensive per task. Cloud browser sessions add infrastructure cost. At scale, the economics of agents browsing websites are materially different from API-to-API communication.

Content rights. Publishers are already blocking AI crawlers — 54.2% of news sites block at least one AI bot. As agents consume more web content, the tension between utility and intellectual property will intensify.

The Path Forward

The agentic web will not replace the human web. IEEE Spectrum envisions them “connected and blended together” — humans delegating to agents, agents collaborating with other agents, with human oversight when stakes are high. The architecture must support all of these modes simultaneously.

What’s needed:

  1. Protocol convergence — the industry must settle on fewer, interoperable standards rather than proliferating competing options
  2. Agent identity as infrastructure — Web Bot Auth and Agent Cards need the same ubiquity as SSL certificates
  3. Graduated trust — agents should earn permissions over time, not get blanket access from day one
  4. Machine-readable web — llms.txt, agents.json, and Agent Cards should become as standard as robots.txt
  5. Security-first design — assume every webpage is adversarial, scope every permission, and keep humans in the loop for consequential actions

The pieces are falling into place — cloud browsers, computer use APIs, interop protocols, payment infrastructure, identity standards. The question is no longer whether agents will operate on the web but how quickly the web adapts to serve them.

Tags: web-agents, browser-automation, protocols, agentic-commerce