
Every major AI platform can now browse websites autonomously. Chrome’s auto browse scrolls and clicks. ChatGPT Atlas fills forms and completes purchases. Perplexity Comet researches across tabs. But none of these agents sees your website the way a human does.

This is Part 4 in a five-part series on optimizing websites for the agentic web. Part 1 covered the evolution from SEO to AAIO. Part 2 explained how to get your content cited in AI responses. Part 3 mapped the protocols forming the infrastructure layer. This article gets technical: how AI agents actually perceive your website, and what to build for them.

The core insight is one that keeps coming up in my research: The most impactful thing you can do for AI agent compatibility is the same work web accessibility advocates have been pushing for decades. The accessibility tree, originally built for screen readers, is becoming the primary interface between AI agents and your website.

According to the 2025 Imperva Bad Bot Report (Imperva is a cybersecurity company), automated traffic surpassed human traffic for the first time in 2024, constituting 51% of all web interactions. Not all of that is agentic browsing, but the direction is clear: the non-human audience for your website is already larger than the human one, and it’s growing. Throughout this article, we draw exclusively from official documentation, peer-reviewed research, and announcements from the companies building this infrastructure.

Three Ways Agents See Your Website

When a human visits your website, they see colors, layout, images, and typography. When an AI agent visits, it sees something entirely different. Understanding what agents actually perceive is the foundation for building websites that work for them.

The major AI platforms use three distinct approaches, and the differences have direct implications for how you should structure your website.

Vision: Reading Screenshots

Anthropic’s Computer Use takes the most literal approach. Claude captures screenshots of the browser, analyzes the visual content, and decides what to click or type based on what it “sees.” It’s a continuous feedback loop: screenshot, reason, act, screenshot. The agent operates at the pixel level, identifying buttons by their visual appearance and reading text from the rendered image.

Google’s Project Mariner follows a similar pattern with what Google describes as an “observe-plan-act” loop: observe captures visual elements and underlying code structures, plan formulates action sequences, and act simulates user interactions. Mariner achieved an 83.5% success rate on the WebVoyager benchmark.

The vision approach works, but it’s computationally expensive, sensitive to layout changes, and limited by what’s visually rendered on screen.

Accessibility Tree: Reading Structure

OpenAI took a different path with ChatGPT Atlas. Their Publishers and Developers FAQ is explicit:

ChatGPT Atlas uses ARIA tags, the same labels and roles that support screen readers, to interpret page structure and interactive elements.

Atlas is built on Chromium, but rather than analyzing rendered pixels, it queries the accessibility tree for elements with specific roles (“button”, “link”) and accessible names. This is the same data structure that screen readers like VoiceOver and NVDA use to help people with visual disabilities navigate the web.

Microsoft’s Playwright MCP, the official MCP server for browser automation, takes the same approach. It provides accessibility snapshots rather than screenshots, giving AI models a structured representation of the page. Microsoft deliberately chose accessibility data over visual rendering for their browser automation standard.

Hybrid: Both At Once

In practice, the most capable agents combine approaches. OpenAI’s Computer-Using Agent (CUA), which powers both Operator and Atlas, layers screenshot analysis with DOM processing and accessibility tree parsing. It prioritizes ARIA labels and roles, falling back to text content and structural selectors when accessibility data isn’t available.

Perplexity’s research confirms the same pattern. Their BrowseSafe paper, which details the safety infrastructure behind Comet’s browser agent, describes using “hybrid context management combining accessibility tree snapshots with selective vision.”

| Platform | Primary Approach | Details |
| --- | --- | --- |
| Anthropic Computer Use | Vision (screenshots) | Screenshot, reason, act feedback loop |
| Google Project Mariner | Vision + code structure | Observe-plan-act with visual and structural data |
| OpenAI Atlas | Accessibility tree | Explicitly uses ARIA tags and roles |
| OpenAI CUA | Hybrid | Screenshots + DOM + accessibility tree |
| Microsoft Playwright MCP | Accessibility tree | Accessibility snapshots, no screenshots |
| Perplexity Comet | Hybrid | Accessibility tree + selective vision |

The pattern is clear. Even platforms that started with vision-first approaches are incorporating accessibility data. And the platforms optimizing for reliability and efficiency (Atlas, Playwright MCP) lead with the accessibility tree.

Your website’s accessibility tree isn’t a compliance artifact. It’s increasingly the primary interface agents use to understand and interact with your website.

Last year, before the European Accessibility Act took effect, I half-joked that it would be ironic if the thing that finally got people to care about accessibility was AI agents, not the people accessibility was designed for. That’s no longer a joke.

The Accessibility Tree Is Your Agent Interface

The accessibility tree is a simplified representation of your page’s DOM that browsers generate for assistive technologies. Where the full DOM contains every div, span, style, and script, the accessibility tree strips away the noise and exposes only what matters: interactive elements, their roles, their names, and their states.

This is why it works so well for agents. A typical page’s DOM might contain thousands of nodes. The accessibility tree reduces that to the elements a user (or agent) can actually interact with: buttons, links, form fields, headings, landmarks. For AI models that process web pages within a limited context window, that reduction is significant.
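As an illustration, consider a hypothetical booking-page fragment (the markup and names here are invented for this example, not taken from any platform's documentation). The DOM keeps every wrapper div; the accessibility tree keeps only roles, names, and states:

```html
<!-- DOM: full of presentational wrappers -->
<div class="wrapper">
  <div class="inner">
    <nav aria-label="Main">
      <a href="/flights">Flights</a>
    </nav>
    <main>
      <h1>Book a flight</h1>
      <button type="submit">Search flights</button>
    </main>
  </div>
</div>

<!-- Roughly what the accessibility tree exposes (a DevTools-style sketch):
     navigation "Main"
       link "Flights"
     main
       heading "Book a flight" (level 1)
       button "Search flights"
     The wrapper divs become "generic" nodes or are pruned entirely. -->
```

An agent working from the second representation gets every actionable element, already labeled, in a handful of nodes.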

OpenAI’s Publishers and Developers FAQ is very clear about this:

Follow WAI-ARIA best practices by adding descriptive roles, labels, and states to interactive elements like buttons, menus, and forms. This helps ChatGPT recognize what each element does and interact with your site more accurately.

And:

Making your website more accessible helps ChatGPT Agent in Atlas understand it better.

Research data backs this up. The most rigorous data on this comes from a UC Berkeley and University of Michigan study published for CHI 2026, the premier academic conference on human-computer interaction. The researchers tested Claude Sonnet 4.5 on 60 real-world web tasks under different accessibility conditions, collecting 40.4 hours of interaction data across 158,325 events. The results were striking:

| Condition | Task Success Rate | Avg. Completion Time |
| --- | --- | --- |
| Standard (default) | 78.33% | 324.87 seconds |
| Keyboard-only | 41.67% | 650.91 seconds |
| Magnified viewport | 28.33% | 1,072.20 seconds |

Under standard conditions, the agent succeeded nearly 80% of the time. Restrict it to keyboard-only interaction (simulating how screen reader users navigate) and success drops to 42%, taking twice as long. Restrict the viewport (simulating magnification tools), and success drops to 28%, taking over three times as long.

The paper identifies three categories of gaps:

  • Perception gaps: agents can’t reliably access screen reader announcements or ARIA state changes that would tell them what happened after an action.
  • Cognitive gaps: agents struggle to track task state across multiple steps.
  • Action gaps: agents underutilize keyboard shortcuts and fail at interactions like drag-and-drop.

The implication is direct. Websites that present a rich, well-labeled accessibility tree give agents the information they need to succeed. Websites that rely on visual cues, hover states, or complex JavaScript interactions without accessible alternatives create the conditions for agent failure.

Perplexity’s search API architecture paper from September 2025 reinforces this from the content side. Their indexing system prioritizes content that is “high quality in both substance and form, with information captured in a manner that preserves the original content structure and layout.” Websites “heavy on well-structured data in list or table form” benefit from “more formulaic parsing and extraction rules.” Structure isn’t just helpful. It’s what makes reliable parsing possible.

Semantic HTML: The Agent Foundation

The accessibility tree is built from your HTML. Use semantic elements, and the browser generates a useful accessibility tree automatically. Skip them, and the tree is sparse or misleading.

This isn’t new advice. Web standards advocates have been screaming “use semantic HTML” for two decades. Not everyone listened. What’s new is that the audience has expanded. It used to be about screen readers and a relatively small percentage of users. Now it’s about every AI agent that visits your website.

Use native elements. A native button labeled “Search flights” automatically exposes its role and accessible name to the accessibility tree. A styled div with a click handler exposes neither, so an agent reading the tree may not even recognize it as interactive.
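A minimal sketch of the contrast (the handler name is illustrative):

```html
<!-- Native control: role "button" and accessible name "Search flights"
     appear in the accessibility tree automatically, with keyboard
     support for free. -->
<button type="submit">Search flights</button>

<!-- Anti-pattern: no role, no accessible name, no keyboard support.
     To an agent reading the accessibility tree, this is a generic node. -->
<div class="btn" onclick="search()">Search flights</div>
```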

Label your forms. Every input needs a programmatically associated label, via a for/id pairing or by wrapping the input in the label element. Agents read labels to understand what data a field expects.
The autocomplete attribute deserves attention. It tells agents (and browsers) exactly what type of data a field expects, using standardized values like name, email, tel, street-address, and organization. When an agent fills a form on someone’s behalf, autocomplete attributes make the difference between confident field mapping and guessing.
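Putting the label and autocomplete guidance together, a minimal sketch (the field names and endpoint are illustrative):

```html
<form action="/checkout" method="post">
  <!-- Explicit label association via for/id -->
  <label for="name">Full name</label>
  <input id="name" name="name" type="text" autocomplete="name">

  <label for="email">Email address</label>
  <input id="email" name="email" type="email" autocomplete="email">

  <label for="tel">Phone</label>
  <input id="tel" name="tel" type="tel" autocomplete="tel">

  <button type="submit">Continue</button>
</form>
```

Each field now tells an agent three things: what it is called (the label), what input type it accepts (the type attribute), and what standardized data it maps to (the autocomplete token).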

Establish heading hierarchy. Use h1 through h6 in logical order. Agents use headings to understand page structure and locate specific content sections. Skipping levels (jumping from h1 to h4) creates confusion about content relationships.
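A sketch of a logical outline for a hypothetical booking page:

```html
<h1>Book a flight</h1>
  <h2>Search</h2>
  <h2>Results</h2>
    <h3>Filter by airline</h3>  <!-- h3 under h2: no skipped level -->
  <h2>Frequently asked questions</h2>
```

The indentation is only for readability here; what matters to the agent is that each heading level nests one step below its parent.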

Use landmark regions. HTML5 landmark elements (header, nav, main, aside, footer) map to landmark roles in the accessibility tree, letting agents understand the overall page layout and jump directly to the region they need.
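A sketch of a page skeleton using these landmarks (placeholder content elided):

```html
<header>…site banner…</header>
<nav aria-label="Main">…primary navigation…</nav>
<main>
  <h1>Book a flight</h1>
  …primary content…
</main>
<aside aria-label="Related deals">…complementary content…</aside>
<footer>…site information…</footer>
```

The aria-label on nav and aside distinguishes multiple landmarks of the same type, so an agent can tell the main navigation apart from, say, a footer navigation.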


Slobodan Manic

Host of the No Hacks Podcast and machine-first web optimization consultant at No Hacks

Slobodan “Sani” Manić is a website optimisation consultant with over 15 years of experience helping businesses make their sites faster, ...

