New LP Domain Commands and Native MCP

Adrià Arrufat

Software Engineer

TL;DR

When we released LP.getMarkdown, we introduced the LP domain as Lightpanda’s home for CDP commands built for machines, not debugging. We’ve now added three more commands: LP.getSemanticTree, LP.getInteractiveElements, and LP.getStructuredData. We’ve also shipped a native Model Context Protocol (MCP) server built directly into the Lightpanda binary, which exposes the same capabilities (markdown, semantic tree, JavaScript evaluation) without requiring CDP or automation libraries.

Expanding the LP Domain

When we released native markdown output, we introduced the LP domain as a home for Lightpanda-specific CDP commands that go beyond what the standard Chrome DevTools Protocol offers. LP.getMarkdown was the first command. We said more were coming.

That solved the content reading problem. But agents don’t only read; they also need to act on pages, understand what’s interactive, and extract structured metadata. Each of these tasks required agents to do complex work outside the browser: injecting JavaScript, parsing DOM trees, running heuristics.

The three new commands push that work into the engine.

LP.getSemanticTree

The problem with feeding page structure to LLMs is well known. The typical approach is to grab the Accessibility Tree from Chrome via CDP. In practice, this means calling Accessibility.getFullAXTree and DOM.getDocument separately, then cross-referencing both trees in your agent framework to map ARIA roles to actual elements. You end up writing heuristics to filter invisible elements, running CPU-heavy scripts to determine what’s clickable, and dealing with sync issues when the page changes mid-extraction.

Agent frameworks like Stagehand and Browser Use each do this work in their own way, but they’re all solving the same problem outside the browser.

Because we control the entire stack, we pushed this into the engine. LP.getSemanticTree traverses the live DOM in a single pass and returns a pruned, structured representation. Like LP.getMarkdown, it operates on the DOM after JavaScript has executed, so you get the actual rendered state of the page.

Here’s what happens in that single pass:

  • Extracts tag names, XPaths, ARIA roles, and computed accessible names
  • Checks the internal EventManager for bound click, mousedown, or change listeners to determine interactivity (no guessing based on tag names)
  • Streams output directly to the WebSocket to avoid allocating large intermediate buffers

Compound Component Unrolling

One persistent pain point for agents is compound components. A <select> dropdown might have 50 options, but those options are hidden in the DOM until a user clicks. Agents typically fail here because the visible representation doesn’t contain the choices.

Lightpanda natively “unrolls” compound components. For a <select>, the semantic tree output includes the full set of options attached directly to the node:

    {
      "nodeId": "14",
      "nodeName": "select",
      "role": "combobox",
      "name": "Choose a car",
      "options": [
        { "value": "volvo", "text": "Volvo", "selected": false },
        { "value": "audi", "text": "Audi", "selected": true }
      ]
    }

No extra CDP calls and no JavaScript injection are needed to enumerate options, because the browser already knows what’s there.
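
As a rough illustration of how an agent might consume the unrolled options, the sketch below reads the currently selected value straight from a node shaped like the example above. The helper name selectedOption is hypothetical, and the node shape is assumed from that single example rather than from a published schema.

```javascript
// Hypothetical helper: returns the selected option of an unrolled <select> node,
// or null if the node has no options array or none is marked as selected.
function selectedOption(node) {
  if (!Array.isArray(node.options)) return null;
  return node.options.find((opt) => opt.selected) ?? null;
}

// Node copied from the example output above.
const carSelect = {
  nodeId: "14",
  nodeName: "select",
  role: "combobox",
  name: "Choose a car",
  options: [
    { value: "volvo", text: "Volvo", selected: false },
    { value: "audi", text: "Audi", selected: true },
  ],
};

console.log(selectedOption(carSelect)?.value); // prints "audi"
```

Because the choices travel with the node, the agent can decide which option to pick without a second round trip to the browser.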

Text Format for Token Efficiency

For agents that need minimal overhead, LP.getSemanticTree supports a compressed text format. Pass format: "text" and you get output like this:

    [4] heading: Visible Header
    [6] button: Native Button
    [11] combobox: Choose a car (value: audi) options: ['volvo', 'audi' (selected)]

Each line is a node ID, its role, its accessible name, and any relevant state. This is what gets sent to the LLM.
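
If your agent needs to map a node ID chosen by the LLM back to a role, a small parser is enough. The sketch below assumes the line grammar "[id] role: text" inferred from the sample output; the function name parseSemanticLine is hypothetical, and the accessible name and any trailing state are deliberately kept together in one text field because the exact state syntax is not specified here.

```javascript
// Splits one text-format line into node ID, role, and the remaining text
// (accessible name plus any state). Returns null for lines that do not match.
function parseSemanticLine(line) {
  const match = /^\[(\d+)\]\s+([^:]+):\s*(.*)$/.exec(line.trim());
  if (!match) return null;
  const [, id, role, text] = match;
  return { id: Number(id), role: role.trim(), text };
}

// Sample lines taken from the text-format example.
const sample = [
  "[4] heading: Visible Header",
  "[6] button: Native Button",
  "[11] combobox: Choose a car (value: audi) options: ['volvo', 'audi' (selected)]",
].join("\n");

// Parse every line and drop anything that did not match the assumed grammar.
const nodes = sample.split("\n").map(parseSemanticLine).filter(Boolean);
console.log(nodes.map((n) => `${n.id}:${n.role}`));
```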

LP.getInteractiveElements

LP.getSemanticTree gives agents the full pruned structure of a page. LP.getInteractiveElements answers a narrower question: what can I click, type into, or interact with?

AI agents today often determine this by taking screenshots, overlaying numbered markers, and sending them to a vision LLM. That’s slow, expensive, and error-prone. Lightpanda can answer this natively because it already tracks all event listeners internally.

LP.getInteractiveElements returns every actionable element on the page in a single call. It classifies each element into one of five interactivity types:

| Type | Criteria |
| --- | --- |
| native | button, a[href], input (except hidden), select, textarea, details, summary |
| aria | Elements with an interactive ARIA role (button, link, tab, menuitem, checkbox, radio, slider, combobox, switch, etc.) |
| contenteditable | Elements with contenteditable="true" |
| listener | Elements with addEventListener or inline handler registrations (onclick, etc.) |
| focusable | Elements with explicit tabindex >= 0 that aren’t otherwise interactive |

Here’s what a result looks like:

    {
      "elements": [
        {
          "tagName": "button",
          "role": "button",
          "name": "Submit",
          "type": "native",
          "listeners": ["click", "mousedown"],
          "tabIndex": 0,
          "id": "submit-btn",
          "class": "btn primary"
        },
        {
          "tagName": "div",
          "role": "button",
          "name": "Custom action",
          "type": "listener",
          "listeners": ["click"],
          "tabIndex": -1
        }
      ],
      "nodeIds": [42, 43]
    }

The key detail: listener detection is O(1) per element. Lightpanda pre-builds a target-to-event-types map from its internal EventManager in a single pass. Classification and type collection are then simple map lookups. Chrome’s DOMDebugger.getEventListeners is debug-only and requires per-element calls. LP.getInteractiveElements does it in one shot across the entire DOM.
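
To make the lookup pattern concrete, here is a minimal JavaScript sketch of the general technique: build a target-to-event-types map once, then answer each per-element query with a single map lookup. It is not Lightpanda’s Zig implementation; the registration records and the names buildListenerIndex and listenerTypes are hypothetical stand-ins for whatever the internal EventManager stores.

```javascript
// One pass over all registrations builds a Map from target to a Set of event types.
function buildListenerIndex(registrations) {
  const index = new Map();
  for (const { target, type } of registrations) {
    let types = index.get(target);
    if (!types) {
      types = new Set();
      index.set(target, types);
    }
    types.add(type);
  }
  return index;
}

// Per-element query: a single Map lookup, independent of how many
// listeners exist elsewhere in the document.
function listenerTypes(index, element) {
  return [...(index.get(element) ?? [])];
}

// Plain objects stand in for DOM nodes in this sketch.
const button = { tagName: "button" };
const plainDiv = { tagName: "div" };
const index = buildListenerIndex([
  { target: button, type: "click" },
  { target: button, type: "mousedown" },
  { target: button, type: "click" }, // a duplicate type collapses in the Set
]);

console.log(listenerTypes(index, button)); // the button's two distinct event types
console.log(listenerTypes(index, plainDiv)); // empty: no listeners registered
```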

The nodeIds array maps 1:1 to elements and all nodes are registered in the CDP node registry. Your agent can immediately use them in follow-up calls like DOM.focus or Input.dispatchMouseEvent.
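
Since the two arrays are index-aligned, pairing them is a one-liner. The sketch below assumes the result shape shown in the example; the helper name actionableByType is hypothetical.

```javascript
// Attaches each element's node ID (same index in nodeIds) and keeps only
// the elements of the requested interactivity type.
function actionableByType(result, type) {
  return result.elements
    .map((element, i) => ({ ...element, nodeId: result.nodeIds[i] }))
    .filter((element) => element.type === type);
}

// Trimmed copy of the example result.
const result = {
  elements: [
    { tagName: "button", role: "button", name: "Submit", type: "native" },
    { tagName: "div", role: "button", name: "Custom action", type: "listener" },
  ],
  nodeIds: [42, 43],
};

const custom = actionableByType(result, "listener");
console.log(custom[0].name, custom[0].nodeId); // prints "Custom action 43"
```

The resulting nodeId is what you would pass to a follow-up call such as DOM.focus.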

LP.getStructuredData

The fourth LP command extracts all machine-readable structured data from a page in a single call.

Modern websites embed structured metadata that’s valuable for agents: product information, article details, event data, reviews, FAQs, breadcrumbs. This data is already in the page, but extracting it traditionally means injecting JavaScript to parse <script type="application/ld+json"> tags, read <meta> properties, and resolve relative URLs.

LP.getStructuredData does this natively with a single-pass TreeWalker over the DOM. Here’s what it extracts:

| Format | Source | Adoption |
| --- | --- | --- |
| JSON-LD | <script type="application/ld+json"> | 41% of pages (Web Almanac 2024) |
| Open Graph | <meta property="og:*"> | 64% |
| Twitter/X Cards | <meta name="twitter:*"> | 45% |
| HTML meta | <title>, <meta name="...">, charset | ~100% |
| Link elements | <link rel="canonical,icon,manifest,alternate"> | ~100% |

    {
      "jsonLd": [
        "{\"@context\":\"http://schema.org\",\"@type\":\"CollectionPage\",...}"
      ],
      "openGraph": {
        "title": "BBC - Home",
        "type": "website",
        "url": "https://www.bbc.co.uk/",
        "image": "https://static.files.bbci.co.uk/.../poster-1024x576.png",
        "description": "The best of the BBC...",
        "site_name": "BBC"
      },
      "twitterCard": {
        "card": "summary_large_image",
        "site": "@BBC"
      },
      "meta": {
        "language": "en-GB",
        "charset": "utf-8",
        "title": "BBC - Home",
        "description": "The best of the BBC..."
      },
      "links": {
        "canonical": "https://www.bbc.co.uk/",
        "icon": "https://static.files.bbci.co.uk/.../favicon-32.png"
      }
    }

JSON-LD is particularly valuable for agents. It gives them structured Schema.org data (products, articles, events, reviews) without any parsing or heuristics. Google explicitly recommends JSON-LD, and it appears on 41% of pages with adoption growing year over year.
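
In the example result, each jsonLd entry is a raw JSON string, so the agent still has to parse it before use. The sketch below shows one way to collect the Schema.org types on a page, assuming that string-array shape; the helper name jsonLdTypes and the sample documents are made up for illustration, and nested structures such as @graph are not handled.

```javascript
// Collects every top-level "@type" from the jsonLd strings of a result.
function jsonLdTypes(structured) {
  const types = [];
  for (const raw of structured.jsonLd ?? []) {
    let doc;
    try {
      doc = JSON.parse(raw);
    } catch {
      continue; // skip blocks that are not valid JSON
    }
    // A JSON-LD block may hold one object or an array of objects.
    for (const item of Array.isArray(doc) ? doc : [doc]) {
      // "@type" may itself be a string or an array of strings.
      if (item && item["@type"]) types.push(...[].concat(item["@type"]));
    }
  }
  return types;
}

// Made-up sample data, not output from a real page.
const structuredSample = {
  jsonLd: [
    '{"@context":"https://schema.org","@type":"Product","name":"Example"}',
    '[{"@type":"Article"},{"@type":["FAQPage","WebPage"]}]',
    "{not valid json",
  ],
};

console.log(jsonLdTypes(structuredSample));
```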

Using the LP Commands

All four LP commands follow the same pattern. You open a CDP session and call them directly.

With Puppeteer:

    import puppeteer from 'puppeteer-core';

    // Connect to a running Lightpanda instance over CDP
    const browser = await puppeteer.connect({
      browserWSEndpoint: 'ws://127.0.0.1:9222',
    });
    const context = await browser.createBrowserContext();
    const page = await context.newPage();
    await page.goto('https://example.com');

    // Note: _client() is a private Puppeteer API and may change between versions
    const client = page._client();

    // Content: what the page says
    const markdown = await client.send('LP.getMarkdown', {});
    console.log(markdown.markdown);

    // Structure: the pruned DOM for navigation
    const tree = await client.send('LP.getSemanticTree', { format: 'text' });
    console.log(tree.semanticTree);

    // Actions: what can the agent interact with
    const interactive = await client.send('LP.getInteractiveElements', {});
    console.log(interactive.elements);

    // Metadata: what the page is
    const structured = await client.send('LP.getStructuredData', {});
    console.log(structured.jsonLd);

    await page.close();
    await context.close();
    await browser.disconnect();

With Playwright:

    import { chromium } from 'playwright-core';

    const browser = await chromium.connectOverCDP('ws://127.0.0.1:9222');
    const context = await browser.newContext();
    const page = await context.newPage();
    await page.goto('https://example.com');

    const client = await page.context().newCDPSession(page);

    const markdown = await client.send('LP.getMarkdown');
    const tree = await client.send('LP.getSemanticTree', { format: 'json', prune: true });
    const interactive = await client.send('LP.getInteractiveElements');
    const structured = await client.send('LP.getStructuredData');

    await page.close();
    await context.close();
    await browser.close();
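
Because both libraries expose the same send(method, params) call on the CDP session, the four commands can be wrapped in one library-agnostic helper. This is a sketch under assumptions: snapshotPage is a hypothetical name, the result fields (markdown, semanticTree, elements, jsonLd) are taken from the Puppeteer example, and a stub client stands in for a real session so the snippet runs without a browser.

```javascript
// Gathers all four LP views of the current page through any object
// that exposes an async send(method, params) function.
async function snapshotPage(client) {
  const markdown = await client.send("LP.getMarkdown", {});
  const tree = await client.send("LP.getSemanticTree", { format: "text" });
  const interactive = await client.send("LP.getInteractiveElements", {});
  const structured = await client.send("LP.getStructuredData", {});
  return {
    markdown: markdown.markdown,
    tree: tree.semanticTree,
    elements: interactive.elements,
    jsonLd: structured.jsonLd,
  };
}

// Stub client with canned responses; replace it with a real CDP session.
const stubClient = {
  async send(method) {
    const canned = {
      "LP.getMarkdown": { markdown: "# Example" },
      "LP.getSemanticTree": { semanticTree: "[1] heading: Example" },
      "LP.getInteractiveElements": { elements: [], nodeIds: [] },
      "LP.getStructuredData": { jsonLd: [] },
    };
    return canned[method];
  },
};

snapshotPage(stubClient).then((snapshot) => console.log(snapshot.markdown));
```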

Four Commands, One Page

Together, these four commands give agents a complete view of any web page:

| Format | Best For | Token Cost | Key Advantage |
| --- | --- | --- | --- |
| Raw HTML | Data extraction. When you need exact attributes, classes, or nested data structures for a parser. | 🔴 High | Most complete data; exactly what the browser sees. |
| LP.getMarkdown | Content analysis. When the agent needs to read articles, product descriptions, or documentation. | 🟡 Medium | Strips layout noise while preserving text hierarchy and links. |
| LP.getSemanticTree | Web navigation & action. When the agent needs to click buttons, fill forms, or select dropdowns. | 🟢 Low | Focuses on interactivity. Includes XPaths and unrolled <select> options. |
| LP.getInteractiveElements | Taking action. When the agent needs a flat list of everything it can click, type into, or select. | 🟢 Low | Every actionable element with listener types and node IDs for follow-up calls. |
| LP.getStructuredData | Understanding context. When the agent needs product info, article metadata, JSON-LD, or Open Graph. | 🟢 Low | Machine-readable metadata already embedded in the page, extracted in one call. |

The pattern across all four LP commands is the same: if the browser already has the data, the browser should do the transformation. There’s no JavaScript injection, no multi-call CDP sequences, and no external libraries.

Native MCP Server

The LP domain gives you access to these capabilities through CDP, but not every agent needs CDP. If your agent framework speaks MCP, you can skip the automation library entirely.

We already have gomcp, a Go-based MCP server that bridges MCP to Lightpanda over CDP. It works, and it’s a good option if you want a standalone server with SSE support.

But gomcp is still a bridge. The MCP client talks to gomcp, gomcp talks CDP to Lightpanda, and you’re back to multi-layer serialization. So we built an MCP server directly into the Lightpanda binary. Your agent connects over standard I/O. One process, no bridging.

Configuration

Point any MCP-compatible client at the Lightpanda binary:

    {
      "mcpServers": {
        "lightpanda": {
          "command": "/path/to/lightpanda",
          "args": ["mcp"]
        }
      }
    }

Now your agent can discover Lightpanda’s capabilities automatically through the MCP protocol.

Same Engine, Different Interface

The native MCP server exposes the same engine-level features as the LP domain, surfaced as MCP tools and resources rather than CDP commands.

The markdown tool calls the same conversion that powers LP.getMarkdown. The semantic_tree tool calls the same traversal behind LP.getSemanticTree. There is no translation layer or intermediate processing. The MCP server invokes these capabilities directly in the Zig engine.

Here’s the full set of tools:

  • goto: Navigate to a URL and load the page into memory
  • markdown: Get the page content as token-efficient markdown (same as LP.getMarkdown)
  • semantic_tree: Get the pruned, interactive DOM representation (same as LP.getSemanticTree)
  • links: Extract all <a href> links from the loaded page
  • interactiveElements: Collect buttons, inputs, and other interactive elements
  • structuredData: Extract JSON-LD, OpenGraph, and semantic metadata
  • evaluate: Run arbitrary JavaScript in the page context

Agents can also read page state as MCP resources: mcp://page/html for the raw DOM and mcp://page/markdown for cleaned markdown.
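
An MCP client library normally handles the wire format for you, but it can help to see what travels over stdio. MCP uses JSON-RPC 2.0, with tools invoked through tools/call and resources fetched through resources/read, one JSON message per line on the stdio transport. The sketch below only builds those messages; the helper names are hypothetical, the url argument name for goto is an assumption (check the schema returned by tools/list), and a real session must first complete the initialize handshake.

```javascript
let nextId = 1;

// Builds a JSON-RPC 2.0 request that invokes an MCP tool by name.
function toolCall(name, args = {}) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Builds a JSON-RPC 2.0 request that reads an MCP resource by URI.
function resourceRead(uri) {
  return { jsonrpc: "2.0", id: nextId++, method: "resources/read", params: { uri } };
}

// The stdio transport is newline-delimited: one JSON message per line.
function frame(message) {
  return JSON.stringify(message) + "\n";
}

// "url" is an assumed argument name for the goto tool.
const gotoRequest = toolCall("goto", { url: "https://example.com" });
const markdownRequest = toolCall("markdown");
const htmlRequest = resourceRead("mcp://page/html");

process.stdout.write(frame(gotoRequest) + frame(markdownRequest) + frame(htmlRequest));
```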

Where the LP Domain Is Going

The LP domain is where we’re building CDP commands for cases where automation is the primary goal. Standard CDP was designed for debugging; the LP domain is designed for machines, and the native MCP server makes those same capabilities available to agents that don’t use CDP at all.

We’re continuing to add commands that reduce the work agents have to do outside the browser.

Get Started

Try the quickstart guide to get Lightpanda running in under 10 minutes. Working examples for both Puppeteer and Playwright are in the demo repo.

FAQ

What is LP.getSemanticTree?

LP.getSemanticTree is the second command in Lightpanda’s custom LP CDP domain. It extracts a pruned, LLM-optimized representation of the page DOM, combining tag names, ARIA roles, computed names, XPaths, and interactivity detection in a single engine-level pass.

How is this different from LP.getMarkdown?

LP.getMarkdown converts the DOM to readable text, optimized for content consumption. LP.getSemanticTree produces a structured representation focused on interactivity, including element roles, XPaths, and unrolled compound components like <select> dropdowns. Use markdown when the agent needs to read. Use the semantic tree when the agent needs to act.

Can I access the semantic tree through MCP instead of CDP?

Yes. The native MCP server exposes LP.getSemanticTree as the semantic_tree tool. The underlying engine capability is the same. MCP is the simpler path if your agent doesn’t need the full CDP automation stack.

How does the native MCP server differ from gomcp?

gomcp is a separate Go binary that bridges MCP to Lightpanda over CDP. The native MCP server runs inside the browser process, calling engine capabilities directly without CDP as an intermediary. Both support stdio transport. gomcp additionally supports SSE.

Does LP.getSemanticTree work with existing agent frameworks?

Yes. The JSON output includes nodeId, backendDOMNodeId, and unique XPaths, making it compatible with frameworks that need element references for interaction. The text format is designed for direct inclusion in LLM prompts.

What MCP transport does Lightpanda support?

The native MCP server supports standard I/O (stdio) for local use. The cloud MCP service supports SSE transport for remote connections.

Is the LP domain compatible with standard CDP?

The LP domain is a Lightpanda-specific extension. It is not part of the Chrome DevTools Protocol specification. Standard CDP commands continue to work as expected. The LP domain adds new capabilities on top.


Adrià Arrufat

Software Engineer

Adrià is an AI engineer at Lightpanda, where he works on making the browser more useful for AI workflows. Before Lightpanda, Adrià built machine learning systems and contributed to open-source projects across computer vision and systems programming.