New LP Domain Commands and Native MCP

Adrià Arrufat
Software Engineer

TL;DR
When we released LP.getMarkdown, we
introduced the LP domain as Lightpanda’s home for CDP commands built for
machines, not debugging. We’ve now added three more commands: LP.getSemanticTree,
LP.getInteractiveElements, and LP.getStructuredData. We’ve also shipped a
native Model Context Protocol (MCP) server
built directly into the Lightpanda binary, which exposes the same capabilities
(markdown, semantic tree, JavaScript evaluation) without requiring CDP or
automation libraries.
Expanding the LP Domain
When we released native markdown
output, we introduced
the LP domain as a home for Lightpanda-specific CDP commands that go beyond
what the standard Chrome DevTools
Protocol offers.
LP.getMarkdown was the first command. We said more were coming.
LP.getMarkdown solved the content-reading problem. But agents don’t only read: they need to act on pages, understand what’s interactive, and extract structured metadata. Until now, each of these tasks required agents to do complex work outside the browser: injecting JavaScript, parsing DOM trees, running heuristics.
The three new commands push that work into the engine.
LP.getSemanticTree
The problem with feeding page structure to LLMs is well known. The typical
approach is to grab the Accessibility
Tree
from Chrome via CDP. In practice, this means calling
Accessibility.getFullAXTree and DOM.getDocument separately, then
cross-referencing both trees in your agent framework to map ARIA roles to
actual elements. You end up writing heuristics to filter invisible elements,
running CPU-heavy scripts to determine what’s clickable, and dealing with sync
issues when the page changes mid-extraction.
Agent frameworks like Stagehand and Browser Use all do this work in their own way, but they’re all solving the same problem outside the browser.
Because we control the entire stack, we pushed this into the engine.
LP.getSemanticTree traverses the live DOM in a single pass and returns a
pruned, structured representation. Like LP.getMarkdown, it operates on the
DOM after JavaScript has executed, so you get the actual rendered state of the
page.
Here’s what happens in that single pass:
- Extracts tag names, XPaths, ARIA roles, and computed accessible names
- Checks the internal EventManager for bound click, mousedown, or change listeners to determine interactivity (no guessing based on tag names)
- Streams output directly to the WebSocket to avoid allocating large intermediate buffers
Compound Component Unrolling
One persistent pain point for agents is compound components. A <select> dropdown might have 50 options, but those options are hidden in the DOM until a user clicks. Agents typically fail here because the visible representation doesn’t contain the choices.
Lightpanda natively “unrolls” compound components. For a <select>, the semantic tree output includes the full set of options attached directly to the node:
{
  "nodeId": "14",
  "nodeName": "select",
  "role": "combobox",
  "name": "Choose a car",
  "options": [
    { "value": "volvo", "text": "Volvo", "selected": false },
    { "value": "audi", "text": "Audi", "selected": true }
  ]
}
No extra CDP calls and no JavaScript injection to enumerate options, because the browser already knows what’s there.
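As a rough illustration of what an agent can do with an unrolled node, the sketch below resolves a human-readable choice (for example, one picked by the LLM) to the option value to submit. The helper name `findOptionValue` is hypothetical, and the node shape is copied from the example above rather than from a formal schema.

```javascript
// Hypothetical helper: map a visible option label to its underlying value.
// Assumes the node shape shown in the example output above.
function findOptionValue(node, wantedText) {
  const options = Array.isArray(node.options) ? node.options : [];
  const wanted = wantedText.trim().toLowerCase();
  const match = options.find((o) => o.text.trim().toLowerCase() === wanted);
  return match ? match.value : null;
}

// Sample node, copied from the semantic tree example.
const selectNode = {
  nodeId: "14",
  nodeName: "select",
  role: "combobox",
  name: "Choose a car",
  options: [
    { value: "volvo", text: "Volvo", selected: false },
    { value: "audi", text: "Audi", selected: true },
  ],
};

// The currently selected value is available without touching the page.
const selected = selectNode.options.find((o) => o.selected);
const currentValue = selected ? selected.value : null;

console.log(findOptionValue(selectNode, "Volvo"));
console.log(currentValue);
```

Because the options travel with the node, both lookups run on data the agent already holds, with no extra round trip to the browser.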
Text Format for Token Efficiency
For agents that need minimal overhead, LP.getSemanticTree supports a compressed text format. Pass format: "text" and you get output like this:
[4] heading: Visible Header
[6] button: Native Button
[11] combobox: Choose a car (value: audi) options: ['volvo', 'audi' (selected)]
Each line is a node ID, its role, its accessible name, and any relevant state. This is what gets sent to the LLM.
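If the agent needs to map an LLM answer such as "click [6]" back to a node, the text lines can be parsed again on the client side. The sketch below is a minimal parser inferred only from the three sample lines above; the exact grammar of the text format is an assumption, and `parseSemanticTreeLine` is a hypothetical name.

```javascript
// Hypothetical parser for lines shaped like "[id] role: label".
// The grammar is inferred from the sample output, not from a specification.
function parseSemanticTreeLine(line) {
  const m = /^\[(\d+)\]\s+([^:]+):\s*(.*)$/.exec(line.trim());
  if (!m) return null; // skip lines that do not fit the assumed shape
  return { nodeId: Number(m[1]), role: m[2].trim(), label: m[3] };
}

// Sample text output, copied from the example above.
const sample = [
  "[4] heading: Visible Header",
  "[6] button: Native Button",
  "[11] combobox: Choose a car (value: audi) options: ['volvo', 'audi' (selected)]",
].join("\n");

const nodes = sample.split("\n").map(parseSemanticTreeLine).filter(Boolean);
const buttons = nodes.filter((n) => n.role === "button");

console.log(nodes.map((n) => n.nodeId));
console.log(buttons);
```

The label is kept as one opaque string here, since the state suffix (value, options) is only shown by example.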
LP.getInteractiveElements
LP.getSemanticTree gives agents the full pruned structure of a page. LP.getInteractiveElements answers a narrower question: what can I click, type into, or interact with?
AI agents today often determine this by taking screenshots, overlaying numbered markers, and sending them to a vision LLM. That’s slow, expensive, and error-prone. Lightpanda can answer this natively because it already tracks all event listeners internally.
LP.getInteractiveElements returns every actionable element on the page in a single call. It classifies each element into one of five interactivity types:
| Type | Criteria |
|---|---|
| native | button, a[href], input (except hidden), select, textarea, details, summary |
| aria | Elements with an interactive ARIA role (button, link, tab, menuitem, checkbox, radio, slider, combobox, switch, etc.) |
| contenteditable | Elements with contenteditable="true" |
| listener | Elements with addEventListener or inline handler registrations (onclick, etc.) |
| focusable | Elements with explicit tabindex >= 0 that aren’t otherwise interactive |
Here’s what a result looks like:
{
  "elements": [
    {
      "tagName": "button",
      "role": "button",
      "name": "Submit",
      "type": "native",
      "listeners": ["click", "mousedown"],
      "tabIndex": 0,
      "id": "submit-btn",
      "class": "btn primary"
    },
    {
      "tagName": "div",
      "role": "button",
      "name": "Custom action",
      "type": "listener",
      "listeners": ["click"],
      "tabIndex": -1
    }
  ],
  "nodeIds": [42, 43]
}
The key detail: listener detection is O(1) per element. Lightpanda pre-builds a
target-to-event-types map from its internal EventManager in a single pass.
Classification and type collection are then simple map lookups. Chrome’s
DOMDebugger.getEventListeners is debug-only and requires per-element calls.
LP.getInteractiveElements does it in one shot across the entire DOM.
The nodeIds array maps 1:1 to elements and all nodes are registered in the CDP node registry. Your agent can immediately use them in follow-up calls like DOM.focus or Input.dispatchMouseEvent.
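Since the two arrays are parallel, a small client-side step can join them before the agent picks a target. The following is a sketch under the assumption that the response has exactly the shape shown above; `pairWithNodeIds` and `clickTargets` are hypothetical helper names.

```javascript
// Hypothetical helper: attach each nodeId to its element by array position.
function pairWithNodeIds(result) {
  const { elements = [], nodeIds = [] } = result;
  if (elements.length !== nodeIds.length) {
    throw new Error("elements and nodeIds are expected to map 1:1");
  }
  return elements.map((element, i) => ({ nodeId: nodeIds[i], ...element }));
}

// Hypothetical helper: keep only elements that have a click listener bound.
function clickTargets(result) {
  return pairWithNodeIds(result).filter((e) =>
    (e.listeners || []).includes("click")
  );
}

// Sample response, copied from the example above (trimmed to relevant fields).
const sampleResult = {
  elements: [
    { tagName: "button", role: "button", name: "Submit", type: "native", listeners: ["click", "mousedown"], tabIndex: 0 },
    { tagName: "div", role: "button", name: "Custom action", type: "listener", listeners: ["click"], tabIndex: -1 },
  ],
  nodeIds: [42, 43],
};

const targets = clickTargets(sampleResult);
console.log(targets.map((t) => `${t.nodeId}: ${t.name}`));
```

Each resulting `nodeId` can then be passed to a follow-up call such as `DOM.focus`, which accepts a `nodeId` parameter in standard CDP.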
LP.getStructuredData
The fourth LP command extracts all machine-readable structured data from a page in a single call.
Modern websites embed structured metadata that’s valuable for agents: product
information, article details, event data, reviews, FAQs, breadcrumbs. This data
is already in the page, but extracting it traditionally means injecting
JavaScript to parse <script type="application/ld+json"> tags, read <meta>
properties, and resolve relative URLs.
LP.getStructuredData does this natively with a single-pass TreeWalker over the
DOM. Here’s what it extracts:
| Format | Source | Adoption |
|---|---|---|
| JSON-LD | <script type="application/ld+json"> | 41% of pages (Web Almanac 2024) |
| Open Graph | <meta property="og:*"> | 64% |
| Twitter/X Cards | <meta name="twitter:*"> | 45% |
| HTML meta | <title>, <meta name="...">, charset | ~100% |
| Link elements | <link rel="canonical,icon,manifest,alternate"> | ~100% |
{
  "jsonLd": ["{\"@context\":\"http://schema.org\",\"@type\":\"CollectionPage\",...}"],
  "openGraph": {
    "title": "BBC - Home",
    "type": "website",
    "url": "https://www.bbc.co.uk/",
    "image": "https://static.files.bbci.co.uk/.../poster-1024x576.png",
    "description": "The best of the BBC...",
    "site_name": "BBC"
  },
  "twitterCard": {
    "card": "summary_large_image",
    "site": "@BBC"
  },
  "meta": {
    "language": "en-GB",
    "charset": "utf-8",
    "title": "BBC - Home",
    "description": "The best of the BBC..."
  },
  "links": {
    "canonical": "https://www.bbc.co.uk/",
    "icon": "https://static.files.bbci.co.uk/.../favicon-32.png"
  }
}
JSON-LD is particularly valuable for agents. It gives them structured Schema.org data (products, articles, events, reviews) without any parsing or heuristics. Google explicitly recommends JSON-LD, and it appears on 41% of pages, with adoption growing year over year.
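In the example response, each `jsonLd` entry is a JSON string rather than a nested object, so a consumer would decode it before reading Schema.org fields. The sketch below assumes that shape holds in general; the helper names and the sample product data are hypothetical, made up for illustration.

```javascript
// Hypothetical helper: decode every jsonLd string, skipping malformed blocks.
function parseJsonLd(structured) {
  const blocks = [];
  for (const raw of structured.jsonLd || []) {
    try {
      const parsed = JSON.parse(raw);
      // A single script tag may hold one object or an array of objects.
      blocks.push(...(Array.isArray(parsed) ? parsed : [parsed]));
    } catch {
      // Pages sometimes ship invalid JSON-LD; ignore it rather than fail.
    }
  }
  return blocks;
}

// Hypothetical helper: list the Schema.org types found on the page.
function schemaTypes(blocks) {
  return blocks.flatMap((b) => [].concat(b["@type"] || []));
}

// Hypothetical sample data in the same shape as the response above.
const structuredSample = {
  jsonLd: [
    '{"@context":"https://schema.org","@type":"Product","name":"Example Widget"}',
    "{not valid json",
  ],
};

const blocks = parseJsonLd(structuredSample);
console.log(schemaTypes(blocks));
```

Skipping malformed entries is a design choice for this sketch: one broken script tag should not hide the valid metadata next to it.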
Using the LP Commands
All four LP commands follow the same pattern. You open a CDP session and call them directly.
With Puppeteer:
import puppeteer from 'puppeteer-core';

const browser = await puppeteer.connect({
  browserWSEndpoint: 'ws://127.0.0.1:9222'
});
const context = await browser.createBrowserContext();
const page = await context.newPage();
await page.goto('https://example.com');
const client = page._client();

// Content: what the page says
const markdown = await client.send('LP.getMarkdown', {});
console.log(markdown.markdown);

// Structure: the pruned DOM for navigation
const tree = await client.send('LP.getSemanticTree', { format: 'text' });
console.log(tree.semanticTree);

// Actions: what can the agent interact with
const interactive = await client.send('LP.getInteractiveElements', {});
console.log(interactive.elements);

// Metadata: what the page is
const structured = await client.send('LP.getStructuredData', {});
console.log(structured.jsonLd);

await page.close();
await context.close();
await browser.disconnect();
With Playwright:
import { chromium } from 'playwright-core';

const browser = await chromium.connectOverCDP('ws://127.0.0.1:9222');
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('https://example.com');
const client = await page.context().newCDPSession(page);

const markdown = await client.send('LP.getMarkdown');
const tree = await client.send('LP.getSemanticTree', { format: 'json', prune: true });
const interactive = await client.send('LP.getInteractiveElements');
const structured = await client.send('LP.getStructuredData');

await page.close();
await context.close();
await browser.close();
Four Commands, One Page
Together, these four commands give agents a complete view of any web page:
| Format | Best For | Token Cost | Key Advantage |
|---|---|---|---|
| Raw HTML | Data extraction. When you need exact attributes, classes, or nested data structures for a parser. | 🔴 High | Most complete data; exactly what the browser sees. |
| LP.getMarkdown | Content analysis. When the agent needs to read articles, product descriptions, or documentation. | 🟡 Medium | Strips layout noise while preserving text hierarchy and links. |
| LP.getSemanticTree | Web navigation & action. When the agent needs to click buttons, fill forms, or select dropdowns. | 🟢 Low | Focuses on interactivity. Includes XPaths and unrolled <select> options. |
| LP.getInteractiveElements | Taking action. When the agent needs a flat list of everything it can click, type into, or select. | 🟢 Low | Every actionable element with listener types and node IDs for follow-up calls. |
| LP.getStructuredData | Understanding context. When the agent needs product info, article metadata, JSON-LD, or Open Graph. | 🟢 Low | Machine-readable metadata already embedded in the page, extracted in one call. |
The pattern across all four LP commands is the same: if the browser already has the data, the browser should do the transformation. There’s no JavaScript injection, no multi-call CDP sequences, and no external libraries.
Native MCP Server
The LP domain gives you access to these capabilities through CDP, but not every agent needs CDP. If your agent framework speaks MCP, you can skip the automation library entirely.
We already have gomcp, a Go-based MCP server that bridges MCP to Lightpanda over CDP. It works, and it’s a good option if you want a standalone server with SSE support.
But gomcp is still a bridge. The MCP client talks to gomcp, gomcp talks CDP to Lightpanda, and you’re back to multi-layer serialization. So we built an MCP server directly into the Lightpanda binary. Your agent connects over standard I/O. One process, no bridging.
Configuration
Point any MCP-compatible client at the Lightpanda binary:
{
  "mcpServers": {
    "lightpanda": {
      "command": "/path/to/lightpanda",
      "args": ["mcp"]
    }
  }
}
Now your agent can discover Lightpanda’s capabilities automatically through the MCP protocol.
Same Engine, Different Interface
The native MCP server exposes the same engine-level features as the LP domain, surfaced as MCP tools and resources rather than CDP commands.
The markdown tool calls the same conversion that powers LP.getMarkdown. The semantic_tree tool calls the same traversal behind LP.getSemanticTree. There is no translation layer or intermediate processing. The MCP server invokes these capabilities directly in the Zig engine.
Here’s the full set of tools:
- goto: Navigate to a URL and load the page into memory
- markdown: Get the page content as token-efficient markdown (same as LP.getMarkdown)
- semantic_tree: Get the pruned, interactive DOM representation (same as LP.getSemanticTree)
- links: Extract all <a href> links from the loaded page
- interactiveElements: Collect buttons, inputs, and other interactive elements
- structuredData: Extract JSON-LD, OpenGraph, and semantic metadata
- evaluate: Run arbitrary JavaScript in the page context
Agents can also read page state as MCP resources: mcp://page/html for the raw DOM and mcp://page/markdown for cleaned markdown.
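To make the stdio transport more concrete, here is a rough sketch of the JSON-RPC 2.0 messages an MCP client would write to the binary’s standard input. The `tools/list`, `tools/call`, and `resources/read` method names come from the MCP specification; the `url` argument name for `goto` is an assumption that this post does not confirm, and a real session starts with an `initialize` handshake (omitted here) that MCP client libraries normally handle for you.

```javascript
// Build JSON-RPC 2.0 requests with incrementing ids.
function createRequestBuilder() {
  let nextId = 1;
  return (method, params) => ({ jsonrpc: "2.0", id: nextId++, method, params });
}

const request = createRequestBuilder();

const messages = [
  // Discover the available tools.
  request("tools/list", {}),
  // Load a page. The "url" argument name is an assumption.
  request("tools/call", { name: "goto", arguments: { url: "https://example.com" } }),
  // Ask for the page as markdown.
  request("tools/call", { name: "markdown", arguments: {} }),
  // Read the same content as an MCP resource.
  request("resources/read", { uri: "mcp://page/markdown" }),
];

// The MCP stdio transport sends one JSON message per line.
const wire = messages.map((m) => JSON.stringify(m)).join("\n");
console.log(wire);
```

In practice most agents would not hand-write these messages; the sketch only shows what travels over the pipe once a client is pointed at the binary.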
Where the LP Domain Is Going
The LP domain is where we’re building CDP commands that make sense when automation is the primary goal, because standard CDP was designed for debugging, not for machines. The LP domain is for machines, and the native MCP server ensures those same capabilities are available to agents that don’t use CDP at all.
We’re continuing to add commands that reduce the work agents have to do outside the browser.
Get Started
Try the quickstart guide to get Lightpanda running in under 10 minutes. Working examples for both Puppeteer and Playwright are in the demo repo.
FAQ
What is LP.getSemanticTree?
LP.getSemanticTree is the second command in Lightpanda’s custom LP CDP
domain. It extracts a pruned, LLM-optimized representation of the page DOM,
combining tag names, ARIA roles, computed names, XPaths, and interactivity
detection in a single engine-level pass.
How is this different from LP.getMarkdown?
LP.getMarkdown converts the DOM to readable text, optimized for content
consumption. LP.getSemanticTree produces a structured representation focused on
interactivity, including element roles, XPaths, and unrolled compound
components like <select> dropdowns. Use markdown when the agent needs to read.
Use the semantic tree when the agent needs to act.
Can I access the semantic tree through MCP instead of CDP?
Yes. The native MCP server exposes LP.getSemanticTree as the semantic_tree
tool. The underlying engine capability is the same. MCP is the simpler path if
your agent doesn’t need the full CDP automation stack.
How does the native MCP server differ from gomcp?
gomcp is a separate Go binary that bridges MCP to Lightpanda over CDP. The native MCP server runs inside the browser process, calling engine capabilities directly without CDP as an intermediary. Both support stdio transport. gomcp additionally supports SSE.
Does LP.getSemanticTree work with existing agent frameworks?
Yes. The JSON output includes nodeId, backendDOMNodeId, and unique XPaths,
making it compatible with frameworks that need element references for
interaction. The text format is designed for direct inclusion in LLM prompts.
What MCP transport does Lightpanda support?
The native MCP server supports standard I/O (stdio) for local use. The cloud MCP service supports SSE transport for remote connections.
Is the LP domain compatible with standard CDP?
The LP domain is a Lightpanda-specific extension. It is not part of the
Chrome DevTools Protocol specification. Standard CDP commands continue to work
as expected. The LP domain adds new capabilities on top.

Adrià Arrufat
Software Engineer
Adrià is an AI engineer at Lightpanda, where he works on making the browser more useful for AI workflows. Before Lightpanda, Adrià built machine learning systems and contributed to open-source projects across computer vision and systems programming.