PandaScript
A PandaScript is a web automation script that the Lightpanda browser runs directly.
There is no Node.js or Python environment to set up, no Puppeteer/Playwright code to write, no CDP serialization cost, and no LLM required: just vanilla JavaScript with a small set of native primitives.
Use normal JavaScript variables, functions, loops, objects, arrays, JSON.parse, JSON.stringify, and other standard ECMAScript built-ins.
It’s reproducible, deterministic and token-free (no LLM required).
To run a PandaScript:
lightpanda agent <my_script>.js
Runtime Environment
Agent scripts run in their own V8 context. That context is separate from the web page’s JavaScript context.
- It is not the browser page environment. There is no window, document, DOM, localStorage, navigator, or page global state in the agent script. Read page data with extract(...), or explicitly run page JavaScript with evaluate(...) when that is the right tool.
- It is not Node.js. There is no require, process, fs, path, npm package loading, command-line argument API, or Node network/filesystem API.
- Page scripts cannot see agent variables or Lightpanda primitives. Agent scripts cannot directly see page variables.
- The global evaluate(...) primitive runs JavaScript in the page context, distinct from the agent context’s own native eval.
- Agent variables persist for the lifetime of one script run, across navigations and primitive calls. A later lightpanda agent script.js run starts with a fresh agent context.
- The installed primitives are synchronous and blocking. Do not write an async/await automation contract around them. Scripts compile as classic scripts, so top-level await is a SyntaxError; promise callbacks (.then) only run after the script body finishes, never in between primitive calls.
- Tool failures throw JavaScript Error exceptions and stop execution unless you catch them.
- The script’s completion value (its last top-level expression) is printed automatically (objects and arrays as JSON; other values coerced). End a script with the bare expression you want as output, e.g. a final extract({ ... }); or results;. console.log(...) is for extra or debug output and does not JSON-format objects.
The agent context includes a small console object:
console.log("printed to stdout");
console.info("printed to stdout");
console.debug("printed to stdout");
console.warn("printed to stderr");
console.error("printed to stderr");
Values And Return Types
Most primitives return the browser tool’s result text as a JavaScript string.
extract(...) is the exception: it returns extracted data as a normal
JavaScript value, so local script logic can use it directly. The result mirrors
your schema: an object schema returns an object keyed by your fields (even with
a single field), and a bare array schema returns an array:
const page = new Page();
await page.goto("https://news.ycombinator.com/");
const data = page.extract({
  title: "title",
  stories: [{
    selector: "tr.athing",
    limit: 5,
    fields: {
      id: { attr: "id" },
      title: ".titleline > a"
    }
  }]
});
data; // printed automatically as JSON
Destructure when a single field is all you need:
const { stories } = page.extract({
  stories: [{
    selector: "tr.athing",
    limit: 5,
    fields: {
      id: { attr: "id" },
      title: ".titleline > a"
    }
  }]
});
for (const story of stories) {
  console.log(story.title);
}
evaluate(...) still returns the page evaluate tool result text. When page evaluate(...)
returns an object or array, that text is JSON.
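Because evaluate(...) hands back text, a structured result needs a JSON.parse step before local logic can use it. The following is a minimal sketch, assuming the result text for an object or array is plain JSON as described above. The helper name parseEvaluateResult is hypothetical, and falling back to the raw string for non-JSON text is an assumption about how plain values come back. The sample strings stand in for real evaluate(...) results.

```javascript
// Hypothetical helper (not part of Lightpanda): turn evaluate(...) result
// text into a JavaScript value when it is JSON, else keep the raw string.
function parseEvaluateResult(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    // Assumption: non-JSON result text is best left as the original string.
    return text;
  }
}

// Sample strings standing in for evaluate(...) results:
const links = parseEvaluateResult('["/a", "/b"]');
const title = parseEvaluateResult("Example Domain");
console.log(links.length, title);
```

Note that text such as "42" or "true" is itself valid JSON, so this sketch would return a number or boolean for it; skip the helper when you want the raw string.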
Primitive arguments must be JSON-serializable. Strings, numbers, booleans,
arrays, plain objects, and null work. undefined, functions, symbols, and
cyclic objects do not.
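When arguments are assembled dynamically, a pre-flight check can catch a bad value before it reaches a primitive. This is a sketch only: isJsonSerializable is a hypothetical helper, not a Lightpanda API, and it is deliberately stricter than the rule above in two assumed ways (it rejects NaN/Infinity, and it rejects an undefined nested inside an object).

```javascript
// Hypothetical pre-flight check (not a Lightpanda API): true only for values
// built from strings, finite numbers, booleans, null, arrays, and plain objects.
function isJsonSerializable(value, seen = new Set()) {
  if (value === null) return true;
  const type = typeof value;
  if (type === "string" || type === "boolean") return true;
  // Assumption: NaN and Infinity are rejected because JSON cannot represent them.
  if (type === "number") return Number.isFinite(value);
  if (type !== "object") return false; // undefined, function, symbol, bigint
  if (seen.has(value)) return false;   // cyclic reference on the current path
  const proto = Object.getPrototypeOf(value);
  const isPlain = Array.isArray(value) || proto === Object.prototype || proto === null;
  if (!isPlain) return false;
  seen.add(value);
  const ok = Object.values(value).every((item) => isJsonSerializable(item, seen));
  seen.delete(value);
  return ok;
}

const cyclic = {};
cyclic.self = cyclic;
console.log(isJsonSerializable({ selector: "#row", timeout: 2000 })); // true
console.log(isJsonSerializable({ selector: undefined }));              // false
console.log(isJsonSerializable(cyclic));                               // false
```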
Installed Primitives
Only recorded browser primitives are installed globally:
| Primitive | Arguments | Runs in |
|---|---|---|
| new Page() | Browser session | |
| page.goto | goto(url[, { timeout }]) | Browser page |
| page.extract | extract(schema) or extract({ schema }) | Browser page via extractor; returns a JS object or array |
| page.evaluate | evaluate(script[, { url, timeout, save }]) | Browser page JS context |
| page.click | click(selector) or click({ selector }) | Browser page |
| page.fill | fill(selector, value) or fill({ selector, value }) | Browser page |
| page.scroll | scroll() or scroll({ x, y }) | Browser page |
| page.waitForSelector | waitForSelector(selector[, { timeout }]) | Browser page |
| page.waitForScript | waitForScript(script[, { timeout }]) | Browser page JS context |
| page.waitForState | waitForState(state[, { timeout }]) | Browser page |
| page.hover | hover(selector) or hover({ selector }) | Browser page |
| page.press | press(selector, key) or press({ key[, selector] }) | Browser page |
| page.selectOption | selectOption(selector, value) or selectOption({ selector, value }) | Browser page |
| page.setChecked | setChecked(selector[, checked]) or setChecked({ selector, checked }) | Browser page |
goto returns at the load event (a fast snapshot). When a page’s content is
still loading (rendered by post-load JS), call waitForState("networkidle")
before reading. waitForState’s state accepts "load",
"domcontentloaded", "networkalmostidle", "networkidle", or "done".
goto’s timeout defaults to 10000 ms; the waitFor* timeouts default to
5000 ms.
The [, { … }] is an optional trailing options object: leading arguments are
positional (waitForSelector("#row", { timeout: 2000 })), and the options ride
in a final object. Passing a single object with everything
(waitForSelector({ selector: "#row", timeout: 2000 })) is equivalent; that’s
the shape /save records into saved scripts. An option can’t be a bare
positional, though: waitForSelector("#row", 2000) is an error. A null
positional omits that field (press(null, "Enter") presses on the focused
element), and setting the same field positionally and in the options object
(goto(url, { url: ... })) is an invalid arguments error.
Script primitives address elements by CSS selector only. The tools that hand
out backendNodeIds (tree, findElement, nodeDetails) aren’t installed in
the script context, and a raw node ID wouldn’t survive replay anyway. When
you’re exploring in the REPL and have a backendNodeId (e.g. the leading
number on a /tree line, or a /findElement hit), run /nodeDetails backendNodeId=<id> to get a durable CSS selector, then paste that into your
script.
Navigation
Use goto(...) to navigate to a URL:
const page = new Page();
await page.goto("https://example.com");
await page.goto({
  url: "https://example.com/app",
  timeout: 15000
});
The call returns a status string and throws if navigation fails. A timeout
does not throw: the call returns "Navigation started but the page did not finish loading before the timeout." and the page stays in whatever state it
reached. Check the return value, or follow with waitForState(...) /
waitForSelector(...), when completeness matters.
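Since a timeout is reported through the return value instead of an exception, the check has to be explicit. A minimal sketch, assuming the status string contains the message quoted above; navigationTimedOut is a hypothetical helper, and the sample strings (including "ok") are stand-ins, not real goto(...) output.

```javascript
// Message documented above for a navigation that hit its timeout.
const NAV_TIMEOUT_MESSAGE =
  "Navigation started but the page did not finish loading before the timeout.";

// Hypothetical helper: true when a goto(...) status string reports a timeout.
// Assumption: the status contains the documented message verbatim.
function navigationTimedOut(status) {
  return typeof status === "string" && status.includes(NAV_TIMEOUT_MESSAGE);
}

// Sample status strings standing in for goto(...) return values:
console.log(navigationTimedOut(NAV_TIMEOUT_MESSAGE)); // true
console.log(navigationTimedOut("ok"));                // false
```

In a script, you would pass the value returned by goto(...) to this check and, when it returns true, follow up with waitForState(...) or waitForSelector(...) before reading the page.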
Structured Extraction
Use page.extract(...) to read data from the current page without writing page-side
JavaScript. This is the preferred bridge from page content into local agent
logic.
const result = page.extract({
  heading: "h1",
  links: [{
    selector: "a",
    limit: 10,
    fields: {
      text: "",
      href: { attr: "href" }
    }
  }]
});
The schema forms are:
| Schema value | Meaning |
|---|---|
| "<selector>" | Text of the first matching element, or null |
| "" | Text of the current matched element inside a fields block |
| ["<selector>"] | Text of all matching elements |
| { selector: "<selector>", attr: "<name>" } | Attribute from the first match |
| [{ selector: "<selector>", attr: "<name>" }] | Attribute from all matches |
| [{ selector: "<selector>", fields: { ... } }] | Array of records, with fields resolved relative to each matched element |
| limit: N | Cap array extraction to N matches |
Return shape follows the top-level schema:
- extract({ title: "h1" }) returns { title: "..." }.
- extract({ title: "h1", links: [{ selector: "a" }] }) returns an object with both fields.
- extract({ links: [{ selector: "a" }] }) returns { links: [...] }. An object schema always returns an object, even with a single field.
- extract([{ selector: "a" }]) is shorthand for a single anonymous array extraction and returns the array directly.
Every value is a string (trimmed text or a raw attribute) or null; parse
numbers in script logic. An array field that matches nothing yields []
without complaint (a page with zero comments is a valid result), but if
every field in the schema misses, extract(...) throws
no schema selector matched any element; treat that as “my selectors are
wrong”, not “the page is empty”.
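Because every extracted value arrives as a string or null, numeric fields need an explicit conversion in script logic. A small sketch of one way to do that: toNumber is a hypothetical helper, the sample inputs are made up, and treating commas as thousands separators is an assumption that will not hold for every locale.

```javascript
// Hypothetical helper: pull the first number out of an extracted string.
// Returns null when the value is null or contains no digits.
function toNumber(value) {
  if (value === null || value === undefined) return null;
  // Assumption: commas are thousands separators and can be dropped.
  const match = String(value).replace(/,/g, "").match(/-?\d+(\.\d+)?/);
  return match ? Number(match[0]) : null;
}

console.log(toNumber("1,234 points")); // 1234
console.log(toNumber("42"));           // 42
console.log(toNumber(null));           // null
console.log(toNumber("no score"));     // null
```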
extract(...) reads only the current page. For list-to-detail scraping:
capture the list, then loop in the script (goto each row’s URL and extract
the detail). The local agent context keeps the data across navigations, so the
assembly happens in plain JavaScript. See the complete
example below.
When passing an object directly to extract(...), the runtime serializes it as
the extractor schema. These forms are equivalent:
page.extract({ title: "h1" });
page.extract({ schema: { title: "h1" } });
page.extract('{ "title": "h1" }');
The wrapped form accepts only schema: the REPL’s save= option does not
exist in scripts (extract({ schema: ..., save: ... }) is rejected). Keep
results in local variables instead.
Use local variables to keep extracted data available to later script logic:
const data = page.extract({ title: "title" });
Page JavaScript
evaluate(...) is the explicit escape hatch into the current page’s JavaScript
context. Its script string runs where window and document exist.
await page.goto("https://example.com");
const title = page.evaluate("document.title");
console.log(title);
Keep the boundary clear:
const selector = "h1";
// Good: local agent logic builds an extract schema.
const data = page.extract({ heading: selector });
// Bad: page evaluate cannot see local agent variables.
page.evaluate("document.querySelector(selector).textContent");
Page evaluate(...) cannot call goto, extract, or other agent primitives.
Agent scripts cannot access document directly. If you need page DOM data,
prefer extract(...); use evaluate(...) only for page behavior that extraction
cannot express.
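When evaluate(...) really is the right tool and the page script needs a local value, one option is to embed that value into the script string as a literal. The sketch below only builds the string; passing it to evaluate(...) is left to the script. The selector is a made-up example, and the approach assumes the value is JSON-serializable.

```javascript
// Local agent variable that the page context cannot see.
const selector = "a[href^='/item']";

// JSON.stringify quotes and escapes the value, so it becomes a string
// literal inside the page script rather than an undefined identifier.
const script = `document.querySelector(${JSON.stringify(selector)}).textContent`;

console.log(script);
// document.querySelector("a[href^='/item']").textContent
```

Using JSON.stringify instead of hand-written quotes keeps the page script valid even when the value contains quotes or backslashes.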
waitForScript(...) also evaluates in the page context, repeatedly, until the
expression is truthy or the timeout expires:
page.waitForScript("document.querySelectorAll('.row').length >= 5");
Interaction Primitives
The action primitives operate on the current page. Most take one object whose fields match the browser tool schema:
page.click({ selector: "a.login" });
page.fill({ selector: "input[name='acct']", value: "$LP_HN_USERNAME" });
page.fill({ selector: "input[name='pw']", value: "$LP_HN_PASSWORD" });
page.press({ key: "Enter" });
page.waitForSelector("#logout");
page.hover({ selector: "#menu" });
page.selectOption({ selector: "select[name='country']", value: "FR" });
page.setChecked({ selector: "input[name='terms']", checked: true });
page.setChecked({ selector: "input[name='newsletter']", checked: false });
page.scroll({ y: 600 });
page.scroll();
setChecked defaults checked to true when the field is omitted
(setChecked("#chk") checks the box). press’s leading positional is the
optional selector, not key: a bare press("Enter") binds "Enter" to
selector and fails. Press on the focused element with
press({ key: "Enter" }) or press(null, "Enter"); target an element with
press("#search", "Enter") or press({ key: "Enter", selector: "#search" }).
$LP_* placeholders in string arguments are resolved inside the Lightpanda
process. This keeps credentials out of recorded scripts and LLM prompts. In
recordings, resolved LP_* values are scrubbed back to placeholders.
Error Handling
Primitive failures throw JavaScript exceptions:
try {
  page.waitForSelector({ selector: "#dashboard", timeout: 1000 });
} catch (err) {
  console.error("dashboard did not appear:", err.message);
  throw err;
}
Common failures:
| Error | Meaning |
|---|---|
| ReferenceError: document is not defined | You tried to use browser DOM APIs in the agent context. Use extract(...) or page evaluate(...). |
| ReferenceError: require is not defined | Agent scripts are not Node.js scripts. |
| no page loaded - run goto(url) first | A page-dependent primitive ran before navigation. |
| invalid arguments | A primitive received the wrong number or shape of arguments, or a non-JSON-serializable value. |
| extract: no schema selector matched any element | Every field in the schema missed. Fix the selectors; an empty page section yields null/[] per field, not this error. |
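For transient failures, such as an element that appears late, a try/catch can be wrapped into a small retry loop. This is a sketch, assuming the wrapped call is synchronous as the Runtime Environment section describes. The retry helper is hypothetical, and the failing function with its "selector not found" message is a stand-in for a real primitive call, not actual Lightpanda output.

```javascript
// Hypothetical helper: call a synchronous action up to `attempts` times and
// rethrow the last error if every attempt fails.
function retry(action, attempts = 3) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    try {
      return action(i);
    } catch (err) {
      lastError = err;
      console.warn(`attempt ${i} failed: ${err.message}`);
    }
  }
  throw lastError;
}

// Stand-in for a primitive call that fails twice, then succeeds.
let calls = 0;
const result = retry(() => {
  calls += 1;
  if (calls < 3) throw new Error("selector not found");
  return "ok";
});
console.log(result, calls);
```

Retrying only makes sense for errors that can clear up on their own; an invalid arguments error or a ReferenceError will fail the same way on every attempt.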
Complete Example
This script opens Hacker News, extracts five stories, visits each comments page, and prints one JSON object. The looping and object assembly happen in the local agent script, not in the page.
const HN = "https://news.ycombinator.com";
const page = new Page();
await page.goto(HN);
const { stories } = page.extract({
  stories: [{
    selector: "tr.athing",
    limit: 5,
    fields: {
      id: { attr: "id" },
      title: ".titleline > a",
      url: { selector: ".titleline > a", attr: "href" }
    }
  }]
});
for (const story of stories) {
  story.comments = [];
  if (!story.id) continue;
  await page.goto(`${HN}/item?id=${story.id}`);
  const { comments } = page.extract({
    comments: [{
      selector: "tr.athing.comtr:has(.commtext)",
      limit: 3,
      fields: {
        author: ".hnuser",
        text: ".commtext"
      }
    }]
  });
  story.comments = comments;
}
stories; // printed automatically as JSON