Interact with a webpage
In this guide, you’ll use Lightpanda to run a search on HackerNews, wait for dynamic results to load, and extract structured data from the page.
This example requires Lightpanda because HackerNews uses XHR requests to display search results — raw HTML fetching wouldn’t capture them.
Prerequisites
You’ll need Node.js installed on your computer.
Create a hn-scraper directory and initialize a new Node.js project. You can accept all the default values in the npm init prompts.
mkdir hn-scraper && \
cd hn-scraper && \
npm init
Install the @lightpanda/browser package along with either the puppeteer-core or the playwright-core npm package.
Unlike puppeteer and playwright, puppeteer-core and playwright-core don’t download a Chromium browser.
npm install --save puppeteer-core @lightpanda/browser
This guide uses a local Lightpanda instance; see the installation instructions for setup details. The code can be adapted to connect to Lightpanda Cloud.
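Because the same script can target either a local instance or a remote one, it can help to resolve the WebSocket endpoint in one place. The following is a minimal sketch and not part of the Lightpanda API: the resolveEndpoint helper and the LPD_WS_ENDPOINT environment variable are hypothetical names chosen for illustration, and the remote URL itself is whatever your Lightpanda Cloud account provides.

```javascript
// Hypothetical helper: picks the CDP WebSocket endpoint to hand to Puppeteer.
// LPD_WS_ENDPOINT is an assumed variable name, not something Lightpanda defines.
function resolveEndpoint(local, env = process.env) {
  const override = env.LPD_WS_ENDPOINT;
  if (override) {
    // Validate early so a typo fails here rather than inside puppeteer.connect().
    const url = new URL(override);
    if (url.protocol !== 'ws:' && url.protocol !== 'wss:') {
      throw new Error('Expected a ws:// or wss:// URL, got ' + override);
    }
    return { remote: true, browserWSEndpoint: override };
  }
  // Fall back to the local instance used throughout this guide.
  return {
    remote: false,
    browserWSEndpoint: 'ws://' + local.host + ':' + local.port,
  };
}

// Usage with the local defaults used in this guide (no override set).
console.log(resolveEndpoint({ host: '127.0.0.1', port: 9222 }, {}));
```

With a remote endpoint there is no local browser process to start or stop, so a script built on this helper would only need to connect and disconnect when remote is true.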
Create index.js
Create an index.js file. Import the packages, configure the connection, and set up the browser lifecycle:
'use strict'
import { lightpanda } from '@lightpanda/browser';
import puppeteer from 'puppeteer-core';
const lpdopts = {
  host: '127.0.0.1',
  port: 9222,
};
const puppeteeropts = {
  browserWSEndpoint: 'ws://' + lpdopts.host + ':' + lpdopts.port,
};
(async () => {
  // Start Lightpanda browser in a separate process.
  const proc = await lightpanda.serve(lpdopts);
  // Connect Puppeteer to the browser.
  const browser = await puppeteer.connect(puppeteeropts);
  const context = await browser.createBrowserContext();
  const page = await context.newPage();
  // Disconnect Puppeteer.
  await page.close();
  await context.close();
  await browser.disconnect();
  // Stop Lightpanda browser process.
  proc.stdout.destroy();
  proc.stderr.destroy();
  proc.kill();
})();
Navigate and search
Navigate to HackerNews, type a search term, and press Enter:
await page.goto("https://news.ycombinator.com/");
await page.type('input[name="q"]', 'lightpanda');
await page.keyboard.press('Enter');
Wait for results
Wait for the search results to appear, with a 5 second timeout:
await page.waitForFunction(() => {
  return document.querySelector('.Story_container') != null;
}, { timeout: 5000 });
Extract the data
Loop over the results to extract the title, URL, and metadata for each story:
const res = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('.Story_container')).map(row => {
    return {
      title: row.querySelector('.Story_title span').textContent,
      url: row.querySelector('.Story_title a').getAttribute('href'),
      meta: Array.from(row.querySelectorAll('.Story_meta > span:not(.Story_separator, .Story_comment)')).map(span => {
        return span.textContent;
      }),
    };
  });
});
console.log(res);
Full script
'use strict'
import { lightpanda } from '@lightpanda/browser';
import puppeteer from 'puppeteer-core';
const lpdopts = {
  host: '127.0.0.1',
  port: 9222,
};
const puppeteeropts = {
  browserWSEndpoint: 'ws://' + lpdopts.host + ':' + lpdopts.port,
};
(async () => {
  // Start Lightpanda browser in a separate process.
  const proc = await lightpanda.serve(lpdopts);
  // Connect Puppeteer to the browser.
  const browser = await puppeteer.connect(puppeteeropts);
  const context = await browser.createBrowserContext();
  const page = await context.newPage();
  // Go to the HackerNews home page.
  await page.goto("https://news.ycombinator.com/");
  // Type the search term and submit.
  await page.type('input[name="q"]', 'lightpanda');
  await page.keyboard.press('Enter');
  // Wait until the search results are loaded.
  await page.waitForFunction(() => {
    return document.querySelector('.Story_container') != null;
  }, { timeout: 5000 });
  // Loop over search results to extract data.
  const res = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('.Story_container')).map(row => {
      return {
        title: row.querySelector('.Story_title span').textContent,
        url: row.querySelector('.Story_title a').getAttribute('href'),
        meta: Array.from(row.querySelectorAll('.Story_meta > span:not(.Story_separator, .Story_comment)')).map(span => {
          return span.textContent;
        }),
      };
    });
  });
  console.log(res);
  // Disconnect Puppeteer.
  await page.close();
  await context.close();
  await browser.disconnect();
  // Stop Lightpanda browser process.
  proc.stdout.destroy();
  proc.stderr.destroy();
  proc.kill();
})();
Run it
$ node index.js
🐼 Running Lightpanda's CDP server… { pid: 598201 }
[
  {
    title: 'Show HN: Lightpanda, an open-source headless browser in Zig',
    url: 'https://news.ycombinator.com/item?id=42817439',
    meta: [ '319 points', 'fbouvier', '9 months ago', '137 comments' ]
  },
  {
    title: 'Lightpanda: Headless browser designed for AI and automation',
    url: 'https://news.ycombinator.com/item?id=42812859',
    meta: [ '154 points', 'tosh', '9 months ago', '1 comments' ]
  },
  ...
]
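The meta array in the output above holds display strings. If downstream code needs numbers, a small post-processing step can convert them. The sketch below assumes each meta array has the four entries shown in the sample output, in that order (points, author, age, comments). parseStory is a hypothetical helper name, and stories with a different layout would need extra handling.

```javascript
// Hypothetical helper: turns one scraped story into typed fields.
// Assumes meta is [points, author, age, comments], as in the sample output.
function parseStory(story) {
  const [points, author, age, comments] = story.meta;
  // parseInt reads the leading digits and ignores the trailing label.
  const toCount = (text) => {
    const n = Number.parseInt(text ?? '', 10);
    return Number.isNaN(n) ? null : n;
  };
  return {
    title: story.title,
    url: story.url,
    points: toCount(points),
    author: author ?? null,
    age: age ?? null,
    comments: toCount(comments),
  };
}

// Example using the first entry from the sample output.
const sample = {
  title: 'Show HN: Lightpanda, an open-source headless browser in Zig',
  url: 'https://news.ycombinator.com/item?id=42817439',
  meta: ['319 points', 'fbouvier', '9 months ago', '137 comments'],
};
console.log(parseStory(sample));
```

In the script, this could be applied as res.map(parseStory) after page.evaluate returns. Missing entries come back as null instead of throwing, which keeps one unusual row from aborting the whole run.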