Interact with a webpage

In this guide, you’ll use Lightpanda to run a search on HackerNews , wait for dynamic results to load, and extract structured data from the page.

This example requires Lightpanda because HackerNews uses XHR requests to display search results — raw HTML fetching wouldn’t capture them.

Prerequisites

You’ll need Node.js installed on your computer.

Create a hn-scraper directory and initialize a new Node.js project. You can accept all the default values in the npm init prompts.


mkdir hn-scraper && \
  cd hn-scraper && \
  npm init

Install the lightpanda and the puppeteer-core or playwright-core npm package.

Unlike puppeteer and playwright, puppeteer-core and playwright-core don’t download a Chromium browser.

puppeteer


npm install --save puppeteer-core @lightpanda/browser

This guide uses a local Lightpanda instance — see installation for setup details. The code can be adapted to connect to Lightpanda Cloud.

Create index.js

Create an index.js file. Import the packages, configure the connection, and set up the browser lifecycle:

puppeteer


'use strict'
 
import { lightpanda } from '@lightpanda/browser';
import puppeteer from 'puppeteer-core';
 
const lpdopts = {
  host: '127.0.0.1',
  port: 9222,
};
 
const puppeteeropts = {
  browserWSEndpoint: 'ws://' + lpdopts.host + ':' + lpdopts.port,
};
 
(async () => {
  // Start Lightpanda browser in a separate process.
  const proc = await lightpanda.serve(lpdopts);
 
  // Connect Puppeteer to the browser.
  const browser = await puppeteer.connect(puppeteeropts);
  const context = await browser.createBrowserContext();
  const page = await context.newPage();
 
  // Disconnect Puppeteer.
  await page.close();
  await context.close();
  await browser.disconnect();
 
  // Stop Lightpanda browser process.
  proc.stdout.destroy();
  proc.stderr.destroy();
  proc.kill();
})();

Navigate and search

Navigate to HackerNews, type a search term, and press Enter:

puppeteer


  await page.goto("https://news.ycombinator.com/");
 
  await page.type('input[name="q"]', 'lightpanda');
  await page.keyboard.press('Enter');

Wait for results

Wait for the search results to appear, with a 5 second timeout:

puppeteer


  await page.waitForFunction(() => {
    return document.querySelector('.Story_container') != null;
  }, { timeout: 5000 });

Extract the data

Loop over the results to extract the title, URL, and metadata for each story:

puppeteer


  const res = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('.Story_container')).map(row => {
      return {
        title: row.querySelector('.Story_title span').textContent,
        url: row.querySelector('.Story_title a').getAttribute('href'),
        meta: Array.from(row.querySelectorAll('.Story_meta > span:not(.Story_separator, .Story_comment)')).map(row => {
          return row.textContent;
        }),
      }
    });
  });
 
  console.log(res);

Full script

puppeteer


'use strict'
 
import { lightpanda } from '@lightpanda/browser';
import puppeteer from 'puppeteer-core';
 
const lpdopts = {
  host: '127.0.0.1',
  port: 9222,
};
 
const puppeteeropts = {
  browserWSEndpoint: 'ws://' + lpdopts.host + ':' + lpdopts.port,
};
 
(async () => {
  // Start Lightpanda browser in a separate process.
  const proc = await lightpanda.serve(lpdopts);
 
  // Connect Puppeteer to the browser.
  const browser = await puppeteer.connect(puppeteeropts);
  const context = await browser.createBrowserContext();
  const page = await context.newPage();
 
  // Go to hackernews home page.
  await page.goto("https://news.ycombinator.com/");
 
  // Type the search term and submit.
  await page.type('input[name="q"]', 'lightpanda');
  await page.keyboard.press('Enter');
 
  // Wait until the search results are loaded.
  await page.waitForFunction(() => {
    return document.querySelector('.Story_container') != null;
  }, { timeout: 5000 });
 
  // Loop over search results to extract data.
  const res = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('.Story_container')).map(row => {
      return {
        title: row.querySelector('.Story_title span').textContent,
        url: row.querySelector('.Story_title a').getAttribute('href'),
        meta: Array.from(row.querySelectorAll('.Story_meta > span:not(.Story_separator, .Story_comment)')).map(row => {
          return row.textContent;
        }),
      }
    });
  });
 
  console.log(res);
 
  // Disconnect Puppeteer.
  await page.close();
  await context.close();
  await browser.disconnect();
 
  // Stop Lightpanda browser process.
  proc.stdout.destroy();
  proc.stderr.destroy();
  proc.kill();
})();

Run it


node index.js


$ node index.js
🐼 Running Lightpanda's CDP server… { pid: 598201 }
[
  {
    title: 'Show HN: Lightpanda, an open-source headless browser in Zig',
    url: 'https://news.ycombinator.com/item?id=42817439',
    meta: [ '319 points', 'fbouvier', '9 months ago', '137 comments' ]
  },
  {
    title: 'Lightpanda: Headless browser designed for AI and automation',
    url: 'https://news.ycombinator.com/item?id=42812859',
    meta: [ '154 points', 'tosh', '9 months ago', '1 comments' ]
  },
  ...
]