Scrape the modern web

Lightpanda is the open-source browser made for headless usage.
Fast scraping and web automation with minimal memory footprint.

Execution time: ~64 times faster
Lightpanda: 8.6ms
Chrome: 556.7ms
Memory peak: ~12 times less memory used
Lightpanda: 14MB
Chrome: 166MB
CLI exec and dump of a mock local webpage on an AWS EC2 m5.large instance.
See benchmark details
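
The headline ratios can be checked with a few lines of arithmetic. This is only a sketch: it assumes the smaller figure in each pair is Lightpanda's and the larger one is Chrome's, and the variable names are illustrative.

```python
# Figures quoted in the benchmark summary above.
# Assumption: the smaller value in each pair is Lightpanda's.
lightpanda_ms, chrome_ms = 8.6, 556.7   # execution time, milliseconds
lightpanda_mb, chrome_mb = 14, 166      # peak memory, megabytes


def ratio(baseline: float, candidate: float) -> float:
    """Return how many times smaller `candidate` is than `baseline`."""
    return baseline / candidate


speedup = ratio(chrome_ms, lightpanda_ms)
memory_saving = ratio(chrome_mb, lightpanda_mb)

print(f"execution time: ~{speedup:.1f}x faster")
print(f"peak memory:    ~{memory_saving:.1f}x less")
```

Rounded to whole numbers, these match the "64 times" and "12 times" figures quoted above.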

Featured in Lightpanda

Ultra-low memory footprint

Blazingly Fast
Instant startup

Keep your daily tools
Playwright, Puppeteer

Preview demo

On this mock e-commerce webpage, important information such as the product name, price, features, and reviews is fetched through XHR requests.


We’ve put it through tests locally using 3 different tools:

  • cURL: cannot execute JavaScript, so it does not retrieve the data
  • Chrome headless: accurate data, but slow and heavy (180MB RAM)
  • Lightpanda: same output as Chrome, but 60x faster while using 12x less memory
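
To illustrate why a plain HTTP client comes back empty-handed, here is a minimal sketch in Python. The markup, element ids and `/product.json` URL are hypothetical stand-ins for the mock page, not its actual source. The parser reads the HTML exactly as cURL would receive it, without running the script.

```python
from html.parser import HTMLParser

# Hypothetical markup: what a plain HTTP client receives from a page
# that fills in its content with an XHR call after loading.
RAW_HTML = """
<html><body>
  <h1 id="product-name"></h1>
  <span id="product-price"></span>
  <script>
    var xhr = new XMLHttpRequest();
    xhr.open("GET", "/product.json");
    xhr.onload = function () {
      var p = JSON.parse(xhr.responseText);
      document.getElementById("product-name").textContent = p.name;
      document.getElementById("product-price").textContent = p.price;
    };
    xhr.send();
  </script>
</body></html>
"""


class TextById(HTMLParser):
    """Collects the text found directly inside every element that has an id."""

    def __init__(self):
        super().__init__()
        self.text = {}     # id -> text collected so far
        self._stack = []   # id (or None) of each currently open element

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        self._stack.append(element_id)
        if element_id is not None:
            self.text.setdefault(element_id, "")

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost open element, if it has an id.
        if self._stack and self._stack[-1] is not None:
            self.text[self._stack[-1]] += data.strip()


parser = TextById()
parser.feed(RAW_HTML)
print(parser.text)  # both fields are empty: the script never ran
```

The placeholders stay empty because the data only arrives once a JavaScript engine runs the script and the XHR completes.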

JavaScript execution
is mandatory for the modern web

Back in the good old days, grabbing a webpage was as easy as making an HTTP request, cURL-like. That is no longer possible, because JavaScript is everywhere, like it or not.

  • Ajax, Single Page App, Infinite loading, “click to display”, instant search, etc.
  • JS web frameworks: React, Vue, Angular & others

Chrome
is not the right tool

So if we need JavaScript, why not use a real web browser? Let’s take a huge desktop application, hack it, and run it on the server, right? That means hundreds of instances of Chrome if you use it at scale. Are you sure it’s such a good idea?

  • Heavy on RAM and CPU, expensive to run
  • Hard to package, deploy and maintain at scale
  • Bloated, lots of features are not useful in headless usage

Lightpanda
is built for performance

If we want both JavaScript and performance in a real headless browser, we need to start from scratch. Not yet another iteration of Chromium, but really from a blank page. Crazy, right? But that’s what we did: enter Lightpanda.

  • Not based on Chromium, Blink or WebKit
  • Low-level system programming language (Zig) with optimisations in mind
  • Opinionated, no rendering

Timeline

Q2 2022
 —  Beginning of the project
2022-2023
 —  Development phase
Feb 2024
 —  Private Alpha release
Q2 2024
 —  OpenSource and public beta
Q4 2024
 —  Cloud version

Frequently Asked Questions

  • Can you explain the benchmarks? Is it a fair comparison?

    We all love benchmarks but we also know how difficult it can be to have a fair comparison. That is why it was very important for us to be transparent about our protocol for the benchmark.

    What are we testing?
    A mock e-commerce web page making an XHR call for a JSON list of products and reviews, then updating the DOM with the fetched data. The code is available on the GitHub repository.

    How are we testing it?
    By launching both binaries (Google Chrome v122 and Lightpanda) and dumping the resulting web page (i.e. with the DOM modifications).

    What metrics are we looking at?
    Execution time and peak memory.

    What tools are we using?
    We use Hyperfine to measure the duration of the execution, which allows us to launch the tests multiple times and therefore reduce the impact of warmup and remove the best/worst iterations. For peak memory we use GNU Time with the -v option.

    It is a pretty basic benchmark and we intend to improve it as Lightpanda grows, adding new metrics and protocols (e.g. testing parallel executions in server mode).
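
The protocol above can be approximated with a short Python sketch. It is a stand-in for Hyperfine and GNU Time, not the actual benchmark harness: the `benchmark` function and its parameters are hypothetical, and the command timed at the bottom is only a placeholder for a real browser invocation. It relies on the Unix-only `resource` module.

```python
import resource
import statistics
import subprocess
import sys
import time


def benchmark(cmd, runs=10):
    """Run `cmd` `runs` times; return (trimmed mean in seconds, peak RSS).

    The fastest and slowest runs are dropped before averaging, mirroring
    the "remove the best/worst iterations" step described above.
    """
    if runs < 3:
        raise ValueError("need at least 3 runs to drop the best and the worst")

    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)

    trimmed = sorted(durations)[1:-1]

    # ru_maxrss is the largest resident set size among all child processes
    # this process has waited for (kilobytes on Linux, bytes on macOS).
    # Benchmark one command per Python process to keep the figures separate.
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return statistics.mean(trimmed), peak_rss


if __name__ == "__main__":
    # Placeholder command: replace with the browser binary under test.
    mean_s, peak = benchmark([sys.executable, "-c", "pass"], runs=5)
    print(f"mean: {mean_s * 1000:.1f} ms, peak RSS: {peak}")
```

Hyperfine additionally performs statistical outlier detection and supports warmup runs, so its numbers will be more robust than this sketch.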

  • Is it open source? When will it be available?

    Yes - Lightpanda will be open source once we publicly release our GitHub repository. The license has not been chosen yet, but it will definitely be an OSI-approved one.

    We will release the public beta once Lightpanda can handle some real-life scenarios on hand-picked web pages, hopefully in late Q2 2024 (see timeline).

    While making Lightpanda we have developed bespoke libraries that we are going to release in the coming weeks.

  • What libraries are you using?

    For JavaScript execution we use the V8 engine (Chromium, Node) for state-of-the-art performance and compatibility.

    We plan to add a lighter non-JIT JavaScript engine (probably Fabrice Bellard’s QuickJS) as an alternative, to drastically reduce binary size and memory usage. It will also allow embedding scenarios (as a WASM module, a library and so on).

    For HTML parsing and DOM tree manipulation we use libhubbub and libdom from the NetSurf project.

    Our I/O event loop is based on TigerBeetle’s.

  • Who is behind Lightpanda?

    We are a young company based in Paris, co-founded by Francis Bouvier and Pierre Tachoire. We have been working on this project for the past 2 years.

    Francis is a software developer and entrepreneur, former CTO and co-founder of an e-commerce startup (BlueBoard, sold in 2020 to ChannelAdvisor, NYSE: ECOM).

    Pierre is a software developer, former software engineer at BlueBoard and ChannelAdvisor.

  • Are you going to have a commercial offer?

    Yes - our plan is to have cloud-based and on-premise versions of Lightpanda with support, SLAs and additional tools.