Skip to content
Directory Datasets
8 min read

Cheerio vs Playwright: which should you use for web scraping in 2026?

A benchmark-driven guide to choosing between Cheerio and Playwright. Real numbers — 10× memory reduction, 3min → 14sec — and a decision tree you can use today.

Cheerio is a server-side HTML parser. Playwright is a full headless browser automation framework. In 2026 they're often discussed as if they're competing for the same job. They aren't — and using the wrong one for the wrong site is the single most expensive mistake in production scraping.

This post is the benchmark-backed guide to choosing between them, with real numbers from a 47K-record directory scraper, the four-question decision tree we use, and the cases where you genuinely need a browser.

How big is the cost difference between Cheerio and Playwright?

When the data you need is already in the HTML response, switching from Playwright to Cheerio gives you:

MetricPlaywrightCheerioImprovement
Wall-clock execution3 min 0 sec14 sec~13× faster
Memory per worker~500 MB~50 MB10× lower
Concurrent workers per CPU2-320-30~10× higher
Cold-start time800-2,000 ms~5 ms~400× faster
Cost per 10K records~$0.50-$1.20~$0.05-$0.15~10× cheaper

Source: Apify Engineering ran this exact migration on their Switching from Playwright to Cheerio post; we see the same numbers on our Agency Vista Actor.

The reason isn't subtle: Playwright launches a Chromium binary, evaluates JavaScript, paints DOM, and runs a CDP connection. Cheerio reads bytes and parses them. If you don't need a browser, a browser is the dominant cost.

When do you actually need Playwright?

Despite the numbers above, Playwright is the right answer when any of these are true:

  • The data you need is rendered by client-side JavaScript and not present in the initial HTML response.
  • You need to interact with the page: click, type, scroll, wait for XHR, dismiss modals.
  • The site requires a logged-in session that depends on a JavaScript-set cookie (e.g., LinkedIn, most SaaS dashboards).
  • The site fingerprints requests using TLS, JS challenges, or canvas-based bot tests that only a real browser engine passes.
  • The data you need only loads after a setTimeout or an IntersectionObserver triggers.

For all of those, Cheerio is the wrong tool. Playwright (or Puppeteer, or a managed browser like Browserbase) is your only option.

When does Cheerio win by an order of magnitude?

Cheerio wins when the data is already in the HTML you got back. This is more common than people assume in 2026:

  • Next.js, Nuxt, SvelteKit, Astro, Remix — every modern JS framework ships server-rendered HTML by default. Most of the data is in the initial response or in a __NEXT_DATA__ / __NUXT_DATA__ / __sveltekit_data script tag.
  • Most directories, marketplaces, and job boards — agency directories, license boards, niche job sites, and review platforms are overwhelmingly server-rendered.
  • Sitemap-driven extraction — when you're crawling from a sitemap of detail-page URLs, every page is a single HTTP GET. No interaction needed.
  • Paginated JSON list pages — many sites embed their full agency/job/profile list as serialized JSON in a script tag for SSR hydration. You're reading JSON out of the HTML, not parsing HTML.

The Agency Vista scraper that powers our Agency Vista dataset is a textbook example: 47,000+ profiles, all server-rendered Next.js, every page's data sitting in __NEXT_DATA__. A CheerioCrawler + a JSON parse is the entire extraction pipeline.

What's the four-question decision tree?

When you're scoping a new scraper, ask these in order:

1. Is there a public API or a hidden internal API?
   YES → Use neither Cheerio nor Playwright. Hit the API directly.
   NO  → Continue.

2. View source on a target page. Is the data you need in the HTML?
   YES → Cheerio.
   NO  → Continue.

3. Is the data in a __NEXT_DATA__ / __NUXT_DATA__ /
   inline-JSON script tag in the HTML?
   YES → Cheerio. Parse the script tag content as JSON.
   NO  → Continue.

4. Does the data appear after JS execution, scrolling, or interaction?
   YES → Playwright.

Most scrapers stop at step 2 or step 3. Production teams that default-to-Playwright are paying 10× cost for nothing.

How do I run the "view source" test in 30 seconds?

The cheap way to know which one you need: open the target page in your browser, hit Ctrl+U (or Cmd+Option+U), and search the raw HTML for a known data point — an agency name, a job title, a license number you can see on the rendered page.

  • If Ctrl+U shows your data → Cheerio.
  • If Ctrl+U does not show your data, but the data is visible in the rendered page → Playwright (or check the Network tab for an XHR endpoint that returns it as JSON, in which case: hit the API directly).

This 30-second test will save you days of building the wrong stack.

How does Crawlee let you switch between the two?

If you're using Crawlee (the open-source crawler library that powers Apify Actors), switching between the two is a one-import change:

// Cheerio version
import { CheerioCrawler } from 'crawlee';
 
const crawler = new CheerioCrawler({
  async requestHandler({ $, request }) {
    const data = JSON.parse($('script#__NEXT_DATA__').first().text());
    // ...
  },
});
// Playwright version (same handler shape)
import { PlaywrightCrawler } from 'crawlee';
 
const crawler = new PlaywrightCrawler({
  async requestHandler({ page, request }) {
    await page.waitForSelector('.profile-name');
    const name = await page.locator('.profile-name').textContent();
    // ...
  },
});

Same Crawlee API, same request queueing, same retry/concurrency knobs. The framework is the same; only the worker model changes. That makes it cheap to start with Cheerio and add a Playwright fallback for the 10% of pages that need it.

Why is defaulting to Playwright "just in case" expensive?

In 2026, AI-coding assistants overwhelmingly suggest Playwright as the default for any scraping task. They'll write you a 200-line Playwright script for a job that needs fetch() + 3 lines of Cheerio.

The cost shows up later, in three places:

  1. Apify Compute Units, AWS bills, or Browserbase invoices — 10× the wall-clock and 10× the memory means 10× the cost.
  2. Concurrency ceiling — 30 concurrent Cheerio workers fit on a small VPS. 30 concurrent Playwright workers need a serious machine.
  3. Failure surface — Chromium occasionally hangs, segfaults, or leaks memory under load. Cheerio is a pure function: input HTML → output object. Less can go wrong.

The default-to-Playwright instinct is wrong in 2026 the same way default-to-jQuery was wrong in 2018. Use the tool the job needs, not the most general tool that happens to work.

What's the 60-second rule of thumb?

If your data is in the HTML, use Cheerio. If your data is rendered by JavaScript, use Playwright. Decide which one by viewing source on a target page.

That's the entire decision in one sentence. Everything else is implementation detail.

Frequently asked questions

Is Cheerio faster than Playwright in every case?

Yes, when the data is in the HTML response. Roughly 10-13× faster wall-clock and 10× lower memory in benchmarks where both can extract the same data. Playwright is faster than Cheerio in zero realistic scraping scenarios — Playwright wins on capability (can interact with JS), not speed.

Does using Cheerio mean I can't handle JavaScript-heavy sites?

You can't execute JavaScript with Cheerio, but you often don't need to. Most modern frameworks (Next.js, Nuxt, SvelteKit, Astro, Remix) ship server-rendered HTML with their data inlined as JSON in script tags. Read the JSON, skip the JavaScript.

What about Puppeteer vs Playwright for headless scraping?

Playwright is the modern default — multi-browser support (Chromium, Firefox, WebKit), better selector engine, better wait primitives. Puppeteer is Chrome-only and largely in maintenance mode. If you need a headless browser, choose Playwright.

Does Cheerio support TypeScript?

Yes — Cheerio ships first-class TypeScript types. Crawlee's CheerioCrawler is fully typed and works seamlessly in strict TypeScript projects.

What's the cost difference at 1M records?

At ~$0.50/run for 10K Playwright records vs ~$0.05/run for 10K Cheerio records, scaling to 1M records is roughly $50 (Cheerio) vs $500 (Playwright). The 10× delta holds up.

Can I mix Cheerio and Playwright in the same scraper?

Yes — Crawlee supports adding both crawler types to the same Actor. A common pattern is Cheerio as the default, with a Playwright fallback for the 5-10% of pages that genuinely need browser execution.


The Directory Datasets Actors that power Agency Vista and OnlineJobs.ph are both Cheerio-based. We started both with Ctrl+U, found the data in the HTML, and never wrote a Playwright handler. Costs per 10K records sit in the cents-per-thousand range as a result — which is what makes pay-per-result pricing economically viable on directory data.