
Pay-per-result vs subscription pricing for data scraping: a buyer's guide

A neutral, side-by-side breakdown of the four major pricing models for scraped data — pay-per-result, compute units, monthly subscription, custom enterprise — with the math for when each is the right choice.

Pay-per-result (PPR, also called PPE — "pay per event") pricing charges you a flat fee for each valid data record. Subscription pricing charges you a fixed monthly fee for access. Compute-unit pricing charges you for the time and memory the scraper used. Custom enterprise pricing is whatever the sales team negotiated.

If you've ever stared at an Apify, Bright Data, or Octoparse pricing page and felt like you were reading a tax form, this post is the neutral side-by-side breakdown, with the actual math for when each model works in your favor and when it works against you.

The four pricing models, in one table

| Model | Charged for | Predictable? | Best for | Worst for |
|-------|-------------|--------------|----------|-----------|
| Pay-per-result (PPR) | Each valid record extracted | Highly — you know cost per record before the run | Buyers who need bounded, predictable cost on production datasets | Buyers who run lots of small ad-hoc explorations |
| Compute units (CU) | Wall-clock × memory used | Low — you don't know cost until the run finishes | Sophisticated buyers running custom Actors with known stable workloads | Anyone whose scraper performance varies with the target site |
| Subscription | Fixed monthly fee | Capped — you know the ceiling, not the unit economics | High-volume buyers who pull the same data continuously | Buyers with seasonal or one-off needs |
| Custom enterprise | Whatever the contract says | Depends on contract | Large organizations with legal/compliance constraints | Anyone who values transparency or speed of procurement |

How pay-per-result actually works

The PPR / PPE pricing model became Apify's default in 2024 and has since been adopted by most scraping marketplaces. Here's how it works:

  1. The Actor publisher sets a price per output record — e.g., $0.005 per agency profile (the Agency Vista price), or $0.003 per job posting (the OnlineJobs.ph price).
  2. When you run the Actor, you specify a maxItems cap or let the Actor terminate naturally on the input range you provided.
  3. You're charged only for records that were actually pushed to the dataset.
  4. If the publisher has wired in schema validation (which good ones do), records that fail validation are dropped without billing you.

The buyer's mental model is dead simple: "I want 10,000 records, that's $50, here's my card." No CU math, no time-based surprises, no metering.
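
Here's what that mental model looks like in practice, as a minimal sketch using the Apify Python client. The token, the Actor ID, and the maxItems input field are placeholders; the exact name of the cap parameter varies per Actor, so check its input schema.

```python
from apify_client import ApifyClient

PRICE_PER_RECORD = 0.005   # published PPR rate, $/validated record
TARGET_RECORDS = 10_000

# With PPR, the cost ceiling is known before the run even starts.
print(f"Worst-case cost: ${TARGET_RECORDS * PRICE_PER_RECORD:.2f}")  # $50.00

client = ApifyClient("MY-APIFY-TOKEN")

# "username/agency-scraper" and the maxItems field are illustrative,
# not a real Actor's ID or input schema.
run = client.actor("username/agency-scraper").call(
    run_input={"maxItems": TARGET_RECORDS},
)

# You're billed only for records actually pushed to the run's dataset.
items = client.dataset(run["defaultDatasetId"]).list_items().items
print(f"Got {len(items)} records, actual cost: ${len(items) * PRICE_PER_RECORD:.2f}")
```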

Where pay-per-result becomes a bad deal

PPR is genuinely the right model in most cases, but not in every case:

  • You're doing exploratory data work. If you're going to run 50 small experiments to figure out what data shape you actually need, PPR's per-record economics work against you: you pay full rate for records you'll throw away. Compute-unit pricing or a free tier suits you better.
  • You're scraping low-value, high-volume data. PPR usually starts at fractions of a cent per record. For datasets where you need millions of records and your downstream value-per-record is in tenths of a cent (think SEO bulk-data), the margin between record cost and record value gets uncomfortably thin.
  • The Actor's PPR price is set high to subsidize a bad scraper. Some publishers price PPR at $0.05+ per record for data that should cost $0.001; the price reflects their compute cost, not the data's value. Compare 2-3 Actors in the same category before committing (a quick margin check is sketched below).
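
One way to keep yourself honest is a throwaway margin check. All prices below are illustrative:

```python
def ppr_margin(price_per_record: float, value_per_record: float) -> float:
    """Margin per record; negative means PPR loses you money."""
    return value_per_record - price_per_record

# SEO bulk-data case: downstream value per record is in tenths of a cent.
print(ppr_margin(price_per_record=0.005, value_per_record=0.002))  # -0.003: bad deal
print(ppr_margin(price_per_record=0.001, value_per_record=0.002))  #  0.001: workable

# Compare candidate Actors in the same category before committing
# (names and prices are invented for illustration).
candidates = {"actor-a": 0.05, "actor-b": 0.004, "actor-c": 0.005}
print(min(candidates, key=candidates.get))  # "actor-b"
```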

Where pay-per-result is unambiguously the right model

PPR wins clearly when:

  • Your downstream economics are per-record. You're feeding a CRM. Each record has a known value to your business (e.g., "1 lead = $5 of expected revenue"). PPR maps directly: cost-per-lead = price-per-record. No translation layer.
  • You need bounded predictability. Your CFO wants to know "if I extract 50K agencies, what's the cost ceiling?" PPR answers that in 10 seconds. CU pricing requires "well, depends on the runtime…"
  • Schema-validated PPR is offered. This is the killer feature: when the publisher only charges for records that pass schema validation, you've eliminated the failure mode where you pay for half-broken records. (A sketch of such a validation gate follows this list.)
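
For intuition, here's a minimal sketch of what a validation-gated billing step can look like on the publisher's side, using the Python jsonschema library. The schema and the records are invented for illustration, not a real Actor's output schema.

```python
from jsonschema import ValidationError, validate

# Illustrative schema: every billable record needs a name and a URL.
AGENCY_SCHEMA = {
    "type": "object",
    "required": ["name", "website"],
    "properties": {
        "name": {"type": "string", "minLength": 1},
        "website": {"type": "string", "pattern": "^https?://"},
    },
}

def billable(records: list[dict]) -> list[dict]:
    """Keep only records that pass validation; only these are billed."""
    passed = []
    for record in records:
        try:
            validate(record, AGENCY_SCHEMA)
            passed.append(record)
        except ValidationError:
            pass  # dropped without billing the buyer
    return passed

scraped = [
    {"name": "Acme Digital", "website": "https://acme.example"},
    {"name": "", "website": "not-a-url"},  # fails validation, never billed
]
print(len(billable(scraped)))  # 1
```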

How compute-unit pricing actually works

Compute-unit (CU) pricing charges you for the wall-clock time × memory consumed by the scraper. A typical Apify CU rate is roughly $0.25 per CU, where 1 CU = 1 GB of memory used for 1 hour.

The math gets opaque fast. To estimate cost on a CU-priced Actor, you need to know:

  • How long the scraper takes per record (varies with target-site performance)
  • How much memory it uses (a plain HTTP scraper like Cheerio vs a headless browser like Playwright can differ by 10×)
  • How concurrency settings interact with rate limits
  • Whether retries multiply your costs

For a buyer just trying to extract 47K agency profiles, this is a tax on cognition. The scraper you ran yesterday may cost 2× tomorrow because the upstream site got slower. PPR puts that risk on the publisher; CU puts it on you.
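
To make that concrete, here's a back-of-envelope CU estimator. Every input below (memory footprint, per-record runtime, rate) is an assumption you'd have to measure yourself; note how a 2× slowdown on the target site doubles the bill.

```python
CU_RATE = 0.25            # $/CU, where 1 CU ~ 1 GB x 1 hour
MEMORY_GB = 4.0           # e.g., a headless-browser scraper (illustrative)
SECONDS_PER_RECORD = 1.2  # varies with the target site's performance
RECORDS = 47_000

def cu_cost(records, secs_per_record, memory_gb, rate=CU_RATE):
    # cost = GB-hours consumed x rate
    gb_hours = (records * secs_per_record / 3600) * memory_gb
    return gb_hours * rate

print(f"${cu_cost(RECORDS, SECONDS_PER_RECORD, MEMORY_GB):.2f}")      # ~$15.67
# The upstream site gets slower, and the same job costs 2x:
print(f"${cu_cost(RECORDS, SECONDS_PER_RECORD * 2, MEMORY_GB):.2f}")  # ~$31.33
```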

Monthly subscriptions: when do they win?

Subscription scraping tools (Bright Data, Octoparse Premium, Apify subscription plans) charge a fixed monthly fee — typically $50-$1,000+/month — and bundle compute, proxies, and concurrency into the price.

The math works when:

  • You're pulling the same data continuously and at high volume — daily refreshes of large datasets.
  • Your usage stays consistent month-over-month (no seasonal spikes).
  • You value the bundled support, SLA, and infrastructure stability over per-unit transparency.

The math breaks when:

  • You only need data occasionally. Paying $200/month for 5 runs is $40 per run.
  • Your usage is bursty. Subscription plans typically meter on top of the base fee, so a busy month becomes "subscription + overages".
  • You can't predict next month's volume, so you over-provision to be safe.

For most directory-data buyers, subscriptions are over-provisioned. PPR wins on cash flow and on sleeping at night.
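
A quick break-even check, with illustrative prices:

```python
SUBSCRIPTION = 200.0  # $/month, flat
PPR_PRICE = 0.005     # $/validated record

# Below this volume, PPR is cheaper than the subscription.
break_even = SUBSCRIPTION / PPR_PRICE
print(f"Break-even: {break_even:,.0f} records/month")  # 40,000

# Occasional use: 5 runs of 2,000 records each in a month.
records = 5 * 2_000
print(f"PPR: ${records * PPR_PRICE:.2f} vs subscription: ${SUBSCRIPTION:.2f}")
# PPR: $50.00 vs subscription: $200.00
```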

Compare like-for-like on these three numbers

When you're comparing two Actors that promise the same data, ignore the marketing copy and compare:

  1. Effective price per validated record. Total cost ÷ records that passed schema validation. Not records pushed; records you would actually use.
  2. Schema-validation pass rate. Ask the publisher (or run a sample). 99%+ is the bar. Below 95%, a meaningful slice of the output is garbage, and unless billing is gated on validation, you're paying for it.
  3. Drift detection cadence. "Updated weekly" means nothing. Ask whether the publisher runs synthetic tests on a schedule and whether they publish a status page or changelog.

Two Actors at the same nominal $0.005/record can differ by 3× on effective cost-per-usable-record once schema-validation rates and drift handling are factored in.
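
The arithmetic behind that claim, with illustrative pass rates (drift-related rework widens the gap further):

```python
def effective_price(nominal_price: float, validation_pass_rate: float) -> float:
    """Cost per record you can actually use."""
    return nominal_price / validation_pass_rate

actor_a = effective_price(0.005, 0.99)  # ~$0.00505 per usable record
actor_b = effective_price(0.005, 0.60)  # ~$0.00833, same sticker price

print(f"{actor_b / actor_a:.2f}x more expensive in practice")  # 1.65x
```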

A worked example: 50K marketing agency profiles

Comparing four real-world ways to get the same dataset:

| Approach | Effective cost | Time to result | Risk profile |
|----------|---------------|----------------|--------------|
| Build it yourself (one engineer, ~2 weeks) | $8,000-15,000 in eng time | 2-3 weeks | High — you own the maintenance, drift, and proxies |
| Subscription scraper service ($499/month base + CU usage for the 50K-record run) | $500-900 for one month | ~3 days to onboard | Medium — vendor dependency, SLA usually 99% |
| PPR Actor at $0.005/record | $250 (50,000 × $0.005) | ~30 minutes to result | Low — capped cost, schema-validated output, easy to compare alternatives |
| Custom enterprise contract | $5,000-50,000 + legal review | 4-12 weeks | Low for the data, high for procurement friction |

For a one-shot pull of 50K agency profiles, PPR wins on every axis except for buyers who genuinely need procurement-grade contracts.
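
Reproducing the table's per-record math with midpoint figures (all illustrative):

```python
RECORDS = 50_000
approaches = {
    "build-it-yourself": 11_500,          # midpoint of $8k-15k eng time
    "subscription":      700,             # midpoint of $500-900
    "ppr-actor":         RECORDS * 0.005, # $250
}
for name, total in approaches.items():
    print(f"{name:>18}: ${total:,.0f} total, ${total / RECORDS:.4f}/record")
```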

A buyer's checklist for any scraped-data purchase

Before you run an Actor or sign a subscription, get clear answers on:

  • [ ] What's the price per validated record (not raw record)?
  • [ ] What schema does the output conform to? Is it documented?
  • [ ] What's the schema validation pass rate on a recent run?
  • [ ] How does the publisher detect upstream drift, and how fast do they fix it?
  • [ ] Is there a sample dataset you can preview before paying?
  • [ ] What output formats are supported (JSON, CSV, HTML preview)?
  • [ ] What's the publisher's track record on the source site (months of stable runs)?

Anything you can't answer is a risk you're carrying.


Directory Datasets uses pay-per-result pricing on every Actor, with schema validation gating every record before it bills you. Agency Vista is $0.005 per validated agency profile. OnlineJobs.ph is $0.003 per validated job posting, with a $0.005 delta-mode rate when you only want net-new postings since the last run.