Directory Datasets

Pay-per-result vs subscription pricing: which is better for scraped data?

A neutral, side-by-side breakdown of the four major pricing models for scraped data — pay-per-result, compute units, monthly subscription, custom enterprise — with the math for when each is the right choice.

Pay-per-result (PPR, also called PPE — "pay per event") pricing charges you a flat fee for each valid data record. Subscription pricing charges you a fixed monthly fee for access. Compute-unit pricing charges you for the time and memory the scraper used. Custom enterprise pricing is whatever the sales team negotiated.

If you've ever stared at an Apify, Bright Data, or Octoparse pricing page and felt like you were reading a tax form, this post is the neutral side-by-side breakdown, with the actual math for when each model saves you money and when it costs you.

What are the four pricing models for scraped data?

| Model | Charged for | Predictable? | Best for | Worst for |
|---|---|---|---|---|
| Pay-per-result (PPR) | Each valid record extracted | Highly: you know cost per record before the run | Buyers who need bounded, predictable cost on production datasets | Buyers who run lots of small ad-hoc explorations |
| Compute units (CU) | Wall-clock × memory used | Low: you don't know cost until the run finishes | Sophisticated buyers running custom Actors with known stable workloads | Anyone whose scraper performance varies with the target site |
| Subscription | Fixed monthly fee | Capped: you know the ceiling, not the unit economics | High-volume buyers who pull the same data continuously | Buyers with seasonal or one-off needs |
| Custom enterprise | Whatever the contract says | Depends on contract | Large organizations with legal/compliance constraints | Anyone who values transparency or speed of procurement |

How does pay-per-result pricing actually work?

The PPR / PPE pricing model became Apify's default in 2024 and has since been adopted by most scraping marketplaces. Here's the mechanic:

  1. The Actor publisher sets a price per output record — e.g., $0.005 per agency profile (the Agency Vista price), or $0.003 per job posting (the OnlineJobs.ph price).
  2. When you run the Actor, you specify a maxItems cap or let the Actor terminate naturally on the input range you provided.
  3. You're charged only for records that were actually pushed to the dataset.
  4. If the publisher has wired in schema validation (which good ones do), records that fail validation are dropped without billing you.

The buyer's mental model is dead simple: "I want 10,000 records, that's $50, here's my card." No CU math, no time-based surprises, no metering.
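
The billing mechanic above can be sketched in a few lines of Python. This is a minimal illustration, not Apify's billing code: `ppr_charge` and its parameters are hypothetical names, and the sketch assumes the charge is simply billable records times a flat price, with `max_items` acting as a hard cap on the run.

```python
from typing import Optional


def ppr_charge(records_pushed: int, price_per_record: float,
               max_items: Optional[int] = None) -> float:
    """Return the charge in USD for one pay-per-result run.

    Hypothetical helper: assumes a flat price per record and that the
    run stops producing records once max_items is reached.
    """
    if records_pushed < 0 or price_per_record < 0:
        raise ValueError("records_pushed and price_per_record must be non-negative")
    billable = records_pushed if max_items is None else min(records_pushed, max_items)
    return billable * price_per_record


# 10,000 agency profiles at $0.005 per record
print(f"${ppr_charge(10_000, 0.005):.2f}")  # prints $50.00

# A 50,000-record cap bounds the bill even if the source holds more records
print(f"${ppr_charge(62_000, 0.005, max_items=50_000):.2f}")  # prints $250.00
```

The cap is what makes the cost ceiling knowable before the run starts: the worst case is `max_items × price_per_record`.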

When does pay-per-result become a bad deal?

PPR is genuinely the right model in most cases, but it's not always:

  • You're doing exploratory data work. If you're going to run 50 small experiments to figure out what data shape you actually need, PPR's per-record economics work against you. Compute-unit pricing or a free tier suits you better.
  • You're scraping low-value, high-volume data. PPR usually starts at fractions of a cent per record. For datasets where you need millions of records and your downstream value-per-record is in tenths of a cent (think SEO bulk-data), the per-record price can consume most of your margin.
  • The Actor's PPR price is set high to subsidize a bad scraper. Some publishers price PPR at $0.05+ per record on data that should cost $0.001; the price reflects the scraper's compute cost, not the data's value. Compare 2-3 Actors in the same category before committing.

When is pay-per-result unambiguously the right model?

PPR wins clearly when:

  • Your downstream economics are per-record. You're feeding a CRM. Each record has a known value to your business (e.g., "1 lead = $5 of expected revenue"). PPR maps directly: cost-per-lead = price-per-record. No translation layer.
  • You need bounded predictability. Your CFO wants to know "if I extract 50K agencies, what's the cost ceiling?" PPR answers that in 10 seconds. CU pricing requires "well, depends on the runtime…"
  • Schema-validated PPR is offered. This is the killer feature: when the publisher only charges for records that pass schema validation, you've eliminated the failure mode where you pay for half-broken records.

How does compute-unit pricing actually work?

Compute-unit (CU) pricing charges you for the memory the scraper holds multiplied by the wall-clock time it runs. On Apify, 1 CU is 1 GB of memory used for 1 hour, and a typical rate (when CU is being used) is roughly $0.25 per CU.

The math gets opaque fast. To estimate cost on a CU-priced Actor, you need to know:

  • How long the scraper takes per record (varies with target-site performance)
  • How much memory it uses (varies with crawler type; an HTTP-based Cheerio crawler and a Playwright browser crawler can differ by around 10×)
  • How concurrency settings interact with rate limits
  • Whether retries multiply your costs

For a buyer just trying to extract 47K agency profiles, this is a tax on cognition. The scraper you ran yesterday may cost 2× tomorrow because the upstream site got slower. PPR puts that risk on the publisher; CU puts it on you.
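
To make that risk concrete, here is a rough estimator. It assumes the CU definition used in this section (memory in GB times runtime in hours, at about $0.25 per CU) plus a hypothetical workload: one 4 GB browser-based worker processing records one at a time, with no retries. The function names and all workload numbers are illustrative.

```python
def cu_cost(memory_gb: float, runtime_hours: float, usd_per_cu: float = 0.25) -> float:
    """Estimated cost in USD: compute units (GB x hours) times the CU rate."""
    return memory_gb * runtime_hours * usd_per_cu


def run_hours(records: int, seconds_per_record: float) -> float:
    """Wall-clock hours for a sequential run (assumes no concurrency or retries)."""
    return records * seconds_per_record / 3600


records = 47_000
normal_day = cu_cost(4, run_hours(records, 2.0))  # target site responds in ~2 s
slow_day = cu_cost(4, run_hours(records, 4.0))    # same job, site twice as slow

print(f"normal: ${normal_day:.2f}, slow: ${slow_day:.2f}")
# prints normal: $26.11, slow: $52.22
```

Nothing about the job changed between the two runs except the target site's response time, yet the bill doubled. Under PPR the same slowdown would cost the publisher, not the buyer.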

When do monthly subscriptions win?

Subscription scraping tools (Bright Data, Octoparse Premium, Apify subscription plans) charge a fixed monthly fee — typically $50-$1,000+/month — and bundle compute, proxies, and concurrency into the price.

The math works when:

  • You're pulling the same data continuously and at high volume — daily refreshes of large datasets.
  • Your usage stays consistent month-over-month (no seasonal spikes).
  • You value the bundled support, SLA, and infrastructure stability over per-unit transparency.

The math breaks when:

  • You only need data occasionally. Paying $200/month for 5 runs is $40 per run.
  • Your usage is bursty. Subscription plans typically meter usage on top of the base fee, so a busy month becomes "subscription + overages".
  • You can't predict next month's volume, so you over-provision to be safe.

For most directory-data buyers, subscriptions are over-provisioned. PPR wins on cash flow and on sleeping at night.
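
A quick way to check which side of the line you are on is to compute the cost per run and the monthly volume at which a subscription matches PPR. The sketch below is a simplification: it ignores overages and assumes the subscription covers whatever volume you pull. The function names are hypothetical, and the $0.005 PPR price is the example rate used earlier in this post.

```python
def cost_per_run(monthly_fee: float, runs_per_month: int) -> float:
    """Effective subscription cost of each run in USD."""
    if runs_per_month <= 0:
        raise ValueError("runs_per_month must be positive")
    return monthly_fee / runs_per_month


def breakeven_records(monthly_fee: float, ppr_price: float) -> float:
    """Monthly record volume at which PPR spend equals the subscription fee."""
    return monthly_fee / ppr_price


print(f"${cost_per_run(200, 5):.2f} per run")
# prints $40.00 per run

print(f"{breakeven_records(200, 0.005):,.0f} records/month")
# prints 40,000 records/month
```

Below the break-even volume, PPR is cheaper; above it, the subscription starts to pay for itself, provided the usage is steady enough to stay above that line every month.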

What three numbers should you compare on every Actor?

When you're comparing two Actors that promise the same data, ignore the marketing copy and compare:

  1. Effective price per validated record. Total cost ÷ records that passed schema validation. Not records pushed; records you would actually use.
  2. Schema-validation pass rate. Ask the publisher (or run a sample). 99%+ is the bar. Below 95% means you're paying for garbage even on PPR.
  3. Drift detection cadence. "Updated weekly" means nothing. Ask whether the publisher runs synthetic tests on a schedule and whether they publish a status page or changelog.

Two Actors at the same nominal $0.005/record can differ by 3× on effective cost-per-usable-record once schema-validation rates and drift handling are factored in.
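
The first number is simple division, shown here as a small sketch. The two Actors and their pass rates are invented for illustration; only the $0.005 nominal price comes from this post.

```python
def effective_price(total_cost_usd: float, usable_records: int) -> float:
    """Total cost divided by the records you would actually use."""
    if usable_records <= 0:
        raise ValueError("need at least one usable record")
    return total_cost_usd / usable_records


billed = 10_000 * 0.005  # both hypothetical Actors bill 10,000 records at $0.005

actor_a = effective_price(billed, usable_records=9_900)  # 99% of records usable
actor_b = effective_price(billed, usable_records=3_300)  # 33% usable after drift

print(f"A: ${actor_a:.4f}  B: ${actor_b:.4f}  ratio: {actor_b / actor_a:.1f}x")
# prints A: $0.0051  B: $0.0152  ratio: 3.0x
```

Both Actors advertise the same price, but the second costs three times as much per record you can use.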

What does this look like for 50K marketing agency profiles?

Comparing four real-world ways to get the same dataset:

| Approach | Effective cost | Time to result | Risk profile |
|---|---|---|---|
| Build it yourself (one engineer, ~2 weeks) | $8,000-15,000 in eng time | 2-3 weeks | High: you own the maintenance, drift, and proxies |
| Subscription scraper service ($499/month base + CU usage for ~50K records) | $500-900 for one month | ~3 days to onboard | Medium: vendor dependency, SLA usually 99% |
| PPR Actor at $0.005/record | $250 (50,000 × $0.005) | ~30 minutes to result | Low: capped cost, schema-validated output, easy to compare alternatives |
| Custom enterprise contract | $5,000-50,000 + legal review | 4-12 weeks | Low for the data, high for procurement friction |

For a one-shot pull of 50K agency profiles, PPR wins on every axis except for buyers who genuinely need procurement-grade contracts.

What's the buyer's checklist for any scraped-data purchase?

Before you run an Actor or sign a subscription, get clear answers on:

  • What's the price per validated record (not raw record)?
  • What schema does the output conform to? Is it documented?
  • What's the schema validation pass rate on a recent run?
  • How does the publisher detect upstream drift, and how fast do they fix it?
  • Is there a sample dataset you can preview before paying?
  • What output formats are supported (JSON, CSV, HTML preview)?
  • What's the publisher's track record on the source site (months of stable runs)?

Anything you can't answer is a risk you're carrying.

Frequently asked questions

What does PPE mean in scraping pricing?

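PPE stands for "pay per event", Apify's event-based pricing model, in which the publisher defines what counts as a billable event. The "event" is typically an output record successfully validated and pushed to the dataset, and in that case PPE and pay-per-result amount to the same thing, which is why this post uses the terms interchangeably.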

Are compute units (CUs) being phased out?

Apify has positioned PPE as the modern default since 2024, but compute-unit pricing still exists for buyers who run their own custom Actors or want compute-level flexibility. Most published Actors on Apify Store now offer PPE.

Does pay-per-result protect me from upstream site changes?

Partially. If the publisher has a strict schema-validation gate, malformed records caused by upstream UI drift are dropped before billing — you don't pay for them. If the publisher is sloppy about validation, you might pay for partially-broken records.
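
A validation gate of this kind can be sketched as follows. This is an assumption about how a publisher might implement it, not a description of any specific Actor: the schema, field names, and sample records are all hypothetical, and a real gate would likely use a full schema validator rather than simple type checks.

```python
# Hypothetical schema: field name -> required Python type
AGENCY_SCHEMA = {"name": str, "website": str, "services": list}


def passes_schema(record: dict) -> bool:
    """True if every required field is present with the expected type."""
    return all(isinstance(record.get(field), kind)
               for field, kind in AGENCY_SCHEMA.items())


def gate_and_bill(records: list, price_per_record: float):
    """Keep only valid records and compute the charge for those alone."""
    valid = [r for r in records if passes_schema(r)]
    return valid, len(valid) * price_per_record


scraped = [
    {"name": "Acme Digital", "website": "https://acme.example", "services": ["SEO"]},
    {"name": "Broken Row", "website": None, "services": []},  # drift: website lost
    {"name": "Nova Media", "website": "https://nova.example", "services": ["PPC"]},
]

valid, charge = gate_and_bill(scraped, 0.005)
print(len(valid), f"${charge:.3f}")  # prints 2 $0.010
```

The malformed record never reaches the dataset, so it never reaches the bill. Without the gate, all three records would be pushed and charged.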

Why is delta-mode pricing higher than full-mode?

Delta mode does change-detection work on every run, even on records it doesn't return. The higher per-record price reflects that overhead. Net cost per refresh is still typically lower because you only pay for net-new records.
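
As a rough worked example, the sketch below compares one refresh in each mode using the OnlineJobs.ph rates quoted at the end of this post ($0.003 full, $0.005 delta). The dataset size and the 10% share of net-new postings are hypothetical, and the sketch assumes delta mode bills only for net-new records.

```python
def full_refresh_cost(total_records: int, full_price: float) -> float:
    """Cost in USD of re-pulling the whole dataset."""
    return total_records * full_price


def delta_refresh_cost(new_records: int, delta_price: float) -> float:
    """Cost in USD of pulling only records added since the last run."""
    return new_records * delta_price


def breakeven_new_fraction(full_price: float, delta_price: float) -> float:
    """Share of net-new records below which delta mode is the cheaper refresh."""
    return full_price / delta_price


total, new = 20_000, 2_000  # hypothetical: 10% of postings are new

print(f"full: ${full_refresh_cost(total, 0.003):.2f}")
# prints full: $60.00
print(f"delta: ${delta_refresh_cost(new, 0.005):.2f}")
# prints delta: $10.00
print(f"break-even: {breakeven_new_fraction(0.003, 0.005):.0%} net-new")
# prints break-even: 60% net-new
```

At these rates, delta mode is cheaper whenever fewer than 60% of the records are new since the last run.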

How do I budget for a one-shot dataset pull?

Multiply expected record count × per-record price × 1.05 (a small buffer for re-runs). For 50K agency profiles at $0.005, that's $250 × 1.05 = $262.50; round up to a $275 budget if you want extra headroom. Set a maxItems cap as a hard ceiling.
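
The same arithmetic as a small sketch, with hypothetical function names. With the 1.05 multiplier the formula gives $262.50 for this example; rounding the budget up leaves extra headroom.

```python
def one_shot_budget(expected_records: int, price_per_record: float,
                    buffer: float = 0.05) -> float:
    """Expected spend in USD plus a proportional buffer for re-runs."""
    return expected_records * price_per_record * (1 + buffer)


def spend_ceiling(max_items: int, price_per_record: float) -> float:
    """Worst-case charge in USD when the run is capped at max_items records."""
    return max_items * price_per_record


print(f"${one_shot_budget(50_000, 0.005):.2f}")  # prints $262.50

# Capping the run at 52,500 records holds spend to the same figure
print(f"${spend_ceiling(52_500, 0.005):.2f}")    # prints $262.50
```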

Can I get a custom price for high-volume use?

Most Apify publishers will negotiate volume discounts above 100K records / month. Reach out to the publisher directly via the Actor's contact link or via the request form on Directory Datasets.


Directory Datasets uses pay-per-result pricing on every Actor, with schema validation gating every record before it bills you. Agency Vista is $0.005 per validated agency profile. OnlineJobs.ph is $0.003 per validated job posting, with a $0.005 delta-mode rate when you only want net-new postings since the last run.