Pricing model
How parcelpump charges for the data + infrastructure it operates.
Captured 2026-05-05 as the canonical replacement for
docs/pre-publish-roadmap.md §4 ("Financial model"), which proposed
classic SaaS tiers and is now superseded.
Mental model
parcelpump is public-good infrastructure with proprietary operations. The code and adapters are not open-sourced. The cost ledger, architecture, API surface, refresh schedules, and catalog of what's wired are open. Customers see exactly what we spend and what we charge on top.
Three principles drive everything below:
- Cost-plus, not value-based. We don't price on what the data is worth to the buyer. We price on what it costs us to produce, plus a published markup. The markup is the same for everyone.
- The live API is for tiles + per-parcel reads. Bulk needs a different channel. Live-API hammering exhausts Lambda concurrency and inflates RDS load. Anyone who needs sweep access goes through bulk export, which is cheap on our side and priced accordingly.
- Counties are stakeholders, not targets. parcelpump runs as a visible, identifiable scraper with a public contact path. If a county sees runaway scraping in their logs, we want them to be able to find us, ask us about it, and (when it's not us) get us to absorb that traffic into our infrastructure so their portals stop getting hammered.
What we sell — and what we keep internal
A hard product boundary: customers see straight, normalized data from the source counties. They do not see parcelpump's internal analytical layers.
Customer-facing (sold via API + bulk export)
- Parcel polygons + canonical attributes (situs, mailing, zoning, acreage, valuation history, sales, owner names) — pulled from the county's own assessor / treasurer portal, normalized to the canonical `Scrape` type.
- Parcel scrape data exactly as it came from the source, with consistent field names across vendors. The normalization is the service.
Internal-only (never returned via API or export)
- CSB-derived agricultural flags (`is_agricultural`, `dominant_crop`, `ag_year_count`). These exist in the `parcels` table to target which parcels we scrape (cuts the per-parcel scrape budget by ~93% for ag-focused workflows). They do not ship.
- The `csb_fields` table (USDA Crop Sequence Boundaries polygons). Internal infrastructure for the ag-targeting join.
- The `findings` table (review-engine output: assessor differential analysis, anomaly flags, comparative valuations). This is parcelpump's analytical layer — proprietary work product. Not exposed to API customers.
- Ownership-graph match decisions between SoS entities and parcels (when that ships). The matched fact ("this parcel's `owner_name` contains this LLC name") is fine to surface; the inferred match weight / clustering logic stays internal.
Why the boundary matters
- Aligned with the cost-plus framing. Customers pay for the straight-through scrape pipeline. The analytical layers are value-add we may eventually monetize separately, but they are not part of the "data utility" product.
- Provenance clarity. Everything we ship has a clear county source. No customer can confuse parcelpump's internal classification with the county's own record.
- Future product surface. If/when we sell tax-appeal evidence, ag market intelligence, or ownership-graph queries, those are separate product lines built on the internal layers — not bundled into the data utility.
API + export consequences
- `GET /parcels/:source/:id` returns canonical attributes + `scrape_data` only. The `is_agricultural`, `dominant_crop`, `ag_year_count` columns are not serialized (a sketch follows this list).
- `GET /findings/:source/:id` becomes admin-only. Currently open to any valid key in `src/api/server.ts`; needs locking down as part of the build.
- Bulk export schemas: same redactions. Exported parquet schemas exclude all internal-only columns.
- `POST /scrape-jobs/enqueue-ag-county` continues to work — the customer requests "enqueue ag parcels in this county" without ever seeing the underlying flag. The classification is a server-side filter, not a returned field.
- Search responses: ag-flag fields not surfaced; ag-keyed query shaping (e.g. `?agricultural=true`) is not exposed.
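A minimal sketch of how the redaction could be enforced at the serialization boundary. The row shape and the `toPublicParcel` helper are illustrative, not the actual `src/api/server.ts` code; only the internal-only column names come from this doc.

```ts
// Hypothetical serializer: internal-only columns exist on the row but never leave the API.
interface ParcelRow {
  source: string;
  parcel_id: string;
  owner_name: string | null;
  scrape_data: Record<string, unknown>;
  // Internal-only (present in the parcels table, never serialized):
  is_agricultural: boolean | null;
  dominant_crop: string | null;
  ag_year_count: number | null;
}

// Allow-list rather than deny-list, so any new internal column is hidden by default.
const PUBLIC_FIELDS = ["source", "parcel_id", "owner_name", "scrape_data"] as const;

function toPublicParcel(row: ParcelRow): Record<string, unknown> {
  return Object.fromEntries(PUBLIC_FIELDS.map((field) => [field, row[field]]));
}
```

The same allow-list could drive the bulk-export parquet schema, so the two channels can't drift apart.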
The 35% markup
All prices = published AWS+vendor cost × 1.35.
The 35% is calibrated to cover, in rough proportion:
- ~20% engineering: adapter maintenance, on-call response
- ~10% reserve for adapter rebuilds when county portals change
- ~5% legitimate margin
This number is published verbatim at parcelpump.dev/about/cost
alongside the live AWS bill breakdown. If our true cost moves (RDS
size up, ScraperAPI rate change), the underlying numbers update; the
35% does not.
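A minimal sketch of the cost-plus arithmetic, assuming costs are tracked in cents; the field and function names are illustrative.

```ts
// Cost-plus pricing: (AWS + vendor cost) × 1.35, rounded up so fractional cents
// are never silently dropped. Field names are illustrative.
const MARKUP = 1.35;

interface CostBreakdown {
  aws_cost_cents: number;    // Lambda + RDS + CloudFront + S3 share for the item
  vendor_cost_cents: number; // e.g. ScraperAPI proxy spend
}

function priceCents(cost: CostBreakdown): number {
  return Math.ceil((cost.aws_cost_cents + cost.vendor_cost_cents) * MARKUP);
}
```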
Three product surfaces
| Surface | Path | Access | Backed by | Pricing primitive |
|---|---|---|---|---|
| Live API | api.parcelpump.io | per-key auth, rate-limited | RDS hot path | per-request, cost-plus |
| Bulk export | signed S3 URLs | per-account subscriptions or one-shots | scheduled snapshot pipeline | per-snapshot, cost-plus |
| Scrape funding | parcelpump.dev/data | logged-in users | scrape worker fleet + adapters | wire / refresh / one-shot, three eng tiers |
1. Live API
Per-request billing. Free tier covers casual use; paid plans cover production reads. Concretely:
- Free key: 10K requests/month, hard rate-limit 1 RPS sustained / 10 burst.
- Paid: pre-paid balance against per-request cost. A typical read costs us a fraction of a cent in Lambda + RDS + CloudFront; the customer is charged that × 1.35. When the balance hits zero, calls return 402 (a sketch follows this list).
- No tiered subscription. Pay for what you use; no monthly minimum.
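A minimal sketch of the pre-paid balance check, assuming the per-request infrastructure cost is known at request time; the `Account` shape and `chargeRequest` helper are hypothetical.

```ts
// Hypothetical per-request charge against a pre-paid balance.
// A typical read costs a fraction of a cent, so the balance holds fractional cents
// and no per-request rounding is applied.
interface Account {
  id: string;
  credit_balance_cents: number;
}

function chargeRequest(
  account: Account,
  requestCostCents: number, // measured Lambda + RDS + CloudFront cost for this call
): { allowed: boolean; status: number } {
  const priceCents = requestCostCents * 1.35;
  if (account.credit_balance_cents < priceCents) {
    return { allowed: false, status: 402 }; // Payment Required: balance exhausted
  }
  // In practice this would be a transactional decrement against the accounts table.
  account.credit_balance_cents -= priceCents;
  return { allowed: true, status: 200 };
}
```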
Tile bytes (api.parcelpump.io/tiles/...) stay free and uncapped —
they're CloudFront-cached and trivially cheap on our side. The whole
point of the tile layer is broad embedding.
2. Bulk export
Pre-built or on-demand snapshots delivered as signed S3 URLs. Customers download via curl/aws-cli; URL expires in 7 days.
- Format: GeoParquet by default. GeoPackage / CSV+WKT / Shapefile available with a small per-format premium (extra encode time).
- Scope: by county, by state, or by filter (e.g. "assessed value > $1M in TX", "all residential land-use in Cook County"). Filter expressions use canonical scrape fields only — internal-only columns (CSB ag flags, findings, match weights) are not filterable by customers. Filter exports are quoted.
- Cadence: one-shot ("most recent snapshot, $X") or subscription ("weekly OK statewide, $Y/mo").
- Snapshot reuse: if multiple customers subscribe to the same (scope, cadence) pair, we generate the snapshot once and charge each at marginal serving cost (S3 GET + egress + 35%). Initial generation cost amortizes across subscribers.
- Watermarking: every snapshot embeds `account_id` + `generated_at` in metadata (a sketch follows this list). If a snapshot leaks to a non-customer, we can identify the source.
- Freshness disclosure: every catalog entry shows `last_refresh_at` + `next_scheduled_at`. If you buy an export of a county we haven't re-scraped in 6 months, you're getting 6-month-old data. Visible upfront.
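A minimal sketch of the delivery step, assuming the AWS SDK v3 S3 client; the bucket name and `deliverSnapshot` helper are illustrative.

```ts
// Hypothetical export delivery: upload the snapshot with the account_id / generated_at
// watermark (shown here as S3 object metadata), then hand back a signed URL that
// expires in 7 days (the maximum a SigV4 presigned URL allows).
import { S3Client, PutObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({});
const EXPORT_BUCKET = "parcelpump-exports"; // illustrative bucket name

async function deliverSnapshot(key: string, body: Buffer, accountId: string): Promise<string> {
  await s3.send(
    new PutObjectCommand({
      Bucket: EXPORT_BUCKET,
      Key: key,
      Body: body,
      Metadata: { account_id: accountId, generated_at: new Date().toISOString() },
    }),
  );
  return getSignedUrl(s3, new GetObjectCommand({ Bucket: EXPORT_BUCKET, Key: key }), {
    expiresIn: 7 * 24 * 60 * 60,
  });
}
```

The watermark could equally be written into the GeoParquet file metadata itself, so it survives re-hosting of the raw bytes rather than only the S3 object.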
3. Scrape funding
The funding mechanism for adding new data. Customer wants Cook County IL → they fund the wiring + initial scrape. After that, they (or anyone else) can fund a refresh subscription.
Three engineering tiers for adapter wiring, published as a fixed catalog:
| Tier | What it covers | Markup base |
|---|---|---|
| Bronze | Existing vendor, new county (e.g., another Tyler PACS county) | half-day eng + initial scrape |
| Silver | New vendor pattern (a vendor we haven't wired yet) | 2-3 days eng + initial scrape |
| Gold | Auth / captcha / human-in-the-loop required (e.g., Tyler EagleWeb counties needing registered-user creds) | 1-2 weeks eng + initial scrape + ongoing creds maintenance |
Refund-or-recompute clause: if a Bronze quote turns out to need Silver/Gold work mid-build, we either refund the funder and stop, or re-quote and ask for incremental funding.
Cadence subscriptions for funded counties:
- Customer subscribes to "(source, cadence)" pair: Cook IL daily, weekly, monthly, etc.
- Cost = (Lambda invokes + ScraperAPI proxy + RDS writes for that cadence) × 1.35.
- No exclusivity window. Once a county is wired, the data is available to all keys at the live-API rate. The funder's leverage is that they set the refresh cadence, not that they own the data. If another customer wants a higher cadence, they pay the marginal delta.
This aligns with the public-utility framing: pay for the work, not for the right to exclude others.
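A minimal sketch of the cadence-subscription arithmetic above, assuming per-run infrastructure costs are measured in cents; the names are illustrative.

```ts
// Hypothetical monthly price for a (source, cadence) refresh subscription:
// per-run infrastructure cost × runs per month × the published 1.35 markup.
interface RunCosts {
  lambda_cents: number;     // scrape worker invocations for one full pass
  scraperapi_cents: number; // proxy spend for one full pass
  rds_write_cents: number;  // write load for one full pass
}

function monthlySubscriptionCents(perRun: RunCosts, runsPerMonth: number): number {
  const base =
    (perRun.lambda_cents + perRun.scraperapi_cents + perRun.rds_write_cents) * runsPerMonth;
  return Math.ceil(base * 1.35);
}

// e.g. daily cadence ≈ 30 runs/month, weekly ≈ 4; a customer asking for a higher
// cadence pays only the delta between the two monthly figures.
```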
Anti-API-scrape posture
The live API is fragile under sweep traffic. Three layers:
- TOS clause at parcelpump.dev/terms: explicit "no using the live API for bulk data extraction. Use the bulk export channel."
- Rate limits: per-key hard cap (free 1 RPS, paid scaled with plan). WAF on CloudFront for IP-level abuse.
- Pattern detection: a heuristic flags accounts pulling many distinct parcel IDs with low spatial locality in a short window (a sketch follows this list). Soft-throttle + a UI message: "this looks like a bulk extraction pattern. We offer this as an export at /dashboard/exports — switch over and your costs drop ~100x." Carrot, not just stick.
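A minimal sketch of what the locality heuristic could look like; the thresholds and spatial bucketing are illustrative, not tuned values.

```ts
// Hypothetical bulk-extraction heuristic: many distinct parcel IDs spread across
// many spatial buckets in a short window. Thresholds are illustrative, not tuned.
interface KeyWindowStats {
  distinctParcelIds: number;
  distinctTiles: number;  // coarse spatial buckets touched (e.g. z12 tile IDs)
  windowMinutes: number;
}

function looksLikeBulkExtraction(s: KeyWindowStats): boolean {
  const idsPerMinute = s.distinctParcelIds / s.windowMinutes;
  // Locality: parcels per spatial bucket. Legitimate map browsing clusters reads;
  // a sweep touches a new bucket on almost every read.
  const parcelsPerTile = s.distinctParcelIds / Math.max(s.distinctTiles, 1);
  return idsPerMinute > 50 && parcelsPerTile < 3;
}
```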
County-trust posture
Counties are not targets and not adversaries. parcelpump's posture:
- Identifiable User-Agent on every scrape: `parcelpump/1.0 (+https://parcelpump.dev/for-counties; ops@parcelpump.dev)` (a sketch follows this list). A county seeing this in their logs knows immediately who we are and how to reach us. The /for-counties page explains who we are, what we scrape, our refresh cadences, our rate-limit philosophy (we throttle ourselves to portal-friendly rates), and a contact form for issues.
- Future: a county-officials registry (gated by .gov email verification) where counties can opt into rate-limit honoring, register a primary contact, and report unwanted third-party scrapers we should help absorb. Deferred until the first concrete county engagement; not worth building before then.
- Possible future product: a paid tier for counties to redirect third-party scraping traffic to us. They send wild scrapers our way; we add their portal to our catalog at zero cost to them; the county's portal infra stops getting beaten up. Speculative, but the architecture supports it.
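A minimal sketch of the scrape-side fetch, assuming Node 18+ global `fetch`; `fetchPortalPage` is a hypothetical helper, and only the User-Agent string comes from this doc.

```ts
// Hypothetical scrape-side fetch: every outbound request to a county portal carries
// the identifiable User-Agent, so the traffic is attributable in the county's logs.
const COUNTY_USER_AGENT =
  "parcelpump/1.0 (+https://parcelpump.dev/for-counties; ops@parcelpump.dev)";

async function fetchPortalPage(url: string): Promise<string> {
  const res = await fetch(url, { headers: { "User-Agent": COUNTY_USER_AGENT } });
  if (!res.ok) {
    throw new Error(`portal returned ${res.status} for ${url}`);
  }
  return res.text();
}
```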
Schema implications
New tables (and new columns on existing tables) needed before any of the pricing above is real:
- `accounts` — billing principal: id, contact, attribution string, credit balance, plan, stripe_customer_id, created_at
- `api_keys` — gains `account_id` (multi-key per account); existing `capabilities` array stays
- `scrape_funding` — funding ledger: account_id, source, kind (wire/refresh/one-shot), amount_cents, aws_cost_cents, proxy_cost_cents, eng_cost_cents, eng_tier, created_at
- `exports` — bulk export subscriptions: account_id, scope, format, cadence, last_generated_at, last_s3_key, last_size_bytes, last_cost_cents, status
- `usage_log` — per-request: api_key_id, endpoint, status_code, ms, response_bytes, cost_cents, occurred_at
- `sources` — gains `funded_by_account_id` (nullable), `wired_at`, `wiring_cost_cents`
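Illustrative row types for two of the new tables, derived from the column lists above; exact SQL types and constraints are still open.

```ts
// Illustrative row shapes only; column names follow the schema list above.
interface AccountRow {
  id: string;
  contact: string;
  attribution: string;
  credit_balance_cents: number;      // pre-paid balance drawn down by live-API usage
  plan: string;
  stripe_customer_id: string | null; // null until Stripe is wired up
  created_at: Date;
}

interface UsageLogRow {
  api_key_id: string;
  endpoint: string;
  status_code: number;
  ms: number;
  response_bytes: number;
  cost_cents: number; // feeds the cost-plus billing above
  occurred_at: Date;
}
```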
Open decisions still pending
- Stripe wire-up: per the original roadmap, defer until 60 days of free usage data. Build the schema + UI to be Stripe-ready; flip the switch later.
- Free-tier abuse vector: 10K requests/mo free is generous. If we see abuse (e.g., one user creating many accounts), tighten to 1K/mo and require credit card on file.
- .gov email verification for the county registry: which provider? Probably custom — match a regex + send a verification email. Defer until the first county wants in.
- Bulk export pre-build vs. on-demand thresholds: when does an on-demand request become a pre-built subscription? Probably "if three customers ever ask for the same scope." Codify later.
- Parcels-as-public-records license clarity: we should publish a "data license" page making explicit that individual parcel records are public records (county provenance) and customers can use them freely; the prohibition is on bulk republication of our compiled dataset. TOS lawyer review territory.
Supersedes
`docs/pre-publish-roadmap.md` §4 (four-tier SaaS pricing model). Marked superseded in that doc; kept for historical context.
Update log
- 2026-05-05 — initial capture from chat with WM. Cost-plus + 35% + three surfaces + county-trust posture.