Python, Playwright, 2Captcha / CapSolver, Cloudflare bypass, Beacon API, asyncio

US Property Records Data Collection

Pulled 16,402 property parcels from a US county's public records system at a 97.3% success rate.

Role: Python Engineer (Upwork)
Timeline: 2024
Status: Delivered (US property data client)

The Problem

Acquiring large-scale geospatial datasets often means working with non-public APIs that sit behind anti-bot layers. QPUBLIC serves US county property assessment data through a Beacon API, and using it requires three things: Cloudflare challenge resolution, CAPTCHA-solved session tokens, and a grid-based spatial query design that pages through a county's entire geographic extent. The engineering challenge was not just getting past the gates; it was designing a pipeline that reliably acquires 16,000+ parcel coordinate records per run.

What I Built

A multi-layer geospatial ingestion pipeline with four distinct components:

1. Cloudflare bypass

Playwright with stealth mode handles the JS challenge that Cloudflare presents on first visit. The resulting session cookies are harvested and reused for API calls.
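A minimal sketch of the cookie-harvesting step, assuming Playwright's synchronous API and a plain headless Chromium launch. The stealth patching mentioned above is not shown, and harvest_cookies and cookies_to_header are illustrative names, not the project's actual functions.

```python
from typing import Dict, List


def cookies_to_header(cookies: List[Dict]) -> str:
    """Join Playwright-style cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def harvest_cookies(url: str, timeout_ms: int = 30_000) -> List[Dict]:
    """Load the page in a real browser so the JS challenge can run, then return cookies.

    Assumption: waiting for network idle is enough for the challenge to finish;
    a real run may need to wait for a specific cookie or page element instead.
    """
    # Imported lazily so cookies_to_header works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context()
        page = context.new_page()
        page.goto(url, timeout=timeout_ms)
        page.wait_for_load_state("networkidle", timeout=timeout_ms)
        cookies = context.cookies()  # list of dicts with name, value, domain, ...
        browser.close()
    return cookies
```

The header string from cookies_to_header can then be attached to later HTTP requests, so the browser is only needed for the first visit.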

2. CAPTCHA solving

QPUBLIC uses reCAPTCHA v2/v3 to gate the Beacon API. The pipeline integrates both 2Captcha and CapSolver (switchable via config): the CAPTCHA image/audio is sent to the solving service, and the returned token is injected into the form before the API call proceeds.
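One way to make the solver switchable is to hide each service behind a small common interface and pick one by name from config. The sketch below assumes a token-based flow in which the site key and page URL are submitted and the service is polled for the result. CaptchaBackend, pick_backend, solve, and the captcha_provider config key are all hypothetical names, and the real HTTP calls to either service are deliberately left out.

```python
import time
from typing import Callable, Dict, Optional, Protocol


class CaptchaBackend(Protocol):
    """Hypothetical interface that wraps one solving service."""

    def submit(self, site_key: str, page_url: str) -> str: ...

    def poll(self, task_id: str) -> Optional[str]: ...


def pick_backend(config: Dict[str, str],
                 registry: Dict[str, CaptchaBackend]) -> CaptchaBackend:
    """Return the backend named by config["captcha_provider"] (an assumed key)."""
    name = config.get("captcha_provider", "")
    if name not in registry:
        raise ValueError(f"unknown CAPTCHA provider: {name!r}")
    return registry[name]


def solve(backend: CaptchaBackend, site_key: str, page_url: str,
          interval_s: float = 5.0, timeout_s: float = 120.0,
          sleep: Callable[[float], None] = time.sleep) -> str:
    """Submit one task, then poll until a token arrives or the timeout passes."""
    task_id = backend.submit(site_key, page_url)
    waited = 0.0
    while waited <= timeout_s:
        token = backend.poll(task_id)  # None means "not ready yet"
        if token is not None:
            return token
        sleep(interval_s)
        waited += interval_s
    raise TimeoutError(f"no token for task {task_id} after {timeout_s:.0f}s")
```

Passing sleep as a parameter keeps the polling loop testable without real delays.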

3. Beacon API client (BeaconClient)

Once authenticated, the Beacon API accepts spatial queries: given an AppID, LayerID, and a bounding box (Extent), it returns property records within that extent as GeoJSON-style features with parcel IDs and coordinates.

The client paginates through the county's bounding box in a grid of overlapping extents to ensure no parcels are missed at the edges.
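The overlapping grid can be sketched as follows. This is an illustration under assumptions, not the project's code: the extent is taken to be an (xmin, ymin, xmax, ymax) tuple in whatever coordinate system the layer uses, and grid_extents, the grid dimensions, and the overlap fraction are made-up names and values.

```python
from typing import Iterator, Tuple

Extent = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def grid_extents(bounds: Extent, cols: int, rows: int,
                 overlap: float = 0.05) -> Iterator[Extent]:
    """Yield cols x rows tiles covering bounds, each padded by a fraction of its size."""
    xmin, ymin, xmax, ymax = bounds
    if cols < 1 or rows < 1 or xmax <= xmin or ymax <= ymin:
        raise ValueError("bounds must be non-empty and the grid at least 1 x 1")
    tile_w = (xmax - xmin) / cols
    tile_h = (ymax - ymin) / rows
    pad_x, pad_y = tile_w * overlap, tile_h * overlap
    for r in range(rows):
        for c in range(cols):
            x0 = xmin + c * tile_w
            y0 = ymin + r * tile_h
            # Pad every side, but never reach outside the overall bounding box.
            yield (
                max(xmin, x0 - pad_x),
                max(ymin, y0 - pad_y),
                min(xmax, x0 + tile_w + pad_x),
                min(ymax, y0 + tile_h + pad_y),
            )
```

Because neighbouring tiles share a strip along each edge, a parcel on a tile boundary is returned by more than one query, so the results need deduplicating downstream.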

4. Coordinate extraction and export

Raw API responses are parsed, deduplicated by parcel ID, and exported to CSV/JSON with standardised lat/lng columns.
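A minimal sketch of the dedupe-and-export step, assuming each parsed feature is a dict with a parcel_id and a GeoJSON-style [lng, lat] coordinate pair. Those field names, and the helper names dedupe_by_parcel and write_csv, are placeholders; the actual response schema is not given here.

```python
import csv
from typing import Dict, Iterable, List, TextIO


def dedupe_by_parcel(features: Iterable[Dict]) -> List[Dict]:
    """Keep the first feature seen for each parcel ID, preserving input order."""
    seen = set()
    unique = []
    for feat in features:
        pid = feat.get("parcel_id")
        if pid is None or pid in seen:
            continue  # skip records without an ID and repeats from overlapping tiles
        seen.add(pid)
        unique.append(feat)
    return unique


def write_csv(features: Iterable[Dict], out: TextIO) -> int:
    """Write deduplicated parcels as parcel_id,lat,lng rows; return the row count."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["parcel_id", "lat", "lng"])
    rows = 0
    for feat in dedupe_by_parcel(features):
        lng, lat = feat["coordinates"]  # assumed GeoJSON order: [lng, lat]
        writer.writerow([feat["parcel_id"], f"{lat:.6f}", f"{lng:.6f}"])
        rows += 1
    return rows
```

Writing to any text stream keeps the function usable with both files opened via open(path, "w", newline="") and in-memory buffers.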

Technical Highlights

Outcome

Delivered for multiple county datasets. Chatham County GA run: 16,402 parcel coordinates extracted, 97.3% success rate, 18 CAPTCHA solves, 4 min 12 sec runtime. Geospatial output feeds into clients' property analytics and spatial ML workflows.

US Property Records Data Collection - Christos Prapas