← All projects
PythonopenpyxlpandasExcel automationPDF parsing

Excel Work-Order Data Extraction

Turned a contractor's hand-filled Excel templates into a clean spreadsheet — 312 daily work orders processed in 22 seconds.

Role
Python Engineer (Upwork)
Timeline
2024
Status
Delivered — construction contractor

The Problem

Real-world data rarely arrives in a clean tabular format. This project is a good example: a construction contractor's proprietary Excel template used complex merged-cell geometry to encode daily work orders — nested tables for labour hours, equipment, materials, and sign-off rows within a single locked sheet. Standard pandas.read_excel produces garbage on merged cells. Extracting clean, ML-ready structured records from this kind of source requires navigating the file's internal geometry, not just reading it row by row.

The Challenge

The Excel format was not generated programmatically — it was hand-filled using a locked template with complex merged cell ranges. Standard pandas.read_excel produced garbage due to the merging. The extraction needed to:

  1. Identify the start and end rows of each ticket block (by LOG_ID marker cells).
  2. Navigate merged cell geometry with openpyxl.worksheet.cell_range.CellRange to find logical cell boundaries.
  3. Extract each field (date, crew, hours, materials, location, sign-off) by relative position within the block.
  4. Output a flat CSV/Excel with one row per ticket, preserving all embedded images (signature photos) as linked files.

What I Built

extract_daily_tickets.py — a command-line tool that:

Outcome

Processed 12 months of daily tickets (300+ work orders) in under 30 seconds. The output fed directly into the client's payroll spreadsheet without manual re-entry.

Excel Work-Order Data Extraction — Christos Prapas — Christos Prapas