Python · BeautifulSoup · Pandas · Ollama · openpyxl · LLM annotation

Court-Decision Extraction & AI Tagging

Pulled roughly 3,000 Dutch court decisions from the official judicial database and used a locally run LLM to label the sections of each one automatically.

Role: Python Engineer (Upwork)
Timeline: 2024
Status: Delivered (legal research client)

The Problem

A legal research client needed to extract and structure Dutch court decisions from Rechtspraak.nl, the Netherlands' official judicial database. Each decision is an HTML document identified by a unique ECLI (European Case Law Identifier). The client had a list of thousands of ECLIs and needed the full text of each decision extracted, merged into Excel, and annotated with section labels (Background, Reasoning, Decision) to feed downstream analysis.

What I Built

Phase 1 — Scraper (rechtspraak_scraper.py): Fetched each decision by ECLI from the Rechtspraak API endpoint, parsed the HTML with BeautifulSoup, extracted structured fields (ECLI, date, court, title, full text), and saved to CSV.
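A minimal sketch of what the fetch-and-parse step might look like. The URL template, CSS selectors, sample markup, and all function names here are illustrative assumptions; the real endpoint and page structure on Rechtspraak.nl are not given in this write-up and will differ.

```python
import csv
from dataclasses import asdict, dataclass, fields
from urllib.parse import urlencode
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://example.invalid/decision"


@dataclass
class Decision:
    ecli: str
    date: str
    court: str
    title: str
    full_text: str


def fetch_html(ecli: str) -> str:
    """Download the HTML page for one ECLI (assumed query parameter: id)."""
    url = f"{BASE_URL}?{urlencode({'id': ecli})}"
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_decision(ecli: str, html: str) -> Decision:
    """Extract the structured fields; selectors are placeholders."""
    soup = BeautifulSoup(html, "html.parser")

    def text_of(selector: str) -> str:
        node = soup.select_one(selector)
        return node.get_text(" ", strip=True) if node else ""

    return Decision(
        ecli=ecli,
        date=text_of(".decision-date"),
        court=text_of(".court-name"),
        title=text_of("h1"),
        full_text=text_of("#content"),
    )


def write_csv(decisions, path: str) -> None:
    """Write one row per decision, with a header taken from the dataclass."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=[f.name for f in fields(Decision)])
        writer.writeheader()
        for decision in decisions:
            writer.writerow(asdict(decision))


# Made-up markup standing in for a real decision page.
SAMPLE_HTML = """
<html><body>
  <h1>Example decision</h1>
  <span class="court-name">Rechtbank Amsterdam</span>
  <span class="decision-date">2023-05-17</span>
  <div id="content"><p>1. Procedure</p><p>2. Beoordeling</p></div>
</body></html>
"""

decision = parse_decision("ECLI:NL:RBAMS:2023:0000", SAMPLE_HTML)
```

Keeping parsing separate from fetching means the extraction logic can be checked against saved pages without touching the network.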

Phase 2 — Merge & export (merge_xlsx.py): Consolidated multiple CSV batches into a single Excel workbook, handling encoding edge cases in Dutch legal text.
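The merge step could be sketched roughly as follows. The encoding fallback order, the de-duplication on an `ecli` column, the control-character stripping, and every function name are assumptions made for illustration; the write-up does not say which edge cases the real script handled or how.

```python
import glob
import re

import pandas as pd

# Control characters that cannot be stored in an .xlsx cell.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def read_batch(path: str) -> pd.DataFrame:
    """Read one CSV batch, trying UTF-8 first and then Windows-1252."""
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            return pd.read_csv(path, encoding=encoding, dtype=str)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode {path}")


def clean_text(value):
    """Strip control characters from strings; leave other values alone."""
    return _CONTROL_CHARS.sub("", value) if isinstance(value, str) else value


def merge_batches(frames) -> pd.DataFrame:
    """Concatenate batches, clean every cell, keep the first row per ECLI."""
    merged = pd.concat(frames, ignore_index=True)
    merged = merged.apply(lambda column: column.map(clean_text))
    return merged.drop_duplicates(subset="ecli", keep="first").reset_index(drop=True)


def export_workbook(pattern: str, out_path: str) -> None:
    """Merge all CSV files matching the glob pattern into one workbook."""
    frames = [read_batch(path) for path in sorted(glob.glob(pattern))]
    merge_batches(frames).to_excel(out_path, index=False, engine="openpyxl")


# Two small in-memory batches with one overlapping ECLI.
batch_a = pd.DataFrame({"ecli": ["E1", "E2"], "full_text": ["x\x0c", "y"]})
batch_b = pd.DataFrame({"ecli": ["E2", "E3"], "full_text": ["y2", "z"]})
merged = merge_batches([batch_a, batch_b])
```

Reading every column as a string avoids pandas guessing types for dates or numeric-looking case numbers.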

Phase 3 — LLM section parser (ollama_section_parser.py): An annotation layer that uses Llama 3 8B, running locally via Ollama, to identify and label the major structural sections of each decision (Background, Reasoning, Decision). The parser accepts either a local HTML file or a live URL, calls the Ollama REST API with a structured prompt, and returns a JSON object of labelled sections. Running the model locally keeps annotation costs at zero and avoids sending court decision content to external APIs.
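A rough sketch of how a call to a local Ollama server might look, using only the standard library. The prompt wording, the `llama3:8b` model tag, and the function names are assumptions for illustration, not the project's actual prompt or code. The endpoint is Ollama's default local `/api/generate` route.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address
MODEL = "llama3:8b"  # assumed model tag
LABELS = ("Background", "Reasoning", "Decision")


def build_prompt(decision_text: str) -> str:
    """Ask for a JSON object with exactly one key per section label."""
    keys = ", ".join(f'"{label}"' for label in LABELS)
    return (
        "Split the following Dutch court decision into sections. "
        f"Reply with a JSON object whose keys are {keys} and whose values "
        "are the text belonging to each section.\n\n"
        f"{decision_text}"
    )


def parse_sections(raw: str) -> dict:
    """Turn the model's JSON reply into a dict with one entry per label."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("model reply is not a JSON object")
    # Missing labels become empty strings; unexpected keys are dropped.
    return {label: str(data.get(label, "")).strip() for label in LABELS}


def label_sections(decision_text: str, url: str = OLLAMA_URL) -> dict:
    """Send one decision to the local model and return its labelled sections."""
    payload = {
        "model": MODEL,
        "prompt": build_prompt(decision_text),
        "stream": False,   # return one complete reply instead of chunks
        "format": "json",  # ask Ollama to constrain the reply to valid JSON
    }
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as resp:
        body = json.load(resp)
    return parse_sections(body["response"])


# Parsing can be exercised without a running server.
example_reply = '{"Background": " facts ", "Reasoning": "why", "Extra": "x"}'
sections = parse_sections(example_reply)
```

Normalising the reply to a fixed set of keys keeps downstream Excel columns stable even when the model omits a section or adds one of its own.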

Technical Highlights

Outcome

Delivered: merged_Rechtspraak_nl.xlsx (~3,000 annotated decisions) plus the annotator script. The client used the workbook to build a training dataset for a downstream NLP classifier.

Court-Decision Extraction & AI Tagging — Christos Prapas