Rules vs. Large Language Models: A Hands‑On Comparison for B2B Document Extraction

A practical hands‑on comparison of rule‑based PDF extraction (pytesseract) vs. an LLM approach (Ollama + LLaMA 3) for B2B order documents, with accuracy, speed, and maintainability trade‑offs.

Xtcworld · 2026-05-17 08:09:21 · Reviews & Comparisons

Introduction

Extracting structured data from B2B documents—such as purchase orders, invoices, and delivery notes—has long been a challenge for automation teams. Traditional rule‑based systems rely on optical character recognition (OCR) and hand‑crafted patterns, while modern large language models (LLMs) promise flexibility and contextual understanding. In this article we compare two implementations of the same B2B document extractor: one built with pytesseract and a set of rules, and another using Ollama and LLaMA 3. The goal is to highlight the strengths and weaknesses of each approach in a realistic order‑processing scenario.


The B2B Order Scenario

Imagine a company that receives hundreds of PDF purchase orders daily. Each order contains fields such as buyer name, order number, item list with quantities, prices, and a total amount. The variation in layout, font, and language makes extraction error‑prone. A reliable extractor must handle these variations while maintaining high accuracy and low latency.

Rule‑Based Approach with Pytesseract

How It Works

The rule‑based system starts by converting PDF pages to images (for example with pdf2image) and running them through pytesseract, Python’s wrapper for Tesseract OCR. After OCR, the system uses regular expressions and positional heuristics to locate and extract the required fields. For example, a rule might look for the pattern Order\s*#\s*:?\s*([A-Z0-9]+) to capture an order number.
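A minimal sketch of the rule layer, assuming the OCR text has already been produced by pytesseract. The order‑number pattern is the one quoted above; the total‑amount rule and field names are hypothetical examples, not the article’s exact rules:

```python
import re

# Rules: the order-number regex from the article, plus a
# hypothetical rule for the total amount.
ORDER_NO = re.compile(r"Order\s*#\s*:?\s*([A-Z0-9]+)")
TOTAL = re.compile(r"Total\s*:?\s*\$?\s*([\d,]+\.\d{2})")

def extract_fields(ocr_text: str) -> dict:
    """Apply hand-crafted rules to OCR output; None when a rule misses."""
    order = ORDER_NO.search(ocr_text)
    total = TOTAL.search(ocr_text)
    return {
        "order_number": order.group(1) if order else None,
        "total": total.group(1) if total else None,
    }

sample = "ACME Corp\nOrder # : PO12345\nWidget x10  $99.50\nTotal: $995.00"
print(extract_fields(sample))  # {'order_number': 'PO12345', 'total': '995.00'}
```

The brittleness described below follows directly from this design: if a supplier writes “Order No.” instead of “Order #”, the first rule silently returns None.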

Strengths

  • Deterministic output: Rules are transparent and predictable; you know exactly why a field is extracted or missed.
  • Low inference cost: No GPU or cloud API calls are needed—only CPU and local memory.
  • Low latency: OCR + regex processing typically completes in under two seconds per page.

Weaknesses

  • Fragile to layout changes: A different font, column alignment, or table structure can break the rules.
  • High maintenance burden: Each new customer template requires writing new rules and testing against sample files.
  • Poor handling of ambiguous text: Misspelled words or low‑quality scans often produce garbage output.

LLM‑Based Approach with Ollama and LLaMA 3

How It Works

For the LLM approach, we use Ollama to run a local instance of LLaMA 3 (8B parameters). The PDF is first converted to plain text using a simple OCR pass or by extracting native text. Then a carefully crafted prompt instructs the model to parse the order and return JSON with the required fields. The prompt includes example outputs and an explanation of the desired schema.
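A sketch of the extraction call against Ollama’s local HTTP API (the /api/generate endpoint with "format": "json" requests strict JSON output). The schema field names are assumptions for illustration, not the article’s exact prompt:

```python
import json
import urllib.request

# Hypothetical schema; the article's actual prompt and fields may differ.
SCHEMA = ["buyer_name", "order_number", "items", "total_amount"]

def build_prompt(document_text: str) -> str:
    """Instruct the model to emit JSON matching the expected schema."""
    return (
        "Extract the following fields from the purchase order below and "
        f"return ONLY a JSON object with keys {SCHEMA}.\n\n"
        f"Document:\n{document_text}"
    )

def extract_with_llm(document_text: str, model: str = "llama3") -> dict:
    """Call a local Ollama server and parse its JSON response."""
    payload = json.dumps({
        "model": model,
        "prompt": build_prompt(document_text),
        "format": "json",  # ask Ollama for strict JSON output
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return json.loads(body["response"])
```

In practice the prompt would also include the few‑shot example outputs mentioned above, which noticeably improves schema adherence.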

Strengths

  • Robust to layout variation: The model understands context—it can identify the order number even if it appears in an unusual position.
  • Handles typos and non‑standard formatting: LLaMA 3 can infer the intended value from surrounding text.
  • Easier to scale to many templates: No need to write rules for each new customer; a single prompt works for diverse layouts.

Weaknesses

  • Higher latency: Generating a response can take 5–15 seconds per page, even on a modern GPU.
  • Inconsistent outputs: The model may hallucinate fields or produce slightly varied JSON structures across runs.
  • Higher resource cost: Requires a GPU with at least 8 GB VRAM when running locally, or API costs if cloud‑based.
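The inconsistent‑output weakness is usually mitigated with a validate‑and‑retry loop around the model call. A minimal sketch, assuming the same hypothetical schema as above:

```python
import json
from typing import Optional

REQUIRED = {"buyer_name", "order_number", "items", "total_amount"}

def parse_and_validate(raw: str) -> Optional[dict]:
    """Accept the output only if it is valid JSON with all required keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED <= data.keys():
        return None
    return data

def extract_with_retries(call_model, text: str, attempts: int = 3) -> Optional[dict]:
    """Re-prompt up to `attempts` times until the output validates."""
    for _ in range(attempts):
        result = parse_and_validate(call_model(text))
        if result is not None:
            return result
    return None  # caller can route the document to manual review
```

Each retry adds latency, which compounds the 5–15 second generation cost noted above.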

Head‑to‑Head Comparison

Accuracy on a Test Set

We evaluated both systems on 50 real purchase orders from five different suppliers. The rule‑based system achieved 82% field‑level accuracy, mainly failing on tables with merged cells and on fields that appeared in unexpected positions. The LLM system achieved 94% accuracy, correctly extracting nearly all fields but occasionally misreading total amounts or skipping rare fields.
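Field‑level accuracy of the kind reported here can be computed by comparing each extracted field against ground truth, document by document. A sketch (the exact scoring used in the evaluation is not specified, so treat this as one reasonable definition):

```python
def field_accuracy(predictions, ground_truth):
    """Fraction of (document, field) pairs extracted correctly."""
    correct = total = 0
    for pred, truth in zip(predictions, ground_truth):
        for field, expected in truth.items():
            total += 1
            if pred.get(field) == expected:
                correct += 1
    return correct / total

preds = [{"order_number": "PO1", "total": "10.00"},
         {"order_number": "PO2", "total": None}]
truth = [{"order_number": "PO1", "total": "10.00"},
         {"order_number": "PO2", "total": "25.00"}]
print(field_accuracy(preds, truth))  # 3 of 4 fields correct -> 0.75
```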


Speed and Cost

Processing time for the rule‑based system averaged 1.2 seconds per document, using only a single CPU core. The LLM system took an average of 8.7 seconds per document on an NVIDIA RTX 3060 GPU. For a batch of 1,000 documents, the rule system would finish in 20 minutes, while the LLM would require over two hours—though this can be parallelized with multiple GPUs.
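The batch figures follow directly from the per‑document averages, and are easy to sanity‑check:

```python
docs = 1000
rule_s, llm_s = 1.2, 8.7  # seconds per document, as measured above

print(docs * rule_s / 60)    # 20.0 minutes for the rule-based system
print(docs * llm_s / 3600)   # ~2.42 hours for the LLM system
```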

Maintainability

When a sixth supplier with a completely different layout was added, the rule‑based system required two days of work to create and test new patterns. The LLM system needed only five prompt adjustments and re‑evaluation, which took two hours.

When to Choose Which

  • Rule‑based: Best for high‑volume, homogeneous documents where layouts are stable and latency is critical. Also ideal when you need completely deterministic behaviour (e.g., for audit trails).
  • LLM‑based: Better for low‑volume, highly variable documents, or when you lack the resources to maintain many rules. The LLM can be a “Swiss Army knife” that handles new templates quickly.

Conclusion

Neither approach is universally superior. The rule‑based extractor with pytesseract is fast, cheap, and predictable, but brittle. The LLM‑based system using Ollama and LLaMA 3 is flexible, accurate, and low‑maintenance, but slower and more expensive. For many B2B scenarios, a hybrid approach may be optimal: use rules for common templates and fall back to an LLM when confidence is low. Whichever path you choose, the key is to thoroughly test on your own data and monitor performance over time.
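The hybrid fallback suggested here can be expressed in a few lines. A minimal sketch, assuming a rule extractor that returns None for fields its patterns miss (missing required fields stand in for "low confidence"):

```python
def hybrid_extract(text, rule_extract, llm_extract,
                   required=("order_number", "total_amount")):
    """Try cheap rules first; fall back to the LLM when a required field is missing."""
    result = rule_extract(text)
    if all(result.get(f) is not None for f in required):
        return result  # fast path: rules were confident
    return llm_extract(text)  # slow path: let the model handle the odd layout
```

With this routing, common templates keep the ~1 second rule‑based latency, and only the documents that break the rules pay the LLM’s generation cost.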

This article originally appeared on Towards Data Science.