A regional general contractor was using AI to cross-check structural drawings against the material order list. Their pipeline was fast, automated, and silently catastrophic — until a senior engineer noticed something during the third week of pour.
Structural drawings include reinforcement schedules — hand-annotated tables specifying bar size, count, spacing, and tier. Field engineers write in shorthand: T12C3 means tier 1, bar 2, in column 3. The 'T' is often handwritten on top of a printed grid. Every OCR pipeline they tried — three commercial vendors, two open-source models — confused T12C3 with 712C3. Once. Twice. Then 80 times across a 200-drawing set.
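The shorthand above is regular enough to check mechanically. As an illustration only (this is not something the contractor is described as having done), here is a minimal sketch that parses a code like T12C3 and rejects a misread like 712C3. The grammar is an assumption inferred from the single example in the text: one tier digit, one bar digit, then a column number. The names `BarRef` and `parse_bar_ref` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class BarRef(NamedTuple):
    tier: int
    bar: int
    column: int


# Assumed grammar: 'T', one tier digit, one bar digit, 'C', column number.
_BAR_REF = re.compile(r"T(\d)(\d)C(\d+)")


def parse_bar_ref(text: str) -> Optional[BarRef]:
    """Return the parsed reference, or None if the text does not fit the grammar."""
    match = _BAR_REF.fullmatch(text.strip())
    if match is None:
        return None
    tier, bar, column = (int(group) for group in match.groups())
    return BarRef(tier, bar, column)


print(parse_bar_ref("T12C3"))  # BarRef(tier=1, bar=2, column=3)
print(parse_bar_ref("712C3"))  # None: the leading 'T' was misread as '7'
```

A check like this can only catch misreads that break the grammar. A digit misread as another digit would still parse, which is why a format check alone would not replace human verification.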
The bar-list crosscheck flagged "mismatch" between the engineering drawing and the procurement schedule. Procurement assumed engineering was wrong; engineering assumed procurement had entered the spec wrong; the field assumed both were wrong. Reconciliation meetings burned 40 hours a week. The pipeline was producing confidently incorrect data, the worst possible failure mode.
“We had AI doing 800 pages a week and producing junk we couldn't see. The pipeline was fast. The pipeline was wrong. Fast wrong is the most expensive wrong.” (VP Engineering, regional GC)
They kept the OCR pipeline for speed. On every cell where the OCR's confidence dropped below 0.92, they called verify_document() with the affected drawing region and a typed schema.
Each call ran on the Express tier. The drawings never left the engineer's workstation: the SDK fragmented and masked them client-side. A founder verified each cell in under 30 minutes during business hours. Read accuracy jumped from 89% to 99.7%.
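    import asyncio

    from awaithumans import AwaitHumans

    client = AwaitHumans(api_key="ah_sk_live_...")


    async def verify_low_confidence(ocr_results):
        # BarSpec (the typed schema) and ocr_results come from the existing pipeline.
        for cell in ocr_results:
            # Only cells below the 0.92 confidence threshold go to a human.
            if cell.confidence < 0.92:
                verified = await client.verify_document(
                    task_description="Verify bar spec for this cell",
                    response_schema=BarSpec,
                    document_path=cell.drawing_path,
                    prior_extraction=cell.guess,
                    priority="high",
                )
                cell.value = verified


    # `await` is only valid inside a coroutine, so run the loop via asyncio.
    asyncio.run(verify_low_confidence(ocr_results))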
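The `BarSpec` schema passed as `response_schema` is not shown in the source. As a rough sketch of what a typed schema for one reinforcement cell might look like, the dataclass below uses the four attributes the reinforcement schedule is said to specify (bar size, count, spacing, tier). The field names, the millimetre units, and the validation rule are all assumptions, and whether the SDK accepts a dataclass rather than some other schema type is not confirmed by the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BarSpec:
    """Hypothetical schema for one verified reinforcement-schedule cell."""

    tier: int
    bar_size_mm: int  # assumed unit: millimetres
    count: int
    spacing_mm: int  # assumed unit: millimetres

    def __post_init__(self) -> None:
        # Reject obviously invalid values before they reach procurement.
        for name in ("tier", "bar_size_mm", "count", "spacing_mm"):
            value = getattr(self, name)
            if not isinstance(value, int) or value <= 0:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")


spec = BarSpec(tier=1, bar_size_mm=12, count=4, spacing_mm=150)
print(spec.count)
```

Constraining the reviewer's answer to typed, validated fields means a verified cell can be written straight back into the crosscheck without another free-text parsing step.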