EXHIBIT B
Status: SHIPPED

TakeOff Pro: Multi-Model Consensus

Outcome: 95% auto-approved, quote turnaround 48h→4h

Technology Stack: TypeScript, Redis, PostgreSQL, OpenAI, Anthropic
FINDING: Problem Statement
The construction takeoff process required an expert to review every LLM-generated quote. The client needed automation without sacrificing accuracy.
CONSTRAINTS: Boundaries
  • Zero tolerance for billing errors (insurance requirement)
  • Must support 3+ model providers for redundancy
  • Confidence thresholds must be tunable per client risk profile
  • Full audit trail for every decision
  • Must handle quotes with 200+ line items
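
One way the "tunable per client risk profile" constraint could be modelled is a default threshold set plus per-client overrides. This is only a sketch: the `RiskProfile` fields, the client id, and every numeric value are assumptions made for illustration, not the project's actual configuration.

```typescript
// Hypothetical per-client thresholds; field names and values are illustrative only.
interface RiskProfile {
  minConfidence: number; // lowest confidence score that may be auto-approved (0..1)
  maxVariance: number;   // largest relative disagreement between models (0..1)
}

const DEFAULT_PROFILE: RiskProfile = { minConfidence: 0.9, maxVariance: 0.1 };

// Per-client overrides; a client lists only the fields it wants to change.
const CLIENT_OVERRIDES = new Map<string, Partial<RiskProfile>>([
  ["conservative-client", { minConfidence: 0.97, maxVariance: 0.05 }],
]);

// Merge the defaults with whatever the client has overridden.
function profileFor(clientId: string): RiskProfile {
  return { ...DEFAULT_PROFILE, ...(CLIENT_OVERRIDES.get(clientId) ?? {}) };
}

// A line item is auto-approved only if it clears both thresholds.
function canAutoApprove(
  confidence: number,
  variance: number,
  profile: RiskProfile,
): boolean {
  return confidence >= profile.minConfidence && variance <= profile.maxVariance;
}
```

With these made-up numbers, a line item scored at 0.94 confidence and 4% variance would pass under the default profile but be routed to review for the stricter client.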
APPROACH: Method
Implemented an "Automated Consensus" pattern: 3+ models vote on each line item, and any disagreement is flagged for human review. Every decision gets a confidence score and an audit-trail entry. Domain experts validate the strategy, not every output.
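
The voting step might look roughly like the sketch below. The source does not say how agreement is measured, so this version assumes each model returns a numeric quantity, takes the median as the consensus value, and flags the item when any model deviates from the median by more than a relative tolerance. The 10% default merely echoes the "variance >10%" warning in the runbook excerpt; all type and function names are hypothetical.

```typescript
// Hypothetical shape of one model's answer for a single line item.
interface ModelEstimate {
  provider: string; // illustrative label, e.g. "model-a"
  quantity: number; // estimated quantity for the line item
}

interface ConsensusResult {
  status: "auto_approved" | "needs_review";
  consensusQuantity: number; // median of all estimates
  agreement: number;         // share of models within tolerance of the median (0..1)
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Assumed rule: approve only when enough models sit within `tolerance` of the median.
function voteOnLineItem(
  estimates: ModelEstimate[],
  tolerance = 0.1,
  minAgreement = 1.0,
): ConsensusResult {
  if (estimates.length < 3) {
    throw new Error("consensus needs at least 3 model estimates");
  }
  const center = median(estimates.map((e) => e.quantity));
  const agreeing = estimates.filter((e) =>
    center === 0
      ? e.quantity === 0
      : Math.abs(e.quantity - center) / Math.abs(center) <= tolerance,
  ).length;
  const agreement = agreeing / estimates.length;
  return {
    status: agreement >= minAgreement ? "auto_approved" : "needs_review",
    consensusQuantity: center,
    agreement,
  };
}
```

Using the median rather than the mean keeps a single outlier model from dragging the consensus value, which is one plausible reason to prefer voting over a single "best" model.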
RESULTS: Measured outcomes
  • Auto-Approval Rate: 95% (5% flagged for human review)
  • Quote Turnaround: 48h → 4h (about 92% less time end-to-end)
  • Billing Errors: 0 (90-day production trial)
  • Audit Compliance: 100% (insurance requirements satisfied)
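
As a quick check on the turnaround figure, the 92% can be read as a reduction in elapsed time, which is the same change as a 12x speed-up. The helper names below are hypothetical and exist only to show the arithmetic.

```typescript
// Percentage of elapsed time removed when going from `before` to `after`.
function percentReduction(before: number, after: number): number {
  if (before <= 0) throw new RangeError("before must be positive");
  return ((before - after) / before) * 100;
}

// How many times faster the new process is.
function speedup(before: number, after: number): number {
  if (after <= 0) throw new RangeError("after must be positive");
  return before / after;
}

console.log(percentReduction(48, 4).toFixed(1)); // prints "91.7"
console.log(speedup(48, 4));                     // prints 12
```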
runbook-excerpt.log
consensus-engine run --input hemlock-hill-phase2.pdf
[14:22:10] INFO Processing takeoff: hemlock-hill-phase2.pdf
[14:22:45] INFO Consensus reached: 58/61 items (95% agreement)
[14:22:45] WARN Flagged for review: 3 items (drywall variance >10%)
[14:22:46] INFO Quote generated: $847,250 (confidence: 0.94)
[ OK ] Quote ready. 3 items pending human review.
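
The agreement figure in the excerpt is consistent with a simple ratio of agreed items to total items (58 of 61, rounded to 95%). A minimal sketch of how such a summary line could be derived follows; the `ItemStatus` type and both functions are assumptions for illustration, not the engine's actual code.

```typescript
// Hypothetical per-item outcome after voting.
type ItemStatus = "consensus" | "flagged";

interface RunSummary {
  total: number;
  agreed: number;
  flagged: number;
  agreementPct: number; // rounded to the nearest whole percent
}

function summarizeRun(statuses: ItemStatus[]): RunSummary {
  const total = statuses.length;
  const agreed = statuses.filter((s) => s === "consensus").length;
  const agreementPct = total === 0 ? 0 : Math.round((agreed / total) * 100);
  return { total, agreed, flagged: total - agreed, agreementPct };
}

// Mirrors the wording of the INFO line in the excerpt.
function formatSummary(s: RunSummary): string {
  return `Consensus reached: ${s.agreed}/${s.total} items (${s.agreementPct}% agreement)`;
}
```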
LESSONS: What we learned
  01. Multi-model voting is more reliable than a single "best" model.
  02. Confidence thresholds must be tunable per client risk tolerance.
  03. Audit logs are a feature, not overhead.
  04. Domain experts validate strategy, not every output.
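
To make the audit-trail idea concrete, here is a minimal append-only log sketch. It assumes one immutable entry per line-item decision and keeps entries in memory; the stack lists PostgreSQL, so a real system would presumably persist them, but no schema is given in the source. The `AuditEntry` fields and the `AuditLog` class are hypothetical.

```typescript
// Hypothetical audit record: one immutable entry per line-item decision.
interface AuditEntry {
  readonly seq: number;        // 1-based position in the log
  readonly timestamp: string;  // ISO 8601
  readonly lineItemId: string;
  readonly decision: "auto_approved" | "needs_review";
  readonly modelQuantities: Readonly<Record<string, number>>; // each model's vote
}

class AuditLog {
  private readonly entries: AuditEntry[] = [];

  // Append-only: entries are frozen on write and never updated or removed.
  append(
    input: Omit<AuditEntry, "seq" | "timestamp">,
    now: Date = new Date(),
  ): AuditEntry {
    const entry: AuditEntry = Object.freeze({
      ...input,
      modelQuantities: Object.freeze({ ...input.modelQuantities }),
      seq: this.entries.length + 1,
      timestamp: now.toISOString(),
    });
    this.entries.push(entry);
    return entry;
  }

  // Returns every recorded decision for one line item, oldest first.
  forLineItem(lineItemId: string): readonly AuditEntry[] {
    return this.entries.filter((e) => e.lineItemId === lineItemId);
  }
}
```

Recording each model's vote alongside the decision is what lets a reviewer later reconstruct why an item was approved or flagged.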