The Trustworthy Documents Programme

PREFACE

The Programme asks what it would take for an artificial system to read a document of consequence at the standard such a document deserves.

A document of record (a contract, a regulatory filing, a financial statement, an audit report, a clinical note) is not a stretch of language to be summarised. It is evidence. Its sentences are qualified by clauses that hold them in place, its numbers are tied to definitions, and its claims are held accountable by signatures and dates. Auditors, lawyers, regulators, and clinicians have spent a century building the apparatus against which such documents are read. The apparatus exists because the cost of misreading them is not measured in fluency.

The Programme is concerned with what current document AI quietly elides: the difference between summarisation, which produces a plausible précis, and reading, which produces a comprehension that can be defended page‑by‑page when challenged. Contemporary systems are extraordinary at the former and uneven at the latter, and the gap between the two becomes visible only at the moment when an auditor, a regulator, or an opposing counsel begins to ask where the answer came from.

The Programme's commitment is that comprehension of high‑stakes documents must be trustworthy by construction: every assertion the system makes must be traceable to the page, the clause, and the figure from which it derives; every uncertainty must be qualified in language a domain expert recognises; every output must be auditable on the same grounds as a human reader's would be.

The work is conducted with operating institutions in finance, law, regulated industry, and supply chain — where the question of whether a document was read correctly is asked by people whose answers are then themselves held to account.

EVIDENCE

Horizon

On Evidential Trust.

Trustworthy reading.

The Programme takes a position.

Most document AI is evaluated against summaries — fluent, helpful, plausible. The institutions that depend on documents are not evaluated against summaries. They are evaluated against evidence: whether their reading of a clause will survive opposing counsel, whether their classification of a transaction will survive an audit, whether their extraction of a number will reconcile to a source. The asymmetry between how document AI is evaluated and how its users are evaluated is the failure mode the Programme exists to address.

Evidential Trust is the property a system has when its comprehension can be examined on the same terms as a human reader's: traceable to the page, reproducible across runs, qualified where the document itself is qualified, and silent where the document is silent. It is not the same as accuracy. A system can be accurate on a benchmark and untrustworthy in deployment; a system can be modest on a benchmark and earn the trust of an auditor over time.

The Programme refuses the trade in which fluent summarisation stands in for evidential reading. It treats the page as the unit of accountability and engineers backward from there.

METHOD

Four interleaved fronts — each oriented to the standard of an audit, a court, or a regulator, rather than the standard of a benchmark leaderboard.

Evidence preservation.

Every assertion produced by the system is grounded to its evidentiary source — the span on the page, the row in the table, the clause in the contract. Hallucination is treated not as a probability to be reduced but as a property to be eliminated by construction: outputs that cannot be traced are not produced. The unit of reading is the page, not the document; the unit of evidence is the span, not the paragraph.
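One way to read "outputs that cannot be traced are not produced" is as a check at the point where an assertion is created. The sketch below is a minimal illustration under stated assumptions, not the Programme's implementation: the names `Span`, `Assertion`, and `ground` are hypothetical, and a page is assumed to be available as plain text. An assertion is returned only if the quoted text is found verbatim at the stated offsets on the stated page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    page: int   # page number, used as the key into `pages`
    start: int  # character offset into the page text (inclusive)
    end: int    # character offset into the page text (exclusive)


@dataclass(frozen=True)
class Assertion:
    claim: str
    quote: str
    span: Span


def ground(claim: str, quote: str, span: Span,
           pages: dict[int, str]) -> Optional[Assertion]:
    """Return an Assertion only if `quote` matches the page text at `span`."""
    text = pages.get(span.page)
    if text is None:
        return None  # the page does not exist: nothing to trace to
    if not (0 <= span.start < span.end <= len(text)):
        return None  # the offsets fall outside the page
    if text[span.start:span.end] != quote:
        return None  # the page does not say what the system claims it says
    return Assertion(claim, quote, span)


# Hypothetical one-page document.
pages = {1: "Total revenue for FY2023 was 4,200."}
start = pages[1].index("4,200")
span = Span(page=1, start=start, end=start + len("4,200"))

ok = ground("Revenue was 4,200", "4,200", span, pages)
bad = ground("Revenue was 4,300", "4,300", span, pages)
```

Here `ok` carries its span and `bad` is `None`: the untraceable figure is never emitted, rather than emitted with a low confidence score.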

Hierarchical comprehension.

Documents of record have structure that summarisation flattens. The Programme treats reading as a layered task: parse and preserve layout and tabular structure; identify clause, definition, and reference relations; compose page‑level claims that respect cross‑references; and only then produce document‑level outputs. Each layer is held to its own evaluation, so failure modes are diagnosable rather than blended.
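The layering can be pictured as a sequence of stages, each with its own acceptance check, so that a failure names the layer where it occurred. The following is a toy sketch, not the Programme's pipeline: `Layer` and `read_in_layers` are hypothetical names, and the two example layers (line splitting and a naive search for " means ") merely stand in for real layout parsing and definition detection.

```python
from typing import Any, Callable, NamedTuple


class Layer(NamedTuple):
    name: str
    run: Callable[[Any], Any]     # transforms the previous layer's output
    check: Callable[[Any], bool]  # this layer's own acceptance test


def read_in_layers(document: Any, layers: list[Layer]) -> tuple[Any, list[str]]:
    """Run layers in order; stop at the first layer whose check fails."""
    passed: list[str] = []
    state = document
    for layer in layers:
        state = layer.run(state)
        if not layer.check(state):
            raise ValueError(
                f"layer '{layer.name}' failed its check; passed so far: {passed}"
            )
        passed.append(layer.name)
    return state, passed


# Toy layers, for illustration only.
layers = [
    Layer("layout",
          lambda text: text.splitlines(),
          lambda lines: len(lines) > 0),
    Layer("definitions",
          lambda lines: {"lines": lines,
                         "defs": [l for l in lines if " means " in l]},
          lambda state: len(state["defs"]) > 0),
]

contract = '"Fee" means the annual charge.\nThe Fee is payable on signing.'
result, passed = read_in_layers(contract, layers)
```

Because each layer is checked separately, a document with no detectable definitions fails at "definitions" rather than producing a plausible but unsupported document-level output.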

Provenance and accountability.

Trustworthy reading requires that the system's reading be itself a record. The Programme develops provenance frameworks — including verifiable extraction pipelines — under which every output is bound to the model that produced it, the document version it read, and the chain of reasoning by which it arrived at its claim. This makes systems audit‑ready by construction rather than by retrofit.
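A provenance record of this kind might, as a rough sketch, bind the output to a model identifier, a hash of the exact document bytes, and the ordered reasoning steps, and then hash the whole record so that later tampering is detectable. Everything named here (`ProvenanceRecord`, `record_output`, the `model_id` string) is an assumption for illustration; the Programme's actual frameworks are not described in this text.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    output: str
    model_id: str               # hypothetical identifier for the model build
    document_sha256: str        # hash of the exact bytes that were read
    reasoning: tuple[str, ...]  # ordered steps leading to the output

    def digest(self) -> str:
        """Stable SHA-256 over the whole record."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def record_output(output: str, model_id: str, document_bytes: bytes,
                  reasoning: list[str]) -> ProvenanceRecord:
    """Bind an output to the model and document version that produced it."""
    doc_hash = hashlib.sha256(document_bytes).hexdigest()
    return ProvenanceRecord(output, model_id, doc_hash, tuple(reasoning))


# Two versions of a hypothetical filing that differ by one figure.
doc_v1 = b"Net income: 120"
doc_v2 = b"Net income: 125"
steps = ["page 1 states 'Net income: 120'"]

r1 = record_output("Net income is 120", "model-a", doc_v1, steps)
r1_again = record_output("Net income is 120", "model-a", doc_v1, steps)
r2 = record_output("Net income is 120", "model-a", doc_v2, steps)
```

The same output read from the same bytes yields the same digest, while the same output attributed to a different document version does not, which is the property an auditor would need in order to ask which version was actually read.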

Field validation.

Benchmarks misjudge document AI because their evaluators are not the people who use it. The Programme partners with finance, legal, regulated‑industry, and supply‑chain operators, where the standard of acceptance is set by the institutions that bear the consequence. Evaluation suites are developed in dialogue with these partners and released to the community in stages as the field matures.

WORKS

Now in progress

The Programme’s current anchor: an auditable‑by‑construction approach to document extraction, paired with evaluation instruments developed alongside partner deployments in finance, legal, and regulated industry.

Anchor

Document AI that can’t quietly invent the number.

The Programme’s anchor: document-reading systems in which an auditor can follow every number back to its source — not to a model’s after‑the‑fact rationale.

Active stream.

Evaluation

Measuring document AI the way an auditor would.

An evaluation framework that extends beyond character‑level accuracy to layout, tabular structure, and the integrity of cross‑referenced fields — developed with finance, legal, and regulated‑industry partners who set the standard of acceptance.
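As an illustration of what "integrity of cross‑referenced fields" could mean in practice, the sketch below checks that each stated total equals the sum of the fields it is supposed to aggregate. It is a simplified assumption about one kind of integrity rule, not the framework itself; `cross_field_errors` and the field names are hypothetical.

```python
from decimal import Decimal


def cross_field_errors(fields: dict[str, Decimal],
                       rules: dict[str, list[str]]) -> list[str]:
    """For each rule (total -> parts), report totals that do not add up."""
    errors: list[str] = []
    for total, parts in rules.items():
        missing = [name for name in [total, *parts] if name not in fields]
        if missing:
            errors.append(f"{total}: missing {missing}")
            continue
        expected = sum((fields[p] for p in parts), Decimal("0"))
        if fields[total] != expected:
            errors.append(
                f"{total}: stated {fields[total]}, parts sum to {expected}"
            )
    return errors


# Hypothetical balance-sheet rule: total assets = current + non-current.
rules = {"total_assets": ["current_assets", "non_current_assets"]}

consistent = {
    "current_assets": Decimal("300"),
    "non_current_assets": Decimal("700"),
    "total_assets": Decimal("1000"),
}
# One misread digit in the total.
misread = dict(consistent, total_assets=Decimal("1008"))

clean = cross_field_errors(consistent, rules)
flagged = cross_field_errors(misread, rules)
```

A character-level metric would score the misread extraction as nearly perfect; the cross-field check flags it because the table no longer agrees with itself.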

Partner‑facing; staged community release.

Evaluation

Financial document AI, scored against the disclosures themselves.

An evaluation suite that reconciles extracted figures to their source disclosures, and ablates prompt strategies under realistic deployment constraints — built where the consequences of a wrong number are not academic.
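Reconciliation to source can be sketched as a comparison between each extracted figure and the figure as printed in the disclosure, after normalising presentation conventions such as thousands separators and parenthesised negatives. The code below is a minimal sketch under those assumptions; `parse_figure`, `reconcile`, and the three labels are hypothetical, and real disclosures would also need handling for units, scale, and currency.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_figure(text: str) -> Optional[Decimal]:
    """Parse '1,234.50' or '(1,234.50)' (accounting negative); else None."""
    s = text.strip()
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    s = s.replace(",", "")
    try:
        value = Decimal(s)
    except InvalidOperation:
        return None
    if not value.is_finite():
        return None  # reject 'NaN' and 'Infinity'
    return -value if negative else value


def reconcile(extracted: dict[str, Decimal],
              source: dict[str, str]) -> dict[str, str]:
    """Label each extracted figure 'match', 'mismatch', or 'unsupported'."""
    report: dict[str, str] = {}
    for name, value in extracted.items():
        printed = source.get(name)
        parsed = parse_figure(printed) if printed is not None else None
        if parsed is None:
            report[name] = "unsupported"  # nothing in the source to check against
        elif parsed == value:
            report[name] = "match"
        else:
            report[name] = "mismatch"
    return report


# Figures as printed in a hypothetical disclosure.
source = {"revenue": "4,200", "net_loss": "(1,250)"}
# Figures as returned by a hypothetical extraction system.
extracted = {
    "revenue": Decimal("4200"),
    "net_loss": Decimal("-1205"),  # transposed digits
    "ebitda": Decimal("900"),      # not present in the source
}

report = reconcile(extracted, source)
```

The "unsupported" label is kept separate from "mismatch" on purpose: a figure with no source is a different failure from a figure that disagrees with its source.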

Active stream.

Operating record

Years of audit‑defensible document AI in production.

The Programme inherits the operating depth of the Director’s portfolio companies (Cinnamon AI · Nexus FrontierTech), where audit‑defensible extraction has been the standard of acceptance across years of enterprise deployment in finance, insurance, and regulated industry.

Operating record · active partner case studies.

The other three inquiries

Ⅰ The Cognition‑aware AI Programme: Human Intent over Algorithm.
Ⅲ The Deliberative Agents Programme: Tension as method.
Ⅳ The Collective Cognition Programme: Cognition lives between people.

CORRESPOND

Correspondence

Inquiries are welcome.

For research collaborations, joint publications, advisory engagements, and field deployments, send a note via the form below; the Director responds personally.
