What an AI council should ask every vendor

AI councils at regulated enterprises are standing bodies now. They meet on a cadence, and every AI vendor purchase routes through them before it closes. The rubric they run matters more than the demo they sit through.

Most councils inherited their rubric from software procurement: certifications, data-handling policies, integration timelines, vendor history. Those questions filter for some things. They do not filter for the control architecture that makes an AI system safe to run in a regulated environment for three or five years. A vendor who passes the standard checklist may still be impossible to audit in year two.

The six questions below surface architecture. Each has a short form and a longer one. The short form is what to ask in the room. The longer form is what to listen for.

Where does our data live, and who can see it?

A policy document says where data is supposed to live. An architecture diagram shows where it flows.

The right answer for a regulated enterprise: single-tenant, VPC-resident, no egress to a shared environment. The vendor's support team does not read your applicant data. Nobody outside your cloud boundary does. The only thing the vendor sees is an up/down status page gated by a credential you control.

Most vendors answer with a policy statement: data is encrypted in transit and at rest, on SOC 2 compliant infrastructure. That sentence describes the vendor's control environment during an audit window. It does not describe where your data goes or who touches it between audits.

The follow-up that clarifies: what does the vendor see on a Tuesday morning when no ticket is open? If the answer is anything beyond a heartbeat signal, the architecture is multi-tenant. That is the real answer to the data question, and the policy document never gives it.

Show me a decision the system made and why.

This is the question that cleanly separates vendors.

Ask to see a specific recommendation from the staging environment: what data the model weighed, what it proposed, what a human did with the proposal, and when. A system built for governance answers in minutes. A system built for demos needs a technical call on the calendar before it can answer at all.

A Decision Trace is the artifact that makes this possible: a signed record of a single decision, capturing the model's inputs, the human's response, and the timestamp on each step. The trace ships with the decision, not after the audit request. It answers an auditor's questions in the order an auditor asks them.
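As a rough illustration of what "a signed record of a single decision" can mean in practice, here is a minimal Python sketch. It is not the actual Decision Trace format: the field names, the step labels, and the signing key are all hypothetical, and an HMAC from the standard library stands in for whatever signature scheme a real system would use. The point is only that inputs, human response, and per-step timestamps are sealed together, so any later change to the record is detectable.

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration; a real deployment would hold this in a
# key management service the enterprise controls.
SIGNING_KEY = b"example-key-held-by-the-enterprise"


def _canonical(body):
    # Sorted keys and fixed separators give a stable byte string to sign.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_trace(decision_id, steps, key=SIGNING_KEY):
    """Seal one decision's steps (each with its own timestamp) under an HMAC."""
    body = {"decision_id": decision_id, "steps": steps}
    signature = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_trace(trace, key=SIGNING_KEY):
    """Return True only if the body still matches its signature."""
    expected = hmac.new(key, _canonical(trace["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, trace["signature"])


# Illustrative data only: one decision, three timestamped steps.
trace = sign_trace(
    "dec-001",
    [
        {"step": "model_inputs", "detail": {"fields": ["tenure", "region"]},
         "at": "2024-01-09T14:00:00+00:00"},
        {"step": "model_proposal", "detail": {"recommendation": "advance"},
         "at": "2024-01-09T14:00:02+00:00"},
        {"step": "human_response", "detail": {"reviewer": "r.lee", "action": "declined"},
         "at": "2024-01-09T15:12:40+00:00"},
    ],
)
```

Calling verify_trace(trace) on the untouched record returns True. Change any field afterward, such as flipping "declined" to "approved", and verification fails, which is the property an auditor is relying on.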

The question beneath this one: was the system designed so a trace could be queried on demand, or does the vendor's team produce it for you after you ask? Those are different architectures. One was designed for inspection from day one. The other was designed for a demo environment and will need a follow-up call to explain what the production system does.

The governance post that covers the trace in detail, from a buyer already in diligence, is "What control model lets you move that fast?"

Who approves a regulated action, and what happens to the record?

Every AI vendor says their system puts humans in the loop. Ask what that means for a workflow that touches a regulated category.

On a decision subject to adverse-impact monitoring, or a compensation change with compliance exposure, one approval should not be enough. Two humans, two signatures, before the workflow executes in any downstream system. Banks have run payments on dual authorization for decades. AI systems making decisions in regulated domains should run the same control.

The second part of the question is the record. A system that logs only approvals produces a highlight reel. The complete record includes what a reviewer changed before signing, and what they declined. A year from now, an auditor needs to pull any decision and see what the system proposed, what changed, who signed, and what anyone refused. An approval log where nobody ever declines anything would itself be a finding in any audit conversation I have been in.
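The two halves of this question, the dual signature and the complete record, can be sketched together. The Python below is an assumption-laden illustration, not any vendor's implementation: the class names, the action labels, and the rule that an edit invalidates earlier approvals are all choices made for the example. What it shows is that approvals, edits, and declines land in the same log, and that execution requires two distinct approvers of the proposal as it currently stands.

```python
from dataclasses import dataclass, field

ALLOWED_ACTIONS = {"approve", "edit", "decline"}


@dataclass
class ReviewEvent:
    reviewer: str
    action: str  # one of ALLOWED_ACTIONS
    note: str = ""


@dataclass
class RegulatedAction:
    proposal: str
    events: list = field(default_factory=list)

    def record(self, reviewer, action, note=""):
        # Every response is kept, including edits and declines.
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self.events.append(ReviewEvent(reviewer, action, note))

    def may_execute(self):
        # Any decline blocks execution outright.
        if any(e.action == "decline" for e in self.events):
            return False
        # Assumption for this sketch: approvals given before the most
        # recent edit no longer count, since the proposal has changed.
        last_edit = max(
            (i for i, e in enumerate(self.events) if e.action == "edit"),
            default=-1,
        )
        approvers = {
            e.reviewer
            for e in self.events[last_edit + 1:]
            if e.action == "approve"
        }
        # Two distinct humans must sign; one person approving twice is not enough.
        return len(approvers) >= 2
```

A declined action still carries its full event list, so the refusal is as queryable a year later as any approval.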

Ask the vendor to show you a declined recommendation from their production environment. If they can produce one, the approval gate is load-bearing. If they cannot, nobody is reading the proposals.

What does the model do when it hits something it doesn't recognize?

This is the integration confidence question. It shows up most clearly in year two, after the deployment is live and the environment has changed in ways nobody tracked carefully.

Enterprise systems are living things. Field names drift. A new HCM version renames a field. An integration that held clean at deployment starts producing odd output three months in because a source field changed and the pipeline never flagged it. A model that silently maps a drifted field to the nearest neighbor, with higher confidence than the evidence supports, is a liability that passes every pre-deployment test and fails in production.

The right behavior is to flag and wait, or to stop the affected flow entirely. Nothing uncertain maps silently. A renamed or drifted field stops the data from flowing rather than being guessed at. Ask the vendor what that mechanism looks like and who gets notified when it fires.
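One way to picture "flag and wait" is a strict field check in front of the pipeline. The sketch below is a hypothetical Python example: the field names, the SchemaDriftError class, and the notify callback are invented for illustration and do not describe any particular product or HCM integration. It halts the flow at the first record whose fields do not match what the integration was built against, reports both the missing and the unrecognized names, and never tries to guess which new name replaces which old one.

```python
# Hypothetical field set the integration was validated against at deployment.
EXPECTED_FIELDS = frozenset({"employee_id", "job_code", "base_pay"})


class SchemaDriftError(Exception):
    def __init__(self, missing, unrecognized):
        super().__init__(f"missing={missing} unrecognized={unrecognized}")
        self.missing = missing
        self.unrecognized = unrecognized


def check_fields(record, expected=EXPECTED_FIELDS):
    """Raise if the record's field names differ from the expected set."""
    present = set(record)
    missing = sorted(expected - present)
    unrecognized = sorted(present - expected)
    if missing or unrecognized:
        raise SchemaDriftError(missing, unrecognized)


def run_flow(records, notify, expected=EXPECTED_FIELDS):
    """Pass records through until drift appears, then stop and notify.

    Returns (processed_records, halted).
    """
    processed = []
    for record in records:
        try:
            check_fields(record, expected)
        except SchemaDriftError as err:
            # No nearest-neighbor mapping: a human decides what the
            # renamed field means before data flows again.
            notify(str(err))
            return processed, True
        processed.append(record)
    return processed, False
```

Halting on any mismatch is the strict end of the design space. A real system might let unrelated flows continue, but the affected one should wait for a person.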

Ask specifically: what happens when a field your integration relies on gets renamed in our Workday instance? A good answer names the mechanism. Answers that reference continuous monitoring or adaptive learning as the solution are not answers to this question.

How long did your last legal review actually take?

Not a target. A number from a completed deployment.

Vendors publish integration timelines that are aspirational. They represent the best case with a cooperative procurement team and no unexpected questions from security or legal. The number the council needs is what the timeline looked like when a large enterprise ran a real review, the kind with actual questions, actual security teams, and actual concerns about data residency and adverse-impact exposure.

At a Fortune 500 insurance carrier that had spent eighteen months evaluating and rejecting six prior vendors on architecture: legal approval in 17 days. Contract to production in 34 days. Those numbers describe a completed deployment at a buyer who ran a serious review and kept the record. A sales target is a different thing.

Ask the vendor for the number from their last enterprise deployment. If they give you a range, ask which specific deployment it came from. If they cannot name one, the number is either aspirational or unavailable, and either way that matters to your council's decision. The precision of the answer is itself a data point.

If we end this relationship, what do we keep?

The exit question is the trust test.

A vendor who answers with export formats and data portability is talking about rows and files. The intelligence in a deployed AI system is not the data. It is the model that was calibrated against four years of your environment, your patterns, your outcome history. The question is who owns that model.

The right answer: the customer keeps it. Weights stay in the enterprise's cloud. If the relationship ends, the model does not leave with the vendor. The exit is clean because the architecture was built so the intelligence would always belong to the enterprise.

A vendor who says this without prompting has designed the product around ownership from the start. A vendor who hedges is designing around retention. You can tell which one you are talking to by how long they take to answer and how specific the answer is.

This question also tells you how the vendor thinks about year three. A vendor confident the product performs in production at year three does not need to make exit hard. A vendor who makes exit hard is telling you that the confidence is not there.

What the answers reveal

A vendor who answers all six questions in plain language, with specifics, in a single session is running an architecture built for governance from day one. The answers exist because the product was designed to produce them, and the people presenting it have been answering these questions since the first regulated deployment.

A vendor who needs a follow-up call per question is running governance as a feature layer. The controls may exist in a security review document. Whether they exist in the production system, in a form you can query and inspect, is what those follow-up calls are trying to determine.

The council's job is to run the review as if year two has already arrived. A demo environment answers every governance question optimistically. An architecture review exposes what happens when the environment belongs to you, the data is yours, and the auditor is real.

Senior buyers at regulated enterprises know this difference. The council exists because vendors learned how to pass demo reviews, and the demos stopped being enough. The shift to asking about mechanisms, traces, and exit paths is the same shift financial services made with model risk management in the last decade.

What passes a serious council review and what performs in production for four years are not different things. The architecture that earns the first approval is the architecture the business runs on.

Saad Bin Shafiq is the founder of Nodes. Anchor pilot: Fortune 500 insurance carrier, four years of production data, 10,765 agents. Methodology: Decision Traces.
