Build vs Buy Enterprise AI: The Missing Decision Layer

Technology leaders at large financial institutions keep handing me the build-versus-buy objection in its crispest form: we already built 85% of this in-house.

That sentence has come up in our enterprise conversations for years, and most of it is true. The worst response a vendor can give is to pretend the build does not exist or does not work. It exists. It works. The team that shipped it knows the company's data better than any outsider ever will, and the leader who funded it has already absorbed the organizational pain a vendor would charge for.

So here is the straight answer. The 85% is real. The missing 15% is not a feature gap. It is a different architecture.

The 85% is real

The build, when we see it, usually looks like this. Pipelines out of the ATS, the HRIS, the CRM, and the rest of the ten to fifteen systems a Fortune 500 company runs, landed in a lakehouse. Entity resolution, so a person is the same person everywhere. An embedding index over documents and transcripts. A model gateway with SSO, role-based access, and logging that survived a security review. On top of it all, a surface where an analyst can ask a question that spans systems and get a sourced answer in seconds.

That is a working retrieval system over joined data, running inside the company's own walls, and it is a real achievement. The joins alone are months of unglamorous work that most vendors never attempt: reconciling ten schemas, resolving duplicate identities, untangling access regimes that were never meant to coexist. When a technology team says it already built 85% of what an intelligence-layer vendor is pitching, this is what it means. And on the surface, a demo of that system looks a lot like a demo of ours. You type a question. An answer comes back with citations.

The resemblance ends at the question mark.

What the missing 15% is

Three things, each of which sounds like a feature until you try to build it.

The proactive loop. Our agents reason over the joined data continuously instead of waiting to be asked. They watch baselines and thresholds across every system at once and decide on their own when something deserves a human's attention. Nobody queried the system into noticing that ramp is drifting on a cohort, or that an open requisition's fill probability has sunk below its historical baseline. It noticed first.
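To make the difference concrete, here is a minimal sketch of one pass of such a loop. It assumes each watched series is a list of past values plus a current reading, and it flags a series when the current value falls more than `z_threshold` standard deviations below its own history. The names (`Finding`, `check_against_baseline`, `sweep`), the series keys, and the 2.0 cutoff are hypothetical illustrations, not the production design.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Finding:
    """Something the loop decided a human might need to see (hypothetical shape)."""
    system: str
    entity: str
    metric: str
    current: float
    baseline: float
    z_score: float


def check_against_baseline(system, entity, metric, history, current, z_threshold=2.0):
    """Compare the latest value to its own history; return a Finding on a large drop.

    `z_threshold` is an illustrative sensitivity, not a calibrated value.
    """
    if len(history) < 2:
        return None  # not enough history to form a baseline
    base = mean(history)
    spread = stdev(history)
    if spread == 0:
        return None  # flat history: z-score undefined, skip rather than alert
    z = (current - base) / spread
    if z <= -z_threshold:
        return Finding(system, entity, metric, current, base, z)
    return None


def sweep(observations, z_threshold=2.0):
    """One pass of the loop: evaluate every watched series without being asked."""
    findings = []
    for (system, entity, metric), (history, current) in observations.items():
        f = check_against_baseline(system, entity, metric, history, current, z_threshold)
        if f is not None:
            findings.append(f)
    return findings


# Example: keys are (system, entity, metric); values are (history, current).
example = {
    ("ATS", "req-1042", "fill_probability"): ([0.72, 0.70, 0.74, 0.71], 0.41),
    ("HRIS", "cohort-2024Q3", "ramp_progress"): ([0.50, 0.52, 0.49, 0.51], 0.50),
}
flagged = sweep(example)  # only req-1042 is flagged; the cohort is within its normal range
```

A real loop would run a pass like this on a schedule across every joined system. The point of the sketch is only that nothing in it waits for a question.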

Drafted cross-system workflows with ROI attached. When the loop finds something, it drafts the full response rather than filing an alert: the actions in each underlying system, their sequence, and two numbers, the cost of acting and the cost of doing nothing. A human approves, edits, or declines. On approval, the system executes across those systems itself.
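A drafted workflow can be pictured as a small record: ordered steps per system, the two cost figures, and the human's verdict. The sketch below assumes the dollar figures arrive precomputed and treats an edit as approval of the revised steps. `DraftWorkflow`, `Step`, and `Decision` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Decision(Enum):
    APPROVED = "approved"
    EDITED = "edited"      # assumption: an edit approves the revised steps
    DECLINED = "declined"


@dataclass
class Step:
    system: str  # target system, e.g. "ATS", "HRIS", "CRM"
    action: str  # description of the write to perform there


@dataclass
class DraftWorkflow:
    finding: str
    steps: List[Step]            # list order is the execution sequence
    cost_of_action: float        # dollars to carry out the steps
    cost_of_inaction: float      # expected dollars lost if nothing is done
    decision: Optional[Decision] = None
    reviewer_note: str = ""

    def net_value(self) -> float:
        """Positive when acting is expected to cost less than doing nothing."""
        return self.cost_of_inaction - self.cost_of_action

    def review(self, decision: Decision, note: str = "",
               revised_steps: Optional[List[Step]] = None) -> None:
        """Record the human verdict; an edit may replace the proposed steps."""
        if decision is Decision.EDITED and revised_steps is not None:
            self.steps = list(revised_steps)
        self.decision = decision
        self.reviewer_note = note

    def ready_to_execute(self) -> bool:
        """Nothing runs until a human has approved or edited the draft."""
        return self.decision in (Decision.APPROVED, Decision.EDITED)


# Example draft with hypothetical figures.
wf = DraftWorkflow(
    finding="req-1042 fill probability below baseline",
    steps=[Step("ATS", "open two additional sourcing channels"),
           Step("CRM", "re-engage previously shortlisted candidates")],
    cost_of_action=4_000.0,
    cost_of_inaction=25_000.0,
)
wf.review(Decision.APPROVED, note="proceed")  # wf.net_value() == 21000.0
```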

A Decision Trace on every action. Every decision carries a queryable record: what happened, where, why, what the reasoning was, and what input any human gave. The methodology is published in Decision Traces, built on four years of production data covering 10,765 agents at a Fortune 500 insurance carrier. A trace is reasoning captured at decision time, not a log reconstructed after the fact. The distinction decides whether an auditor gets an answer or a shrug.

Why you cannot bolt it on

Retrieval answers questions. The loop proposes work. Every hard difference between the two systems follows from that one distinction, and none of them attaches to a question-answering system as an add-on.

Start with state. A retrieval query is stateless: index in, answer out, forget. A proactive loop keeps durable state on everything it watches. Every baseline, every threshold crossing, every item it has already surfaced, and what the human decided about each one. The loop even has to remember what it chose to stay silent about, so the misses can teach it. The in-house build carries no such state because retrieval never needed it, and retrofitting it means redesigning the data model the whole system stands on.
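The kind of state the loop carries can be sketched with Python's built-in `sqlite3`, assuming a single local store. The tables and functions below are hypothetical. The detail worth noticing is the `surfaced = 0` row, which records an item the loop chose to stay silent about so that a later miss can be traced back to that choice.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS baselines (
    series_id  TEXT PRIMARY KEY,  -- e.g. 'ATS/req-1042/fill_probability'
    mean       REAL NOT NULL,
    std        REAL NOT NULL,
    updated_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS candidates (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    series_id      TEXT NOT NULL,
    observed_at    TEXT NOT NULL,
    score          REAL NOT NULL,
    surfaced       INTEGER NOT NULL,  -- 1 = shown to a human, 0 = held back
    human_decision TEXT,              -- approved / edited / declined; NULL until reviewed
    outcome        TEXT               -- filled in later, e.g. 'miss' if a held-back item mattered
);
"""


def open_state(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_baseline(conn, series_id, mean, std, updated_at):
    """Keep one current baseline per watched series."""
    conn.execute("INSERT OR REPLACE INTO baselines VALUES (?, ?, ?, ?)",
                 (series_id, mean, std, updated_at))
    conn.commit()


def record_candidate(conn, series_id, observed_at, score, surfaced):
    """Log every candidate, surfaced or not, and return its row id."""
    cur = conn.execute(
        "INSERT INTO candidates (series_id, observed_at, score, surfaced) "
        "VALUES (?, ?, ?, ?)",
        (series_id, observed_at, score, int(surfaced)),
    )
    conn.commit()
    return cur.lastrowid


def record_decision(conn, candidate_id, decision):
    conn.execute("UPDATE candidates SET human_decision = ? WHERE id = ?",
                 (decision, candidate_id))
    conn.commit()


def record_outcome(conn, candidate_id, outcome):
    conn.execute("UPDATE candidates SET outcome = ? WHERE id = ?",
                 (outcome, candidate_id))
    conn.commit()


def silent_misses(conn):
    """Held-back items that later mattered: the signal for tuning when to speak."""
    return conn.execute(
        "SELECT series_id, score FROM candidates "
        "WHERE surfaced = 0 AND outcome = 'miss'"
    ).fetchall()
```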

Then initiative. A question-answering system never has to decide when to speak. A proactive system decides it constantly, and the decision is the hardest problem in the design: out of everything that changed across ten to fifteen systems this week, what deserves one of the few slots a human will read today? Tune it loose and the feed is noise nobody opens. Tune it tight and the system sits silent through the quarter's most expensive miss. That judgment gets calibrated against outcome data over a long stretch of production, and no sensitivity slider substitutes for it.
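The two halves of that problem, ranking today's candidates into a fixed number of slots and choosing the cutoff from outcome history rather than by hand, can be sketched as follows. This assumes each candidate already carries a score and that past candidates are labeled with whether they turned out to matter. The cost weights and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_for_today(candidates: Sequence[Tuple[str, float]],
                     threshold: float, slots: int) -> List[Tuple[str, float]]:
    """Keep candidates at or above the threshold, highest score first, up to `slots`."""
    eligible = [c for c in candidates if c[1] >= threshold]
    eligible.sort(key=lambda c: c[1], reverse=True)
    return eligible[:slots]


def calibrate_threshold(history: Sequence[Tuple[float, bool]],
                        grid: Sequence[float],
                        miss_cost: float, noise_cost: float) -> float:
    """Pick the grid threshold with the lowest historical cost.

    `history` holds (score, mattered) pairs from past candidates. A miss is an
    item that mattered but scored below the threshold; noise is an item that
    did not matter but would have been surfaced.
    """
    def cost(t: float) -> float:
        misses = sum(1 for s, mattered in history if mattered and s < t)
        noise = sum(1 for s, mattered in history if not mattered and s >= t)
        return miss_cost * misses + noise_cost * noise

    return min(grid, key=cost)
```

The ratio of `miss_cost` to `noise_cost` encodes the loose-versus-tight trade-off. In this sketch it is an input; in practice it would itself have to come from outcome data.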

Then the write path. A drafted workflow that executes across the ATS, the HRIS, and the CRM on one approval needs write access to all three, plus approval routing, rollback, and a signed record of every mutation. The in-house build is read-only by design. Read-only is part of why it cleared security review. Adding writes reopens the permission model and the review itself, which makes the write path a rebuild dressed as a feature request.
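One common way to structure a multi-system write path is the compensating-transaction (saga) pattern: run each write in sequence, and if one fails, undo the completed ones in reverse order. The sketch below applies it with an HMAC-signed ledger entry per mutation. The `do`/`undo` callables, the in-process ledger list, and the hard-coded signing key are hypothetical stand-ins; a real deployment would route approvals through its own workflow system and keep keys in a managed store.

```python
import hashlib
import hmac
import json
from typing import Callable, List, Tuple

# Hypothetical key for illustration only; never hard-code a real signing key.
SIGNING_KEY = b"example-key-not-for-production"

# (system name, perform the write, undo the write); `do` returns a JSON-serializable summary.
WriteStep = Tuple[str, Callable[[], object], Callable[[], None]]


class WriteFailed(Exception):
    """Raised after a failed step's predecessors have been rolled back."""


def sign(entry: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the ledger entry."""
    payload = json.dumps(entry, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def append_signed(ledger: List[dict], entry: dict) -> None:
    ledger.append({**entry, "sig": sign(entry)})


def execute_with_rollback(steps: List[WriteStep], approved_by: str,
                          ledger: List[dict]) -> None:
    """Run writes in order; on any failure, undo completed writes in reverse order."""
    if not approved_by:
        raise PermissionError("no recorded approval; refusing to write")
    completed = []
    try:
        for system, do, undo in steps:
            result = do()
            completed.append((system, undo))  # track before logging so it can be undone
            append_signed(ledger, {"system": system, "op": "do",
                                   "result": result, "approved_by": approved_by})
    except Exception as exc:
        for system, undo in reversed(completed):
            undo()
            append_signed(ledger, {"system": system, "op": "undo",
                                   "approved_by": approved_by})
        raise WriteFailed(str(exc)) from exc
```

The sketch leaves out what happens when an undo itself fails, which is exactly the kind of case a security review will ask about.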

Then the ROI line on every proposal. Cost of action against cost of inaction requires outcome history joined to intervention history: what it has cost, in dollars, when this condition was caught late or missed. An embedding index holds no counterfactuals. The number has to come from a model trained on what happened after past decisions, and the in-house build was never pointed at that target.
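The arithmetic behind the ROI line is simple once the join exists; the join is the hard part. Below is a deliberately naive sketch, assuming a history table where each past case of a condition records whether someone intervened, what the intervention cost, and what was lost afterward. `PastCase` and `roi_line` are hypothetical. A difference of raw averages like this is not a causal estimate, because the cases someone acted on may differ from the ones they missed, which is why the number has to come from a trained model.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class PastCase:
    condition: str            # e.g. "fill_below_baseline"
    intervened: bool          # did anyone act when the condition appeared?
    intervention_cost: float  # dollars spent acting (0.0 if no one acted)
    cost_incurred: float      # dollars lost afterward


def roi_line(history: List[PastCase], condition: str) -> Optional[Tuple[float, float]]:
    """Return (cost_of_action, cost_of_inaction) for a condition, or None.

    cost_of_action   = average intervention cost plus residual loss when someone acted
    cost_of_inaction = average loss when no one acted
    """
    cases = [c for c in history if c.condition == condition]
    acted = [c for c in cases if c.intervened]
    missed = [c for c in cases if not c.intervened]
    if not acted or not missed:
        return None  # without both arms there is nothing to compare against
    cost_of_action = mean(c.intervention_cost + c.cost_incurred for c in acted)
    cost_of_inaction = mean(c.cost_incurred for c in missed)
    return cost_of_action, cost_of_inaction
```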

Then the trace. A Decision Trace gets written while the system reasons, capturing the inputs, the intermediate steps, and whatever a human changed, at the moment all of that is live. Query logs record what was asked and what was returned. Reconstructing the why from them half a year later is archaeology, and an audit wants the why as it stood when the decision was made. A system that never recorded its reasoning cannot produce it afterward, which means capture has to sit inside the reasoning path from the first day. Threading it through a finished retrieval pipeline means rewriting the pipeline.
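The difference between capturing at decision time and reconstructing later shows up even in a toy. In the sketch below, a hypothetical `DecisionTrace` object is passed into the reasoning function, and each intermediate value is written to it at the moment it is computed, along with any later human action. The field names mirror the what, where, why, and human input described earlier; the requisition logic and the 0.2 threshold are invented for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class DecisionTrace:
    what: str                    # the decision being made
    where: List[str]             # systems involved
    why: str = ""                # summary, set by the reasoning itself
    steps: List[Dict[str, Any]] = field(default_factory=list)
    human_input: List[Dict[str, Any]] = field(default_factory=list)

    def step(self, name: str, inputs: Dict[str, Any], output: Any) -> Any:
        """Record one intermediate step as it happens and pass its output through."""
        self.steps.append({"at": _now(), "step": name, "inputs": inputs, "output": output})
        return output

    def human(self, actor: str, action: str, detail: str = "") -> None:
        """Record what a person did with the proposal, when they did it."""
        self.human_input.append({"at": _now(), "actor": actor,
                                 "action": action, "detail": detail})

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def assess_requisition(req: Dict[str, Any], trace: DecisionTrace,
                       threshold: float = 0.2) -> bool:
    """Hypothetical reasoning path with capture inline, not bolted on afterward."""
    baseline = trace.step("load_baseline", {"req": req["id"]}, req["baseline_fill"])
    current = trace.step("load_current", {"req": req["id"]}, req["current_fill"])
    gap = trace.step("compare", {"baseline": baseline, "current": current},
                     round(baseline - current, 4))
    flagged = trace.step("apply_threshold", {"gap": gap, "threshold": threshold},
                         gap > threshold)
    trace.why = (f"fill probability {current} vs baseline {baseline}, gap {gap}"
                 if flagged else "within normal range of baseline")
    return flagged
```

A query log for the same interaction would hold the question and the returned answer. The baseline, the gap, the threshold in force, and the reviewer's edit would not be recoverable from it.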

Each of these alone reads like a quarter's project. Together they invert the system. Retrieval stops being the product and becomes one component inside a loop that watches, drafts, routes, and acts, which means the 85% turns out to be a subsystem of the 15%. That inversion is why "we'll build the rest" plans stall. There is no incremental path from a system organized around answering to a system organized around proposing.

Most build-versus-buy arguments in enterprise AI are cost arguments. We have made ours, with the engineering math, in the Snowflake piece, and I will leave it there, because cost is the weaker frame for this objection. A team that shipped the 85% can fund more engineering. The question in front of that team is whether more of the same architecture ever becomes the 15%, and the answer is no at any budget.

The part that compounds

One more property separates the two systems, and it only shows up after deployment. Use makes the retrieval build's index fresher and makes nothing else better.

The loop learns by construction. Each approval, edit, or decline on a drafted workflow is labeled training data. The model retrains on it inside the customer's own cloud and runs in shadow against the incumbent version before promotion, so the longer the loop operates, the better it gets at knowing what to surface and what to leave alone. That compounding is the deeper argument for the architecture, and it has its own piece: intelligence compounds, the data stays.
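The learning cycle described here can be reduced to two steps: turn each review into a label, and let a retrained version prove itself against the incumbent before it takes over. The sketch assumes models are plain functions from a single feature to a 0/1 "surface this" prediction and treats an edit as a positive label. It also stands in for live shadow mode with a comparison on held-out reviewed items. All names and the `min_gain` margin are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Model = Callable[[float], int]  # hypothetical: feature -> 1 (surface) or 0 (leave alone)

# Assumption: an edited draft still means the item was worth surfacing.
LABELS = {"approved": 1, "edited": 1, "declined": 0}


def to_training_rows(reviews: Iterable[Tuple[float, str]]) -> List[Tuple[float, int]]:
    """Convert (feature, human decision) pairs to labeled rows; unreviewed items are skipped."""
    return [(x, LABELS[d]) for x, d in reviews if d in LABELS]


def accuracy(model: Model, rows: List[Tuple[float, int]]) -> float:
    return sum(1 for x, y in rows if model(x) == y) / len(rows)


def promote_after_shadow(incumbent: Model, challenger: Model,
                         held_out: List[Tuple[float, int]],
                         min_gain: float = 0.02) -> Tuple[Model, float, float]:
    """Score both versions on the same labeled items (only the incumbent's output
    was ever shown) and promote the challenger only if it wins by `min_gain`."""
    inc = accuracy(incumbent, held_out)
    cha = accuracy(challenger, held_out)
    winner = challenger if cha >= inc + min_gain else incumbent
    return winner, inc, cha
```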

Four questions for your architecture lead

If you led the build, you do not need my framing. You need a test you can run on your own system this week.

When did it last bring you something nobody asked it for, and how did it decide you should see it?

If it proposed an action spanning three systems tomorrow, what routes the approval, what executes the writes, and what rolls them back?

Can you pull the full reasoning behind any answer it gave six months ago, including what a human did with it?

Is it measurably better this quarter than last because people used it?

If the honest answer to all four is yes, you built the 15% too, and I would genuinely like to read the engineering blog. If the answers are no, nothing about the build was wasted. Retrieval over joined data is the prerequisite for everything above it, and your team already runs that in production. The open decision is where that team's time goes next: teaching a question-answering system to propose work, or putting a proposing loop on top of joins you already own. How we build the loop is documented on our architecture page. Why your data never leaves your VPC while it runs is its own piece.

The 85% was the hard part to build. The 15% was never going to arrive by building more of it.

Saad Bin Shafiq is the founder of Nodes, serving data-sensitive enterprises. Methodology: Decision Traces.
