AUC 0.647 to 0.735: Data Fusion in Hiring Prediction

In a study of 10,765 hires, personality assessment was the strongest single predictor of production, reaching an AUC of 0.647 on its own. Fusing that personality signal with behavioral scoring and applicant tracking system (ATS) data raised the model to AUC 0.735 on the study sample. ATS keywords (keywords plus source channel) reached only 0.558. No single system matches the fused model: the signals carry complementary information, and connecting them produces predictive power that none of them reaches in isolation.

Source: "Decision Traces," Saad Bin Shafiq, NODES, 2026. Model comparison on the retroactive cohort, with personality results on the 229 agents who had Predictive Index data. Read it on arXiv.

What the model compared

The team built regularized logistic regression models from each system's features and their combinations, using 5-fold cross-validation with all preprocessing fit inside each training fold to prevent leakage.

Model	Features	AUC
Composite score only	fit score	0.518
Behavioral dimensions	six behavioral traits	0.550
ATS keywords	keywords plus source channel	0.558
All non-personality features	score, traits, ATS	0.575
Personality type only	PI type	0.647
Full fusion	all features plus PI	0.735
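
The evaluation setup described above (regularized logistic regression, 5-fold cross-validation, preprocessing fit inside each training fold) can be sketched in plain Python. This is a minimal illustration, not the study's code: the synthetic data, the feature names, the L2 penalty strength, the gradient-descent settings, and the choice to average per-fold AUCs are all assumptions. The point it shows is that the scaling statistics are computed from the training rows only and then applied to the held-out rows.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_scaler(rows):
    """Per-column (mean, std), computed from the given rows only."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        stats.append((mean, std or 1.0))
    return stats

def apply_scaler(rows, stats):
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def fit_logistic_l2(X, y, l2=1.0, lr=0.1, epochs=300):
    """Batch gradient descent on the L2-penalized logistic loss (assumed settings)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * (gj + l2 * wj) / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_auc(X, y, k=5, seed=0):
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    fold_aucs = []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        stats = fit_scaler([X[i] for i in train])           # fit on training rows only
        X_train = apply_scaler([X[i] for i in train], stats)
        X_test = apply_scaler([X[i] for i in fold], stats)  # reuse training statistics
        w, b = fit_logistic_l2(X_train, [y[i] for i in train])
        scores = [b + sum(wj * xj for wj, xj in zip(w, x)) for x in X_test]
        fold_aucs.append(auc(scores, [y[i] for i in fold]))
    return sum(fold_aucs) / k

# Synthetic stand-in data (NOT the study's data): one informative feature,
# one uninformative feature on a different scale.
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    trait = rng.gauss(0, 1)
    filler = rng.gauss(50, 10)
    X.append([trait, filler])
    y.append(1 if trait + rng.gauss(0, 1) > 0 else 0)

mean_auc = cross_validated_auc(X, y)
print(f"mean 5-fold AUC on synthetic data: {mean_auc:.3f}")
```

Computing the scaler on the full dataset before splitting would let held-out rows influence the training features, which is the leakage the study's setup avoids. A library such as scikit-learn provides pipeline and cross-validation tools that enforce the same ordering.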

Personality was the strongest single signal

Predictive Index type alone (0.647) carried more signal than every non-personality feature combined (0.575). Production by type ranged from 36.8% for Captain down to 0.0% for Promoter, a wider spread than any other single variable in the dataset produced.
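
One way to read these comparisons is as distance above 0.5, the AUC of a random ranking. The short sketch below copies the AUC values from the model comparison table and computes those differences. The variable names are illustrative, and subtracting 0.5 is a reading aid assumed here, not a metric reported by the study.

```python
# AUC values copied from the model comparison table.
auc_by_model = {
    "Composite score only": 0.518,
    "Behavioral dimensions": 0.550,
    "ATS keywords": 0.558,
    "All non-personality features": 0.575,
    "Personality type only": 0.647,
    "Full fusion": 0.735,
}

CHANCE = 0.5  # AUC of a random ranking

# Distance above chance for each model, rounded to the table's precision.
lift_over_chance = {name: round(a - CHANCE, 3) for name, a in auc_by_model.items()}

# How much fusion adds on top of the best single system and the best non-personality model.
gain_over_personality = round(
    auc_by_model["Full fusion"] - auc_by_model["Personality type only"], 3
)
gain_over_non_personality = round(
    auc_by_model["Full fusion"] - auc_by_model["All non-personality features"], 3
)

for name, lift in lift_over_chance.items():
    print(f"{name:30s} +{lift:.3f} over chance")
print(f"fusion vs personality only:   +{gain_over_personality:.3f}")
print(f"fusion vs non-personality:    +{gain_over_non_personality:.3f}")
```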

Read the 0.735 correctly

This is the part most vendors get wrong, so it is worth stating plainly. The 0.735 figure is cross-validated on the 229 agents who had personality data. It is a research-sample result, not a claim about live production accuracy. The deployed behavioral score, evaluated as a binary classifier on 714 agents, has an AUC of 0.57. That lower number is not a weakness. It reflects how the score actually works.
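
It helps to recall what an AUC value measures: the probability that a randomly chosen producer is scored above a randomly chosen non-producer. The sketch below computes that directly on a six-agent toy example and compares it with accuracy at a 0.5 cutoff. The scores, labels, and cutoff are invented for illustration and have no connection to the study's data.

```python
def pairwise_auc(scores, produced):
    """Fraction of (producer, non-producer) pairs where the producer scores higher.

    Ties count as half a win. Raises if either class is missing.
    """
    pos = [s for s, y in zip(scores, produced) if y]
    neg = [s for s, y in zip(scores, produced) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one producer and one non-producer")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical): model scores and whether each agent produced.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
produced = [1, 0, 1, 0, 0, 1]

toy_auc = pairwise_auc(scores, produced)

# Accuracy requires a cutoff; AUC does not. The 0.5 cutoff is an arbitrary choice.
toy_accuracy = sum((s >= 0.5) == bool(y) for s, y in zip(scores, produced)) / len(scores)

print(f"AUC = {toy_auc:.3f}, accuracy at 0.5 cutoff = {toy_accuracy:.3f}")
```

Here 5 of the 9 producer and non-producer pairs are ordered correctly, so the AUC is about 0.556, while accuracy at the chosen cutoff is 0.5. The two numbers answer different questions, which is why an AUC should not be read as a percent-correct figure.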

The score predicts who benefits from speed, not just who produces

The behavioral score is a moderator, not a classifier. It does not mainly predict who will produce. It predicts who converts a fast ramp into production. High-scored agents captured about $114 per day of speed acceleration, roughly 2.8 times the $41 per day captured by low-scored agents. A traditional accuracy metric misses this entirely, because the score's value lives in the interaction between the candidate and the conditions, not in a single yes-or-no prediction. See the speed-to-production findings.
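
The difference between a moderator and a classifier can be made concrete with a small sketch. It assumes the reported figures can be read as slopes, meaning dollars captured per day of faster ramp, estimated separately for high-scored and low-scored agents. The data points are toy values constructed so the slopes match the reported $114 and $41. They are not the study's data, and the names `days_faster`, `high_scored_value`, and `low_scored_value` are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Toy data built to mirror the reported figures (assumption, not study data).
days_faster = [0, 5, 10, 15, 20]
high_scored_value = [1000 + 114 * d for d in days_faster]  # dollars
low_scored_value = [1000 + 41 * d for d in days_faster]    # dollars

slope_high = ols_slope(days_faster, high_scored_value)
slope_low = ols_slope(days_faster, low_scored_value)
ratio = slope_high / slope_low

print(f"high-scored: ${slope_high:.0f} per day of acceleration")
print(f"low-scored:  ${slope_low:.0f} per day of acceleration")
print(f"ratio: {ratio:.2f}")
```

In this toy setup both groups start from the same baseline, so the score says nothing about who produces with no acceleration. Its effect shows up only in the slope, which is the sense in which a moderator's value lives in the interaction with conditions and is missed by a single yes-or-no metric.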

What this means

You do not need demographic or geographic data to reach 0.735. Personality, behavioral, and ATS features, fused together, get there; without personality data, the behavioral and ATS features combined reach only 0.575. Almost all of that signal stays invisible until the systems are connected, which is the entire argument for a decision trace.

Frequently asked questions

What is a good AUC for a hiring model? Context matters more than a single threshold. In this study, keyword screening reached 0.558, personality type 0.647, and full multi-system fusion 0.735 on the evaluable sample.

Does AUC 0.735 mean the system is 73.5% accurate? No. AUC is a ranking measure, not an accuracy percentage, and 0.735 was measured on a 229-agent sample with personality data. The deployed score's binary AUC is 0.57.

Why is the deployed AUC lower than the research AUC? Two reasons. The research figure used personality data available for a subset, and hiring managers already used scores to screen, which compresses the range you can measure. The deployed score's main value is moderating the economics of speed.

What predicted production best? Personality type was the strongest single predictor. Fusing it with behavioral and ATS features predicted better than any system on its own.
