Klarefi

Human-in-the-loop intake automation for regulated teams

Where to put the human, where to put the software, and how to design a queue your underwriters and analysts will actually trust.

Mike Cooper, Founder

Your adjusters spend 30% of their time on admin, not adjudication. An applicant contacts your KYC analysts an average of 10 times before a file closes. Of their sanctions alerts, 95% are false positives. The bottleneck is not human judgment. It is what you make humans do before the judgment.

Human-in-the-loop is not a slogan. It is a queue design problem. If you put the human in the wrong place, you pay them to retype tax returns. If you put them in the right place, you pay them to decide.

The two layers of intake work

Every regulated intake case has two layers. Confuse them and you waste people.

The first layer is preparation. Collect documents. Extract facts. Find what is missing. Build the audit trail. This is repetitive. It scales with volume. It does not require a license or a credential. Software does it well.

The second layer is judgment. Approve or deny. Interpret risk. Sign off on compliance. Apply legal discretion. This is where your team's experience earns its salary. It does not scale with volume. It scales with skill.

Most "AI" intake tools blur the two. They route a half-finished extraction to a human and call it "human-in-the-loop". That is not a loop. That is a dumping ground.

What software should do, every time

  • Read the packet the applicant submitted, even if it is 200 pages.
  • Extract every required fact with a citation: document, page, quote.
  • Detect what is missing, expired, conflicting, or unsupported.
  • Run the follow-up. Email the applicant. Ask for the schedule. Track the response. Do not put that on your analyst.
  • Surface only what needs judgment, with full context attached.

That is roughly 70% of the cycle time on a typical file. None of it requires a human.
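A minimal sketch of the citation and gap-detection steps, in Python. The `Citation`, `Fact`, and `find_gaps` names, and the sample fact names, are assumptions made for illustration. They are not Klarefi's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    document: str  # file name of the source document
    page: int      # 1-based page number
    quote: str     # verbatim text that supports the value


@dataclass(frozen=True)
class Fact:
    name: str
    value: Optional[str] = None
    citation: Optional[Citation] = None


def find_gaps(required: list[str], extracted: list[Fact]) -> dict[str, str]:
    """Return {fact name: reason} for each required fact that is missing or unsupported."""
    by_name = {fact.name: fact for fact in extracted}
    gaps: dict[str, str] = {}
    for name in required:
        fact = by_name.get(name)
        if fact is None or fact.value is None:
            gaps[name] = "missing"
        elif fact.citation is None or not fact.citation.quote.strip():
            # A value with no source quote is treated as unsupported, not accepted.
            gaps[name] = "unsupported"
    return gaps


# Hypothetical packet: one cited fact, one uncited fact, one absent fact.
facts = [
    Fact("annual_income", "84000",
         Citation("tax_return_2023.pdf", 2, "Total income 84,000")),
    Fact("employer_name", "Acme Ltd"),
]
gaps = find_gaps(["annual_income", "employer_name", "date_of_birth"], facts)
print(gaps)  # {'employer_name': 'unsupported', 'date_of_birth': 'missing'}
```

The gaps are what the follow-up step would chase with the applicant. Only the cited facts move forward to a reviewer.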

What humans should always own

  • Adverse outcomes. Denials, claim rejections, AML escalations, eligibility refusals.
  • Risk interpretation. A UBO structure that looks unusual. A claimant story that does not match the police report.
  • Legal judgment. Contract clauses, regulatory exceptions, jurisdictional rules.
  • Final sign-off. The signature, the audit attestation, the regulatory filing.

Never automate any of those. Not now, not later. Regulators will hold a human accountable. Build the workflow around that fact.

The three queue states your reviewers actually need

Stop building one queue. Build three.

  • Decide. Everything cited, everything sufficient. The reviewer is reading for judgment, not chasing data.
  • Clarify. Something is missing or conflicting. The system is already asking the applicant. The reviewer can either wait or escalate.
  • Escalate. The model failed, an answer is ambiguous, or the case has a flagged risk pattern. A human takes over end to end.

Three queues. Three SLAs. Three different staffing models. That is how you get analyst time back.
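The routing rule can be sketched in a few lines of Python. The `CaseStatus` fields and the `route` function are hypothetical names. The precedence, where escalation is checked before everything else, is an assumption of this sketch, not something the three definitions above dictate.

```python
from dataclasses import dataclass
from enum import Enum


class Queue(Enum):
    DECIDE = "decide"
    CLARIFY = "clarify"
    ESCALATE = "escalate"


@dataclass
class CaseStatus:
    model_failed: bool = False
    ambiguous_answers: int = 0  # facts the system could not pin down
    risk_flagged: bool = False  # case matches a flagged risk pattern
    open_requests: int = 0      # missing or conflicting items being chased with the applicant


def route(case: CaseStatus) -> Queue:
    # Assumed precedence: any escalation trigger wins, even if a chase is in progress.
    if case.model_failed or case.ambiguous_answers > 0 or case.risk_flagged:
        return Queue.ESCALATE
    # Something is still outstanding and the system is already asking for it.
    if case.open_requests > 0:
        return Queue.CLARIFY
    # Everything is cited and sufficient, so the reviewer reads for judgment only.
    return Queue.DECIDE


print(route(CaseStatus()))                                    # Queue.DECIDE
print(route(CaseStatus(open_requests=2)))                     # Queue.CLARIFY
print(route(CaseStatus(open_requests=2, risk_flagged=True)))  # Queue.ESCALATE
```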

What silent fallback costs you

When extraction fails, most systems return a guess. They do this because their demo metric is "automation rate". A guess in a regulated file is worse than no answer at all. It looks correct. It will pass a quick review. Your reviewer accepts it. Six months later the regulator asks for the source quote and there is none. Now the bank pays the fine and your team rebuilds the audit trail from email threads.

Design against this. Every model output has a citation or it does not exist. Every uncertain case lands in Escalate, not Decide. Every fallback is visible in the audit log.

How we built it at Klarefi

Every fact is in exactly one of three resolution states: resolved with cited evidence, needs input, or failed. No fourth state. No "soft" confidence. The system either has the answer with a source or it does not.
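One way to make a fourth state unrepresentable is to model the three states as separate types. This is a sketch under assumptions, not our production code: the class names, the `from_extraction` helper, and the choice to map an uncited value to `Failed` are all illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Resolved:
    value: str
    document: str
    page: int
    quote: str

    def __post_init__(self) -> None:
        # A resolved fact cannot be constructed without a source quote.
        if not self.quote.strip():
            raise ValueError("a resolved fact must carry a source quote")


@dataclass(frozen=True)
class NeedsInput:
    question: str  # what to ask the applicant


@dataclass(frozen=True)
class Failed:
    reason: str


# There is deliberately no "low confidence" member of this union.
Resolution = Union[Resolved, NeedsInput, Failed]


def from_extraction(value: Optional[str], document: str, page: int,
                    quote: Optional[str]) -> Resolution:
    """Map a raw extraction result to one of the three states, never to a guess."""
    if value is None:
        return NeedsInput(question="Please provide this value.")
    if quote is None or not quote.strip():
        # Assumption of this sketch: a value with no source counts as a failure.
        return Failed(reason="value returned without a source quote")
    return Resolved(value, document, page, quote)


print(from_extraction("84000", "tax_return.pdf", 2, "Total income 84,000"))
print(from_extraction("84000", "tax_return.pdf", 2, ""))
print(from_extraction(None, "tax_return.pdf", 2, None))
```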

If a model call fails, the case moves to the human queue with the failure reason attached. No retry that hides the original error. No fallback to a templated answer. The audit log records the failure as a first-class event.
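A hedged sketch of that failure path in Python. `Case`, `run_extraction`, the event names, and the `call_model` callable are hypothetical stand-ins for whatever your stack uses. The point is the shape: log the failure first, route to a human, return nothing.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class Case:
    case_id: str
    queue: str = "processing"
    failure_reason: Optional[str] = None
    audit_log: list[dict] = field(default_factory=list)


def _log(case: Case, event: str, **details: str) -> None:
    stamp = datetime.now(timezone.utc).isoformat()
    case.audit_log.append({"event": event, "at": stamp, **details})


def run_extraction(case: Case, call_model: Callable[[str], dict]) -> Optional[dict]:
    """Call the model once. On failure, escalate with the reason. No retry, no template."""
    try:
        result = call_model(case.case_id)
    except Exception as exc:
        reason = f"{type(exc).__name__}: {exc}"
        _log(case, "model_call_failed", reason=reason)  # the failure is its own audit event
        case.queue = "escalate"
        case.failure_reason = reason
        return None  # no guessed or templated answer is returned
    _log(case, "model_call_succeeded")
    return result


def failing_model(case_id: str) -> dict:
    # Stand-in for a real model client that times out.
    raise TimeoutError("upstream timed out")


case = Case("case-001")
result = run_extraction(case, failing_model)
print(case.queue, "|", case.failure_reason)  # escalate | TimeoutError: upstream timed out
```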

Reviewers correct facts inline. Every correction is attributed, timestamped, and reversible. The corrected value carries the human attribution, not the model's. A regulator can trace any decision back to a human signature with one click.
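Attributed, timestamped, and reversible corrections fit naturally into an append-only history. The sketch below assumes hypothetical `Revision` and `FactHistory` types and a `reviewer:jdoe` style of author ID. A revert is stored as one more revision, so nothing is ever erased.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    value: str
    author: str   # "model" or a reviewer ID
    at: datetime  # UTC timestamp of the change


@dataclass
class FactHistory:
    name: str
    revisions: list[Revision] = field(default_factory=list)

    def record(self, value: str, author: str) -> None:
        self.revisions.append(Revision(value, author, datetime.now(timezone.utc)))

    @property
    def current(self) -> Revision:
        return self.revisions[-1]

    def revert(self, author: str) -> None:
        """Undo the latest change by re-recording the previous value under the reverter's name."""
        if len(self.revisions) < 2:
            raise ValueError("nothing to revert")
        self.record(self.revisions[-2].value, author)


history = FactHistory("annual_income")
history.record("84000", "model")
history.record("48000", "reviewer:jdoe")  # inline correction by a human
print(history.current.value, history.current.author)  # 48000 reviewer:jdoe

history.revert("reviewer:asmith")
print(history.current.value, history.current.author)  # 84000 reviewer:asmith
print(len(history.revisions))                          # 3
```

In this sketch the reverted value carries the reverting reviewer's name, not the model's, because a human chose to restore it.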

Staffing implications you should plan for

When the front door does the chase, your reviewer headcount does not drop to zero. It changes shape. Roughly:

  • Fewer junior reviewers doing document chase and retyping.
  • The same or slightly fewer senior reviewers, now spending most of their time on judgment.
  • One or two people in a new role: a workflow operator who tunes the facts, the form, and the escalation rules.

Plan the org chart for that shape before you sign the contract. The savings are not theoretical. They are a line in the staffing model. If the vendor will not help you build that model, they have not deployed in regulated production.

The move

Pick one workflow. List every minute a human currently spends on it. Sort those minutes into preparation or judgment. The preparation minutes are the target. The judgment minutes stay with the human. That is the design.
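The sorting exercise is simple enough to run in a spreadsheet or a few lines of Python. The task names and minute counts below are invented purely for illustration. Substitute your own time log.

```python
from collections import defaultdict

# Hypothetical time log for one workflow: (task, minutes per file, layer).
time_log = [
    ("chase missing documents", 25, "preparation"),
    ("retype figures from tax return", 15, "preparation"),
    ("assess unusual ownership structure", 12, "judgment"),
    ("final sign-off", 8, "judgment"),
]


def minutes_by_layer(log: list[tuple[str, int, str]]) -> dict[str, int]:
    """Total the minutes spent in each layer, rejecting any unlabeled work."""
    totals: dict[str, int] = defaultdict(int)
    for task, minutes, layer in log:
        if layer not in ("preparation", "judgment"):
            raise ValueError(f"task {task!r} has no valid layer: {layer!r}")
        totals[layer] += minutes
    return dict(totals)


totals = minutes_by_layer(time_log)
prep_share = totals["preparation"] / sum(totals.values())
print(totals)                                # {'preparation': 40, 'judgment': 20}
print(f"preparation share: {prep_share:.0%}")  # preparation share: 67%
```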