By Greg Miller, Vice President of Marketing & Business Development, Carta Healthcare
Every few months, a new headline declares that artificial intelligence is about to make some category of healthcare worker obsolete. The framing is always the same. A model reaches a benchmark, a demonstration goes well, and the conclusion follows that full automation is now only a matter of time. It is a tidy story. It also answers the wrong question.
The useful question is not whether AI can read a chart. Of course it can. The useful question is what the work is actually for, who answers for the result, and what happens when the routine case turns out to be anything but routine. In clinical data abstraction, those questions lead somewhere very different from the replacement narrative.
Clinical registries collect standardized data on patients who share a diagnosis, procedure, or condition. Hospitals submit that data to benchmark outcomes, identify gaps in care, and drive quality improvement. The numbers do not sit in a vacuum. They feed reimbursement, accreditation status, public reporting, and outcomes research. When a value is wrong, the consequence is not a poor recommendation a user can ignore. It is flawed quality data that undermines the registry, distorts a benchmark, and can surface in an audit months later.
That changes what the work has to optimize for. A consumer recommendation engine can be wrong a meaningful share of the time and still be useful, because the cost of any single error is trivial. A registry cannot. The work exists precisely because the data has to be defensible, not merely plausible. Any honest conversation about automating it has to start there.
The instinct is to assume that errors shrink as models improve, and that the gap between good AI and good abstraction will eventually close on its own. The trouble is that the hardest part of abstraction is not retrieval. It is judgment.
Consider a real scenario. Clinicians ask a model for a patient's most recent ejection fraction, a measure of heart function. The model returns three values from three documents, each technically correct. Take the most recent and the patient looks fine. An experienced abstractor knows that an earlier, lower reading is the reason the patient is having a procedure at all. The values were all accurate. Only one made clinical sense. No amount of additional accuracy on the extraction step would have produced the right answer, because the right answer required knowing what mattered.
This is the recurring shape of the problem. A model can tell you a patient returned to the operating room based on a post-surgical visit to an affiliated endoscopy suite. A clinician knows the endoscopy suite is not the operating room. The information was retrieved correctly and interpreted wrongly. Abstraction is full of these moments, where the data is right and the meaning is not.
There is a deeper reason the replacement narrative keeps stalling. Model capability and institutional risk tolerance do not advance at the same pace. A model can improve every quarter. The willingness of a health system to hand an auditor a number that no clinician validated does not improve on the same schedule, and for good reason.
Look at how other high-stakes fields resolved the same tension. Modern aircraft can largely fly themselves, yet pilots never left the cockpit, because aviation optimizes for what happens when something unusual occurs, not for the percentage of routine flight that can be automated. Radiology models flag abnormalities with real sensitivity, yet radiologists retain interpretation and legal responsibility. In each case automation expanded dramatically while accountability stayed embedded in the design. Clinical data abstraction belongs in that same category.
So the question worth asking is not whether AI will replace clinical abstractors. Framed that way, the answer is a guess about a date. The better question is how to apply AI so that it accelerates the work without taking on risk a health system cannot defend, and so that it delivers results the organization can actually measure.
Answered that way, the design follows naturally. Let the model do what it is genuinely good at, which is pattern recognition at scale, synthesis across fragmented documentation, and throughput on routine cases. Keep a credentialed clinician at the helm for ambiguity, contextual interpretation, edge cases, and final accountability. This is what Carta Healthcare means by Hybrid Intelligence. It is not AI with a human checking the homework. It is a division of labor built around where each kind of intelligence is strongest.
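One way to picture that division of labor is as a routing rule. The sketch below is purely illustrative and does not describe any actual product: the `ExtractedValue` type, the `route` function, the confidence score, and both thresholds are hypothetical names and numbers chosen for the example. It sends a field to a clinician for interpretation whenever the extracted values disagree or the model is unsure, and lets the model draft the rest for clinician sign-off.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ExtractedValue:
    """One value a model pulled from one document (hypothetical structure)."""
    field: str         # e.g. "ejection_fraction"
    value: float
    source_doc: str
    confidence: float  # assumed model-reported score between 0.0 and 1.0


def route(values: list[ExtractedValue],
          min_confidence: float = 0.9,   # illustrative threshold, not a standard
          max_spread: float = 5.0) -> str:  # illustrative threshold, not a standard
    """Decide who interprets a field. A clinician stays accountable either way."""
    if not values:
        # Nothing was found, so there is nothing for the model to draft.
        return "clinician_interpretation"
    spread = max(v.value for v in values) - min(v.value for v in values)
    unsure = any(v.confidence < min_confidence for v in values)
    if spread > max_spread or unsure:
        # The documents disagree or the model is unsure: this needs judgment.
        return "clinician_interpretation"
    # Routine case: the model drafts the value and a clinician signs off.
    return "model_draft"


# Made-up readings from three documents for the same patient.
readings = [
    ExtractedValue("ejection_fraction", 30.0, "pre_op_echo", 0.97),
    ExtractedValue("ejection_fraction", 55.0, "clinic_note", 0.95),
    ExtractedValue("ejection_fraction", 60.0, "discharge_summary", 0.96),
]
print(route(readings))
```

With these made-up readings, every value is extracted with high confidence, yet the disagreement between documents alone sends the field to a clinician. That is the situation where knowing what mattered decides the answer, not extraction accuracy.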
A system that is right most of the time is impressive in a demonstration and inadequate in a registry, because the value of abstraction concentrates in the cases where being right is hardest. Consider what scale does to that math. A platform reasoning across tens of thousands of surgical cases a year cannot treat the difficult fraction as acceptable losses. A small percentage of misread cases is not a rounding error when each one can move a benchmark, distort a comparison against peer institutions, or surface in an audit a year later. The cases that are easy to get right are also the ones that matter least. The hard ones, where the record contradicts itself and judgment is required, are precisely the ones a buyer should evaluate, and precisely the ones a capability average tends to hide.
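A rough calculation shows how an average can hide the hard cases. Every figure below is an assumption chosen for illustration, not a measured result: 20,000 cases a year, 90 percent of them routine, a model that is right 99.5 percent of the time on routine cases and 80 percent of the time on hard ones.

```python
def blended_accuracy(easy_share: float, easy_acc: float, hard_acc: float) -> float:
    """Overall accuracy when easy and hard cases are averaged together."""
    return easy_share * easy_acc + (1 - easy_share) * hard_acc


# All inputs are hypothetical, picked only to illustrate the arithmetic.
cases_per_year = 20_000
easy_share = 0.90   # assumed share of routine cases
easy_acc = 0.995    # assumed accuracy on routine cases
hard_acc = 0.80     # assumed accuracy on hard cases

overall = blended_accuracy(easy_share, easy_acc, hard_acc)
hard_cases = round(cases_per_year * (1 - easy_share))
hard_misreads = round(hard_cases * (1 - hard_acc))

print(f"Headline accuracy: {overall:.1%}")
print(f"Hard cases per year: {hard_cases:,}")
print(f"Hard cases misread per year: {hard_misreads:,}")
```

Under these assumed figures, the headline accuracy is about 97.6 percent, while roughly 400 of the 2,000 hard cases are misread each year. The average looks strong precisely because the easy cases dominate it.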
This is why the founding insight behind Carta Healthcare has aged well. The company began in 2017 out of frustration with manual data collection at Stanford Children's Hospital, where the people who understood the clinical stakes were the ones doing the painstaking work of turning charts into trustworthy data. That origin encoded a simple conviction. The point of the work is not to produce a number quickly. It is to produce a number a clinician will stand behind. Any approach that optimizes for the first and assumes the second will follow has the priorities reversed.
The replacement framing flattens all of this into a single question about capability, as if the only variable were how good the model is. But two systems with identical extraction accuracy can produce very different outcomes depending on whether anyone is accountable for the cases where extraction is not enough. Capability is necessary. It was never sufficient, and in a regulated, audited environment it never will be.
There is a practical test buried in all of this. When an organization evaluates an abstraction approach, the revealing question is not how the system performs on a clean demonstration set. It is what happens in the room when an auditor asks who validated a particular value and on what basis. A process that can answer that, naming the clinician who stood behind the number and the reasoning they applied, is built for accountability. A process that can only point to a model's output is not, no matter how high the model's average accuracy climbs. The replacement narrative rarely survives contact with that question, because the question is about responsibility, and responsibility is the one thing a model cannot hold.
The replacement story will keep returning, because it is simple and because each new model makes it briefly believable again. The organizations that get the most from AI will be the ones that stop asking whether the machine can do the job and start asking who answers for the result. In healthcare, that question never goes out of date.