Pharma AI Catches Duplicate Patients and Site-Level Fraud Before Clinical Trials Are Compromised

Clinical trial acceleration has become the primary success metric for pharma AI. The most valuable applications are not general-purpose automation tools but systems that protect trial integrity and compress development timelines that run for years and cost billions of dollars. At AbbVie, that principle guided the development of an anomaly detection framework now embedded in the company's governance workflow and running across all trials at enterprise scale.

Shruti Kaushal, Senior Data Scientist on the Experimental AI team at AbbVie, leads the development of scalable ML systems for clinical trial and medical analytics. Her work spans anomaly detection, text classification, and vector-embedding systems for clinical research and patient safety. Before AbbVie, she built an AI-driven drug discovery framework at the Wyss Institute at Harvard, achieving 300% faster throughput on protein-target prediction. She holds a Master of Science in Data Science from Columbia and a Master of Science in Mathematics from IIT Delhi.

Kaushal says the priority is "anything that helps accelerate our clinical trial timelines, because the patients are at the center of everything that a pharma company does." She adds: "A lot of the times when people ask me what experimental AI is and how do you really measure success, it's against these timelines."

Two engines, one framework

The anomaly detection framework operates with two distinct engines. The first focuses on site-level anomalies, comparing data from each clinical trial site against all other sites to flag anything out of the ordinary. Kaushal notes that it catches more than outright fraud: "This framework is able to detect not just extreme cases like fraud. It can also detect data entry errors where everybody has entered hemoglobin in the expected range, but another site ended up adding a zero at the end. These things help with data quality, not just risk management."
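The article does not describe the statistics behind the site-level engine. As a minimal sketch, assuming each site reports raw values for one lab variable, one common way to compare every site against all the others is a robust z-score: take each site's median, then measure how far it sits from the median of all sites in units of the median absolute deviation (MAD). The function names and the 3.5 threshold are illustrative assumptions, not AbbVie's implementation.

```python
import statistics

# Illustrative sketch only: names and threshold are hypothetical, not AbbVie's method.

def robust_site_scores(site_values: dict[str, list[float]]) -> dict[str, float]:
    """Score each site's median against the distribution of all site medians."""
    site_medians = {site: statistics.median(vals) for site, vals in site_values.items()}
    center = statistics.median(site_medians.values())
    mad = statistics.median(abs(m - center) for m in site_medians.values())
    # 1.4826 rescales MAD to match a standard deviation under normality;
    # fall back to a tiny scale if every site median is identical.
    scale = 1.4826 * mad or 1e-9
    return {site: (m - center) / scale for site, m in site_medians.items()}

def flag_sites(site_values: dict[str, list[float]], threshold: float = 3.5) -> list[str]:
    """Return the sites whose robust z-score exceeds the threshold."""
    scores = robust_site_scores(site_values)
    return sorted(site for site, z in scores.items() if abs(z) > threshold)

# Hemoglobin in g/dL; site_E's values look like they gained an extra zero.
hemoglobin = {
    "site_A": [13.1, 14.2, 12.8, 15.0],
    "site_B": [12.5, 13.9, 14.4, 13.2],
    "site_C": [14.1, 13.3, 12.9, 15.2],
    "site_D": [13.8, 12.7, 14.6, 13.5],
    "site_E": [131.0, 142.0, 128.0, 150.0],
}
print(flag_sites(hemoglobin))  # ['site_E']
```

Medians and MAD are used instead of mean and standard deviation so that the outlying site cannot inflate the spread it is measured against. With only a handful of sites the MAD can be tiny, so a real system would likely need a floor on the scale and would compare many variables, not one.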

The framework produces quarterly health reports across all trials, giving review teams actual signals to investigate rather than forcing them to manually sort through hundreds of sites. "Our review teams are not sitting with 100 sites, thinking which one should I review. They now have actual signals they can drill down into."
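Turning those comparisons into a review queue is mostly bookkeeping. The sketch below assumes anomaly scores have already been computed per site and per metric (the metric names, threshold, and `quarterly_report` function are hypothetical) and keeps only sites with at least one score beyond the threshold, ranked by their most extreme deviation, so reviewers start from a short list instead of every site.

```python
from dataclasses import dataclass

@dataclass
class SiteSignal:
    site: str
    worst_metric: str
    worst_score: float
    flagged_metrics: list[str]

def quarterly_report(scores: dict[str, dict[str, float]],
                     threshold: float = 3.5, top_n: int = 10) -> list[SiteSignal]:
    """Turn per-site, per-metric anomaly scores into a ranked review list.

    Hypothetical sketch: scores maps site -> {metric name -> anomaly score}.
    Sites with no metric beyond the threshold are left out of the report.
    """
    signals = []
    for site, metric_scores in scores.items():
        flagged = sorted(m for m, z in metric_scores.items() if abs(z) > threshold)
        if not flagged:
            continue
        worst = max(metric_scores, key=lambda m: abs(metric_scores[m]))
        signals.append(SiteSignal(site, worst, metric_scores[worst], flagged))
    # Most extreme deviation first, so review starts with the strongest signal.
    signals.sort(key=lambda s: abs(s.worst_score), reverse=True)
    return signals[:top_n]

q3_scores = {
    "site_A": {"hemoglobin": 0.4, "visit_gap_days": 1.1},
    "site_B": {"hemoglobin": 12.0, "visit_gap_days": 0.3},
    "site_C": {"hemoglobin": -0.8, "visit_gap_days": 4.2},
}
for s in quarterly_report(q3_scores):
    print(s.site, s.worst_metric, s.flagged_metrics)
# site_B hemoglobin ['hemoglobin']
# site_C visit_gap_days ['visit_gap_days']
```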

The second engine handles duplicate patient detection, a problem that threatens both patient safety and statistical power. So-called "professional patients," motivated by compensation or other factors, enroll in multiple trials at the same time, exposing themselves to more than one investigational treatment. Their duplicate records dilute statistical power and can lead to the trial being dismissed.

Kaushal says the key advantage is catching these patients before they receive treatment. "We're not detecting once they have been dosed, because then intervening is still compromising their safety. Current industry standards wait until the entire trial is finished and all data is collected. That's when they find they have duplicates. The damage is done."
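Kaushal does not detail how duplicates are matched. A common approach, sketched here under stated assumptions, is record linkage run at screening time: compare each new candidate with already-enrolled participants on a few identifying fields, restrict comparisons to plausible pairs (blocking), and score field agreement with weights. The fields, weights, and 0.8 threshold are hypothetical, and plaintext fields are used only for readability; cross-trial matching in practice would typically work on pseudonymized or tokenized identifiers.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass(frozen=True)
class Participant:
    participant_id: str
    trial_id: str
    initials: str
    birth_date: str      # ISO format, e.g. "1980-04-12"
    sex: str
    postal_prefix: str   # coarse location only, e.g. first three ZIP digits

# Hypothetical weights; in practice these would be tuned on known duplicate pairs.
WEIGHTS = {"birth_date": 0.4, "initials": 0.3, "postal_prefix": 0.2, "sex": 0.1}

def match_score(a: Participant, b: Participant) -> float:
    """Weighted agreement between two records, from 0.0 (no match) to 1.0."""
    score = WEIGHTS["birth_date"] * float(a.birth_date == b.birth_date)
    score += WEIGHTS["sex"] * float(a.sex == b.sex)
    score += WEIGHTS["postal_prefix"] * float(a.postal_prefix == b.postal_prefix)
    # Fuzzy comparison tolerates transposed or mistyped initials.
    score += WEIGHTS["initials"] * SequenceMatcher(None, a.initials, b.initials).ratio()
    return score

def screen_candidate(candidate: Participant, enrolled: list[Participant],
                     threshold: float = 0.8) -> list[tuple[float, Participant]]:
    """Return likely duplicates of a screening candidate, best match first."""
    # Blocking: only compare records sharing a birth year to limit pairwise work.
    year = candidate.birth_date[:4]
    hits = [(match_score(candidate, p), p) for p in enrolled if p.birth_date[:4] == year]
    return sorted((h for h in hits if h[0] >= threshold), key=lambda h: h[0], reverse=True)

enrolled = [
    Participant("P-001", "TRIAL-1", "JMS", "1980-04-12", "F", "606"),
    Participant("P-002", "TRIAL-2", "AKR", "1980-09-30", "M", "100"),
]
candidate = Participant("S-917", "TRIAL-3", "JSM", "1980-04-12", "F", "606")
for score, match in screen_candidate(candidate, enrolled):
    print(f"hold {candidate.participant_id}: possible duplicate of "
          f"{match.participant_id} in {match.trial_id} (score {score:.2f})")
# hold S-917: possible duplicate of P-001 in TRIAL-1 (score 0.90)
```

Because the check runs on a screening candidate, a match can pause enrollment before any dosing. Blocking on birth year keeps the number of comparisons small but will miss a duplicate whose birth year was entered differently; that trade-off, like the weights, would need tuning against known duplicate pairs.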

Getting data right first

Kaushal says the first question her team asks before evaluating any use case is whether the data exists and whether it meets the FAIR principles: findable, accessible, interoperable, and reusable. Regulatory restrictions also limit which problems can be addressed.

For organizations looking to prepare their data infrastructure, Kaushal points to two priorities: building a semantic layer that harmonizes all incoming data to a single ontology, and establishing a central data warehouse that links clinical data across formats. "At any given point, I don't have to go to external vendors or do tedious research to figure out what stands for what or which terms need to be converted into which units."
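At its core, a semantic layer performs two lookups: site-specific labels mapped to one canonical term, and source units converted to that term's reference unit. The sketch below uses hemoglobin because a g/L value stored as g/dL is ten times too large, which looks exactly like the "extra zero" error the site-level engine flags. The mapping tables and the `harmonize` function are hypothetical; a production layer would draw on a curated ontology rather than a hand-written dictionary.

```python
# Hypothetical synonym table: site-specific lab labels -> one canonical term.
TERM_MAP = {"hgb": "hemoglobin", "hb": "hemoglobin",
            "haemoglobin": "hemoglobin", "hemoglobin": "hemoglobin"}

# Each canonical term has one reference unit.
CANONICAL_UNIT = {"hemoglobin": "g/dL"}

# Multipliers from a source unit into the reference unit (10 g/L = 1 g/dL).
UNIT_FACTORS = {("hemoglobin", "g/dL"): 1.0, ("hemoglobin", "g/L"): 0.1}

def harmonize(record: dict) -> dict:
    """Map one raw lab record onto its canonical term and reference unit."""
    term = TERM_MAP.get(record["test"].strip().lower())
    if term is None:
        raise KeyError(f"unmapped test name: {record['test']!r}")
    factor = UNIT_FACTORS.get((term, record["unit"]))
    if factor is None:
        raise KeyError(f"no conversion for {term} from {record['unit']!r}")
    return {**record, "test": term, "value": record["value"] * factor,
            "unit": CANONICAL_UNIT[term]}

raw = [
    {"site": "site_A", "test": "HGB", "value": 13.4, "unit": "g/dL"},
    {"site": "site_B", "test": "Haemoglobin", "value": 134.0, "unit": "g/L"},
]
for row in (harmonize(r) for r in raw):
    print(row["site"], row["test"], round(row["value"], 2), row["unit"])
# site_A hemoglobin 13.4 g/dL
# site_B hemoglobin 13.4 g/dL
```

Raising an error on an unmapped term or unit, rather than passing the value through, is a deliberate choice: silently accepting an unknown unit is how tenfold errors slip into downstream analyses.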

Validation is not optional

Every AI model at AbbVie requires a spec document and must be audit-ready. Validation is built into the project plan from the start, not treated as a later-stage checkpoint. Kaushal says this regulatory pressure, often seen as a constraint, actually makes adoption easier. "Human in the loop is almost necessary to have another human verify the signals for the entire process to be audit-ready. That's why it's easier for us to have our internal stakeholders adopt."
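The article describes the requirement (human verification and an audit-ready process) rather than the tooling. As a hypothetical sketch of what that can mean in code, each model-generated signal below carries the version of the validated model that produced it, stays pending until a named reviewer confirms or dismisses it, and records every action in a trail that is only ever appended to. The class, field, and reviewer names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Signal:
    signal_id: str
    model_version: str   # hypothetical: ties the signal to a validated model spec
    description: str
    status: str = "pending_review"
    audit_trail: list[tuple[str, str, str]] = field(default_factory=list)

    def log(self, actor: str, action: str) -> None:
        # This class only appends entries, so the history is preserved in order.
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_trail.append((stamp, actor, action))

    def review(self, reviewer: str, confirmed: bool, note: str) -> None:
        """A named human must confirm or dismiss the signal before it is closed."""
        if self.status != "pending_review":
            raise ValueError(f"{self.signal_id} already reviewed")
        self.status = "confirmed" if confirmed else "dismissed"
        self.log(reviewer, f"{self.status}: {note}")

sig = Signal("SIG-0042", "site-anomaly-v1.3", "site_E hemoglobin median ~10x peers")
sig.log("model", "signal raised")
sig.review("j.doe", confirmed=True, note="unit entry error; query sent to site")
```

The model can raise a signal but cannot close it; only the `review` step changes its status, which is one way to keep a human in the loop on every decision.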

That mandatory rigor is also why Kaushal argues the AI bubble will not burst in pharma. "In pharma, we're not building new tech. We're applying existing tech to novel use cases. And our efficiency metric is not about how many hours we save in a day. It's about how much we've been able to accelerate our clinical trials." She points to AstraZeneca's stated goal of increasing regulatory submissions from one to six per year as a sign of the payoff the industry expects.

The regulatory standards that pharma must follow force robustness from the start. "We will not be able to ship a model and say it works unless it's validated to a point where it meets all regulatory standards and guidelines. Being a data scientist in pharma AI is probably the best position to be in right now if you're more interested in application rather than creation."

Shruti Kaushal, Senior Data Scientist at AbbVie, explains how an anomaly detection framework catches duplicate patients and site-level risks before they compromise clinical trial integrity.