A quarter of businesses that have tried AI report limited or no results. If you're in that group, the problem almost certainly wasn't the technology. Here's what went wrong and how to fix it.
Tyler Gibbs
Lead AI Engineer at LexisNexis Risk Solutions
About a quarter of businesses that have tried AI will tell you the same thing: it didn't really work. Not a disaster, not a catastrophic failure — just limited results that didn't justify the time and money. The tool got used a few times, quietly fell out of rotation, and now "AI" is a word that lands a little differently in the room than it used to.
If that's where you are, I want to say something clearly before we go any further: the problem almost certainly wasn't the technology. AI works. The foundation models available right now are genuinely capable of handling document extraction, classification, workflow routing, and a dozen other tasks that consume enormous amounts of time in law firms, insurance carriers, and compliance teams. The technology is not the failure point.
The approach is.
Having built over 20 production AI automations at LexisNexis Risk Solutions and for clients across legal and insurance, I've watched the same five patterns produce the same disappointing result over and over. Not bad luck. Not bad teams. Predictable, avoidable mistakes that you can diagnose before the next attempt.
Here's what actually happened.
The most common failure mode I see is scope that's too large from day one. Not slightly too large — "boil the ocean" large. A firm decides to automate its entire document review process. A carrier wants to overhaul intake, claims routing, and adjuster assignment simultaneously. A compliance team wants an AI solution that covers contract review, regulatory tracking, and audit prep in one go.
The logic feels sound in the pitch meeting. If we're going to do this, why not do it right? Why build something partial?
Here's why: because a partial thing that ships is worth infinitely more than a comprehensive thing that doesn't. Large scopes create large timelines. Large timelines create organizational fatigue. By month four of a six-month project, stakeholders have moved on, the team running it has changed, and the original problem you were solving has shifted. Even if the system eventually gets built, nobody is as bought in as they were at the start.
I've seen firms that could have had a working document extraction system in three weeks spend nine months in implementation purgatory chasing a complete solution. They didn't get the complete solution. They got nothing.
The organizations that have AI working in production picked one workflow. One document type, one routing decision, one extraction task. They shipped it, measured it, and built from there. The ambition was still there — they just serialized it instead of parallelizing it.
Off-the-shelf AI tools have improved dramatically. They're genuinely useful for a range of general tasks. They are not designed to handle legal, insurance, or compliance work reliably, and the gap between what a demo suggests and what happens with real production data is where most implementations fall apart.
Insurance endorsement language is not standard prose. Legal filings reference statutes, case law, and jurisdictional nuances that a general-purpose model may handle inconsistently or confidently get wrong. Compliance documents involve regulatory frameworks that evolve, have specific terminology that matters, and cannot tolerate hallucinated citations.
I've seen teams run a ChatGPT prompt on a coverage dispute document in a demo and get impressive-looking output. Then they ran it on fifty real claims in production and discovered that the model was fabricating policy numbers, misreading exclusion clauses, and occasionally producing output that looked authoritative and was factually incorrect. Not always. Just enough to make the output untrustworthy, which made the tool unusable.
The problem isn't that these tools are bad. They're excellent at what they're designed for. The problem is deploying them without the domain-specific context, validation logic, and integration architecture that makes them reliable for high-stakes professional work. A general-purpose tool bolted onto a specialized workflow produces general-purpose unreliability on work where reliability is the entire point.
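To make "validation logic" concrete, here's a minimal sketch of the kind of checks that sit between the model and anything downstream. Everything in it is illustrative: the `ExtractionResult` fields, the policy number pattern, and the failure messages are assumptions for the example, not anyone's production schema. The point is that model output is never trusted on its own; it has to survive checks aimed at exactly the failure modes above, fabricated values and paraphrased "quotes" in particular.

```python
import re
from dataclasses import dataclass

@dataclass
class ExtractionResult:
    policy_number: str | None
    exclusion_clause: str | None

# Illustrative format only; real policy number formats vary by carrier.
POLICY_NUMBER_PATTERN = re.compile(r"^[A-Z]{2,3}-\d{6,10}$")

def validate_extraction(result: ExtractionResult, source_text: str) -> list[str]:
    """Return a list of validation failures. An empty list means the
    extraction passed every check and can flow downstream."""
    failures = []

    # Format check: does the extracted policy number even look like one?
    if result.policy_number and not POLICY_NUMBER_PATTERN.match(result.policy_number):
        failures.append("policy_number fails format check")

    # Grounding check: does the value actually appear in the source
    # document? A fabricated number will usually fail this.
    if result.policy_number and result.policy_number not in source_text:
        failures.append("policy_number not found in source document")

    # Quote check: an exclusion clause the model "found" must be a
    # verbatim span of the document, not a paraphrase or an invention.
    if result.exclusion_clause and result.exclusion_clause not in source_text:
        failures.append("exclusion_clause is not a verbatim quote from the document")

    return failures
```

Checks this cheap are what separate "the model sometimes invents things" from "the system catches it when it does."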
This failure mode is quieter than the others. The system gets built. It works reasonably well. Then nobody uses it.
Not because they're being obstinate. Because it wasn't designed to fit into how work actually gets done.
Picture a paralegal whose job is to open a document in iManage, extract key dates and parties, and enter them into the matter management system. If the AI tool you give them lives outside iManage, requires them to upload files manually to a separate interface, and then makes them re-enter the results themselves, you've added steps to a process you were supposed to simplify. The human calculates, consciously or not, that doing it manually is faster than using the tool.
This is why "no rip-and-replace" isn't just a positioning statement for me. It's the technical requirement that determines whether the thing actually gets used. If the output doesn't land in the system people already live in, adoption is close to zero. The tool becomes something people demonstrate during vendor evaluations and quietly ignore on a Tuesday morning when there's real work to do.
Integration into the existing workflow isn't a nice-to-have. It's the condition under which the tool becomes real.
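Here's a hypothetical sketch of what that condition looks like in practice, building on the validation function above. The `matter_client` calls (`get_document`, `create_review_task`, `update_matter_fields`) are stand-ins for whatever API your document or matter management system actually exposes; the shape of the flow is the point, not the SDK.

```python
def process_new_document(doc_id: str, matter_client, extractor) -> None:
    # Triggered when the document lands in the existing DMS, not when a
    # person uploads a file to a separate AI interface.
    document = matter_client.get_document(doc_id)

    # Extraction plus the validation step sketched earlier.
    result = extractor.extract(document.text)
    failures = validate_extraction(result, document.text)

    if failures:
        # Suspect output becomes a review task inside the same system
        # people already work in, not an email with an attachment.
        matter_client.create_review_task(doc_id, reasons=failures)
        return

    # Clean output is written straight to the matter record, so nobody
    # re-types dates, parties, or policy numbers by hand.
    matter_client.update_matter_fields(
        doc_id,
        fields={
            "policy_number": result.policy_number,
            "exclusion_clause": result.exclusion_clause,
        },
    )
```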
"Use AI more" is not a success metric. Neither is "explore AI capabilities" or "increase AI adoption across the team." These are goals in the same way that "eat better" is a goal — technically directional but practically unmeasurable.
Without a specific, numeric definition of success, you can't tell if the implementation worked. You can't tell if the investment was worth it. You can't make a defensible case to leadership for the next initiative. And you can't detect when something stops working, because you never established a baseline for what "working" looked like.
The implementations that produce clear results started by defining those results before anything was built. Reduce intake processing time from four hours to thirty minutes. Drop the error rate on coverage limit extraction from 8% to under 1%. Cut the time to prepare a regulatory audit package from three days to one. These are specific, testable, falsifiable. You can build toward them, measure against them, and know when you've hit them.
If you can't tell someone exactly what number you were trying to move, and what it moved to, the project didn't have a real definition of done. That's not a criticism — it's a diagnostic. The next attempt needs to start with a number.
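If it helps to see what "a number" looks like in practice, here's a minimal sketch of the measurement behind the coverage-limit example above. The labeled sample, the `extract_coverage_limit` call, and the exact target are assumptions for illustration; what matters is that the same measurement runs before go-live to set the baseline and again after every change.

```python
# Minimal sketch of the measurement behind "drop the error rate on
# coverage limit extraction from 8% to under 1%." The labeled sample is
# a hand-checked set of real documents with the correct limit recorded
# for each; extract_coverage_limit is a hypothetical call.

TARGET_ERROR_RATE = 0.01  # "under 1%", the definition of done

def coverage_limit_error_rate(labeled_sample, extractor) -> float:
    """Fraction of documents where the extracted coverage limit does not
    match the hand-verified value."""
    errors = sum(
        1
        for doc_text, true_limit in labeled_sample
        if extractor.extract_coverage_limit(doc_text) != true_limit
    )
    return errors / len(labeled_sample)

# Run it once before go-live to establish the baseline, then on every
# change afterwards. "Working" means the number stays under the target:
#   assert coverage_limit_error_rate(sample, extractor) < TARGET_ERROR_RATE
```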
This one is harder to say, but it accounts for a lot of failed implementations.
Large consulting firms sell AI transformation projects by putting senior partners in front of your leadership team. The pitch is polished, the case studies are real, and the confidence is warranted — by the people in that room. Then the engagement starts, and the day-to-day delivery is handled by a team of junior consultants who are learning the technology on your timeline and your budget. The people who understood your problem sold the engagement. The people building the solution may be meeting your domain for the first time.
On the other end of the spectrum, development shops build impressive demos that can't make it to production. They deliver something that works in a controlled environment with clean data, hand it over, and disappear. Three months later you're maintaining a system nobody fully understands, it breaks when it encounters a document type that wasn't in the original test set, and the vendor relationship that could have fixed it ended with the contract.
Neither of these is a vendor problem, exactly. Both are structural consequences of how those delivery models work. The alternative is working with someone where the person who scoped the project is the person building it, using your actual data, accountable to the results in production, not just the demo.
After everything above, the pattern that produces working AI in production is straightforward — not simple, but straightforward.
Pick one workflow. The most painful one, not the most ambitious one. The one where someone on your team could describe the problem in a single sentence and tell you exactly what correct output looks like.
Use real data from day one. Not a sanitized sample. The actual messy documents that hit your system on a Wednesday afternoon, including the edge cases and the scanned faxes and the forms where someone filled in the wrong field. If the system can't handle those, you need to know before go-live, not after.
Build the integration into the existing workflow before anything else. If the output doesn't land in the system your team already uses, treat that as an unresolved problem, not a follow-up task.
Define success as a specific number. Hours saved. Error rate reduced. Documents processed without human touch. Something you can put in a sentence and verify.
Keep the scope fixed. Every "while we're at it" is a future project, not a current requirement.
Build in a human checkpoint for the uncertain cases. Not because the AI can't be trusted, but because the 3% of cases that are genuinely ambiguous deserve human judgment, and having a clean escalation path is what makes the other 97% trustworthy.
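A short sketch of that escalation path, with the obvious caveats: the single confidence score and the 0.90 threshold are assumptions to make the shape concrete, and both should be tuned against your own review data rather than copied.

```python
CONFIDENCE_THRESHOLD = 0.90

def route_extraction(extraction: dict, confidence: float, pipeline, review_queue) -> None:
    """Confident extractions flow straight through; everything else goes
    to a human with the reason for the escalation attached."""
    if confidence >= CONFIDENCE_THRESHOLD:
        # The large majority that needs no human touch.
        pipeline.accept(extraction)
    else:
        # The genuinely ambiguous cases, queued for human judgment
        # with an explanation of why they were escalated.
        review_queue.add(
            extraction,
            reason=f"confidence {confidence:.2f} below {CONFIDENCE_THRESHOLD:.2f}",
        )
```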
If your first AI implementation didn't deliver what you expected, you know something the 40% of businesses that haven't tried it yet don't know. You know what doesn't work for your organization. You know what the gap between a demo and production actually looks like. You know that "AI" as a category is not the same as "this specific implementation scoped correctly for this specific workflow."
That's expensive knowledge, but it's real knowledge. The organizations that do nothing because they're waiting for AI to mature are not in a better position than you. They're behind, and they don't know what they're missing.
The question is what you do with what you learned. The approach that produces results is available. It's not complicated. It just requires starting smaller than feels right, defining success more precisely than feels necessary, and working with someone who has shipped things that have to work every day — not just in the demo.
If you're ready to try again with a different approach, I'm happy to talk through what a scoped, realistic implementation looks like for your specific workflow. The first conversation costs nothing and you'll leave with a clear picture of what's actually worth building. Reach out through the contact form and we'll go from there.