Simon Willison laid out his LLM predictions for 2026. I found myself thinking about how they land differently when your AI tools are processing claims documents and privilege logs instead of pull requests. Here's what actually matters for law firms and insurance carriers.
Founder of Grayhaven
Founder of Grayhaven
Simon Willison recently did a podcast run-through of his LLM predictions for 2026, covering the one-year, three-year, and six-year horizons. If you work in AI, his take is worth an hour of your time. He's one of the clearer thinkers on what's actually happening in this space versus what's being marketed.
I found myself doing a different exercise as I listened: filtering each prediction through the question a managing partner at a 200-person law firm or a VP of claims at a mid-size carrier actually needs answered. Not "is this technically interesting?" but "does this change what I should be doing before the end of the year?"
Most AI predictions are written for people building AI. These are filtered for people deploying it.
Willison's most pointed near-term prediction is that the AI ecosystem is heading toward a major security incident. Something he calls a "Challenger disaster" for coding agents: a prompt injection worm moving through package repositories, or some equivalent event that breaks the normalization of deviance that's built up around unsafe AI practices.
The normalization of deviance framing is the part that stuck with me. It's borrowed from disaster analysis. The pattern: a system has a known vulnerability, nobody exploits it for a while, teams get comfortable, the vulnerability gets treated as theoretical. Then something happens and everyone acts surprised.
Sound familiar? It should. It's exactly what's happening with AI tooling in professional services right now.
Law firms and insurance carriers are deploying AI tools built on ecosystems — third-party packages, model providers, document pipelines, API integrations — that haven't been security-reviewed with anything like the rigor you'd apply to a system handling privileged communications or claims data. Not because the teams are reckless. Because nothing bad has happened yet, and "nothing bad has happened yet" is not a risk framework.
The question I'd ask about any AI tool you're currently running: what happens if an adversarially crafted document hits your extraction pipeline? A malicious insurance form, a contract with hidden text, a deposition transcript that contains content designed to manipulate the model's output. What does your system do? Does it log it? Does it flag it? Does it route the output somewhere with elevated trust?
Most teams don't know the answer. That's the deviance. The normalization is accepting that not knowing is fine because nothing has gone wrong yet.
I'm not predicting doom. I'm saying the window to build secure AI practices before an incident forces your hand is probably measured in months, not years. Better to answer these questions now, when you have time to be thoughtful, than after something goes wrong for someone in your space.
On the three-to-five year horizon, Willison is bullish on sandboxing maturing rapidly. Containers and WebAssembly are making it much safer to run untrusted code. The same category of work applies to running AI in high-stakes environments: isolated execution, tighter permission boundaries, better audit trails.
For regulated industries, this matters a lot. One of the most common objections I hear from compliance officers and IT teams when evaluating AI implementations is a version of: "We can't run this until we understand exactly what data it touches and can prove it to a regulator."
That's a legitimate concern, not a stalling tactic. And it's becoming easier to satisfy.
The firms that are winning right now aren't the ones that waited for perfect safety guarantees. They're the ones that deployed with appropriate guardrails, logged everything, kept humans in the review loop, and built institutional knowledge about what their AI tools actually do. That institutional knowledge compounds. The compliance team that has been running AI-assisted regulatory monitoring for eighteen months has a dramatically clearer picture of their exposure than the team that's been watching from the sidelines.
Waiting for the sandboxing story to mature before touching AI is a reasonable instinct applied to an unreasonable timeline. The tools available today, deployed carefully, are already safer than a lot of the manual processes they're replacing.
Willison raises the Jevons Paradox question for software engineering. The historical pattern: when a resource gets cheaper, demand for it tends to increase rather than decrease. Steam engines made coal burning more efficient, which led to more coal burning, not less. The efficiency gain created enough new applications that total consumption went up.
His observation: the evidence so far suggests AI will do the same thing for software development. Not eliminate programmers. Massively expand the category of software that gets written.
The same dynamic almost certainly applies to legal and insurance work.
The question being debated in law firm management meetings right now is some version of: "If AI can do what a second-year associate does, why do we need second-year associates?" It's the wrong question. Or rather, it's the right question asked too narrowly.
The better question is: if your associates can now do the work of three associates, what new categories of work become economically viable for your clients? What matters does a mid-market company retain counsel for today that it can't afford, but could if the cost structure changed? What risk management work gets done by insurance carriers if their adjusters can handle twice the volume?
AI won't shrink the market for skilled legal and insurance professionals. It will create pressure to differentiate between professionals who use AI to expand what they can offer and those who don't. The productivity gains accrue to whoever captures them first. That's a competitive dynamics problem, not a headcount problem.
Concretely: the firms that are thinking about AI as "how do we do the same work with fewer people" are missing the opportunity. The firms thinking about "how does this change what we can profitably offer" are the ones worth watching.
Willison's one-year prediction with the highest confidence: AI-generated code quality becomes undeniable. He mentioned that his own hand-coded output has dropped to a single-digit percentage of what he ships. The rest is AI-generated, reviewed, and refined.
This matters to you because the AI tools you're evaluating, buying, and deploying were built with this. Not someday. Now.
It cuts both ways. Sophisticated AI implementations are getting built faster and cheaper than they were 18 months ago. The productivity gains in development mean the fixed-price projects that would have taken eight weeks two years ago take three weeks today. That's good news if you're scoping work.
It also means the barrier to entry for building AI tools has dropped significantly, which means more vendors in your inbox, more demos, and proportionally fewer teams that have thought carefully about what happens after the demo. The code quality has improved. The judgment about what to build, and how to deploy it safely in a regulated environment, has not improved at the same rate.
The question to ask any vendor: who built this, and do they understand the regulatory environment you operate in? A technically clean implementation that doesn't account for privilege, data residency, or audit trail requirements isn't an asset. It's a liability with a good demo.
Not panic. Not wait.
The practical response to these predictions is the same advice I've been giving since I started Grayhaven: start with controlled, scoped implementations now, while the landscape is still forming, so you build institutional knowledge before it becomes competitively essential.
A few specific things worth doing in the next 90 days:
Audit your current AI exposure. List every AI tool your firm or team is currently using, including the ones individual lawyers or adjusters have adopted on their own. For each one, ask: what data does it touch, where does that data go, and what happens if it produces a wrong output? If you can't answer those questions, that's the first thing to fix.
Build one scoped workflow properly. Pick the document-heavy task that consumes the most time with the least judgment required. Claims intake triage. Contract clause extraction for standard agreements. Regulatory change summaries. Build a proper implementation: audit logs, human review checkpoint, defined error behavior, clear success metric. Do it right once and you'll have a template for everything else.
Start asking vendors hard questions about security. Before the incident Willison is predicting makes it obvious that you should have. What's the data handling model? What happens to the documents you process? Is there a penetration test? What's the prompt injection exposure?
The prediction I'd add to Willison's list: the firms that have 12 months of production AI experience when the first major legal or insurance AI security incident hits will be in a very different position than the firms starting from zero. The former group knows what they're doing. The latter group is suddenly trying to evaluate risk under pressure.
Start building the experience now. It's not expensive, and the alternative is expensive in a different way.
If you want to think through what this looks like for your specific environment, I'm happy to talk. There's a contact form on this site, or reach out directly. The conversation is free and there's no pitch at the end.
More insights on AI engineering and production systems
The most valuable AI implementations I've built are genuinely boring. That's not a criticism — it's the whole point. Here's why proven, predictable AI beats cutting-edge every time for law firms and insurance carriers.
The technical case for AI is settled. The hard part is internal. Here's how to build the business case, address the real objections, and get your organization moving.
A quarter of businesses that have tried AI report limited or no results. If you're in that group, the problem almost certainly wasn't the technology. Here's what went wrong and how to fix it.
We help legal, insurance, and compliance teams implement AI that saves time and reduces risk. Let's talk about your needs.