Checking the AI is not enough

Courts are sanctioning lawyers who checked their AI output. The problem isn't the review - it's that citation checking and verification are not the same thing. Here's what the best firms are doing differently, and why it starts before a single artefact is produced.

In Australia and the US, professional and financial sanctions against lawyers are growing as AI-generated fabrications reach the courts. In Oregon this year, two attorneys were fined a record $110,000 after their submissions were found to be rife with AI fabrications. In Australia, a Victorian lawyer was referred to the Legal Services Board and stripped of his principal practice rights after submitting AI-generated citations he had not verified.

What's striking about these cases is not that lawyers skipped the review. Most didn't. They checked. It is that the checking process was unable to catch what the AI had quietly got wrong.

Research from Wharton published this year puts a name to what's happening. Cognitive surrender: when an AI-generated answer is adopted with minimal scrutiny. The study found that 80% of people follow AI output even when it is wrong - and that faulty AI makes people perform worse than having no AI at all, while their confidence rises whether or not the answer is correct.

For legal practitioners, this creates a specific problem. Citation checking - the dominant verification practice - is a syntactic check. It tells you whether a case exists. It doesn't tell you whether the AI missed a document, misread a date, skipped contradicting evidence, or built its answer on an incomplete picture of the matter.

Effective verification isn't a review step at the end. It needs to be built into the workflow from the start.

Since our May release, which was built specifically around this problem, our customers in Australia and the US have been doing exactly that. Here's what it looks like in practice.

1. Building verification into the workflow from day one

The firms getting the most from Mary aren't using it to produce output and then checking it. They're using it to structure how they approach a matter before a single artefact is produced.

"It was even smart enough at times to tell me there's no supporting evidence for this allegation. It's just an allegation at this point."

The shift is from understanding whether the evidence exists to understanding how deep it is. Top firms are now mandating verification workflows that include:

That last point is what we call negative space. Not just what the AI reviewed, but what it didn't. What's in the evidence set that never surfaced? What allegations have no supporting document behind them? This is the layer citation checking doesn't reach.
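To make the idea of negative space concrete, here is a minimal sketch in Python. It is an illustration only, not how Mary works: the `Allegation` type, the `negative_space` function, and the file names are all hypothetical. It assumes you already have three things: the full evidence set, the documents the AI output actually drew on, and the documents each allegation relies on.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Allegation:
    """Hypothetical record: an allegation and the documents said to support it."""
    text: str
    supporting_docs: set[str] = field(default_factory=set)


def negative_space(evidence_set, reviewed_docs, allegations):
    """Return (unreviewed, unsupported).

    unreviewed:  documents in the evidence set that never surfaced in the review.
    unsupported: allegations with no supporting document in the evidence set.
    """
    evidence = set(evidence_set)
    unreviewed = sorted(evidence - set(reviewed_docs))
    unsupported = [a.text for a in allegations if not (a.supporting_docs & evidence)]
    return unreviewed, unsupported


# Illustrative data only; these file names and allegations are made up.
evidence = {"affidavit_1.pdf", "bank_2021.pdf", "email_thread.pdf"}
reviewed = {"affidavit_1.pdf", "email_thread.pdf"}
allegations = [
    Allegation("Funds were transferred without consent", {"bank_2021.pdf"}),
    Allegation("The will was signed under duress"),
]

unreviewed, unsupported = negative_space(evidence, reviewed, allegations)
print("Never surfaced:", unreviewed)
print("No supporting document:", unsupported)
```

A citation check would pass every document that was cited here. The set difference is what shows that the bank statements were never looked at, and that one allegation has nothing behind it.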

2. Forensic triage - earlier than you think

Our new bank statements capability allows customers to undertake detailed transaction analysis in a clean, collaborative workspace - and to cross-reference that analysis against other evidentiary documents in the matter.

In one recent wills and estates matter, a customer identified financial wastage of millions of dollars through elder abuse. Patterns surfaced that would have taken weeks to find manually. Conflicts between financial records and affidavit accounts became visible quickly.

Bringing forensic triage forward - treating it as a starting point rather than a late-stage exercise - is giving firms a meaningful edge in how they understand a matter before it develops.

3. Consistency across class actions and large-volume matters

In class actions and large-volume matters, the risk of inconsistency scales with volume. What gets caught in matter 12 gets missed in matter 47.

Mary's template-based drafting generates structured memos from your evidence base - assessing, by category, the relevance of each claimant's evidence and drafting individual memos automatically. Page-level citations and clickable source views mean the confidence in moving decisions forward is grounded in the evidence, not assumed from the output.

4. Requests or notices for admission and production - built from the evidence up

Mary is helping litigation teams in particular draft requests or notices for admission and production based on surfaced issues - and respond to them. When the requests are grounded in the evidence from the start, the quality of the record shows it.

The time savings are real, but they don't come at the cost of rigour:

"Sometimes what looks like a three-hour job, I'm doing in half an hour, plus the review time - which is good. Don't get me wrong."

5. Deposition and witness examination preparation

Exposing weaknesses in opposing counsel's case and drafting source-cited questions - whether for witness examination or depositions (US) - is as simple as a prompt in Mary.

Teams are coming to these moments better prepared. Custom chronologies generate questions substantiated directly by the evidence. The result is faster identification of positional weaknesses and better-informed strategy going into the room.

What this means in practice

The firms avoiding sanctions aren't the ones checking harder. They're the ones who've changed what verification means - from a citation review at the end to a structured workflow from the start.

The questions worth asking of your current process:

If any of those are gaps, the May release was built for you.

The Mary team is happy to show you how customers are building these workflows - and what they're finding.

Book a call with our team