A new benchmark study released by Vals AI suggests that both legal-specific and general-purpose large language models are now capable of performing legal research tasks with a level of accuracy equaling or exceeding that of human lawyers.
The report, VLAIR – Legal Research, extends the earlier Vals Legal AI Report (VLAIR) from February 2025 to include an in-depth examination of how various AI products handle traditional legal research questions.
That earlier report evaluated AI tools from four vendors – Harvey, Thomson Reuters (CoCounsel), vLex (Vincent AI), and Vecflow (Oliver) – on tasks including document extraction, document Q&A, summarization, redlining, transcript analysis, chronology generation, and EDGAR research.
This follow-up study compared three legal AI systems – Alexi, Counsel Stack, and Midpage – and one foundational model, ChatGPT, against a lawyer baseline representing traditional manual research.
All four AI products, including ChatGPT, scored within four points of one another, with the legal AI products performing better overall than the generalist product, and all performing better than the lawyer baseline.
The top performer across all criteria was Counsel Stack.
Major Vendors Did Not Participate
Unfortunately, the benchmarking did not include the three largest AI legal research platforms: Thomson Reuters, LexisNexis, and vLex.
According to spokespeople for Thomson Reuters and LexisNexis, neither company opted to participate in the study. They did not say why.
vLex, however, initially agreed to have its Vincent AI participate in the study, but then withdrew before the final results were published.
A spokesperson for vLex, which was acquired by Clio in June, said that the company chose not to participate in the legal research benchmark because it was not designed for enterprise AI tools. The spokesperson said vLex would be open to joining future studies that match its focus.
Overview of the Study
Vals AI designed the Legal AI Report to assess AI tools on a lawyer-comparable benchmark, evaluating performance across three weighted criteria (a sketch of how the weights might combine follows the list):
- Accuracy (50% weight) – whether the AI produced a substantively correct answer.
- Authoritativeness (40%) – whether the response cited reliable, relevant, and authoritative sources.
- Appropriateness (10%) – whether the answer was well-structured and could be readily shared with a client or colleague.
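As a rough illustration of the weighting, here is a minimal Python sketch that assumes the composite is a simple weighted average of per-criterion scores; the report's exact aggregation formula is not spelled out here, so the combination method and the example scores are assumptions.

```python
# Minimal sketch of how the three weighted criteria might combine into a
# composite score. The weights come from the report; aggregating them as a
# simple weighted average is an assumption, not the report's stated formula.

WEIGHTS = {"accuracy": 0.50, "authoritativeness": 0.40, "appropriateness": 0.10}

def composite_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (on a 0-100 scale) into one weighted composite."""
    return sum(WEIGHTS[criterion] * scores[criterion] for criterion in WEIGHTS)

# Hypothetical example: a response scoring 80 on accuracy, 76 on
# authoritativeness, and 90 on appropriateness.
print(round(composite_score(
    {"accuracy": 80, "authoritativeness": 76, "appropriateness": 90}), 1))
# -> 79.4
```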
Each AI product and the lawyer baseline answered 210 questions spanning nine legal research types, from confirming statutory definitions to producing 50-state surveys.
Key Findings
- AI Now Matches or Beats Lawyers in Accuracy
Across all questions, the AI systems scored within four percentage points of each other and an average of seven points above the lawyer baseline.
- Lawyers averaged 71% accuracy.
- Alexi: 80%
- Counsel Stack: 81%
- Midpage: 79%
- ChatGPT: 80%
When grouped, both legal-specific and generalist AIs achieved the same overall accuracy of 80%, outperforming lawyers by nine points, as the quick check below illustrates.
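A small sanity check on those grouped figures, assuming "grouped" means a simple mean of each system's accuracy score (an assumption; the report may aggregate per-question rather than per-system):

```python
# Sanity check on the grouped accuracy figures, assuming "grouped" means a
# simple mean of per-system accuracy scores (an assumption on our part).
legal_ai_scores = {"Alexi": 80, "Counsel Stack": 81, "Midpage": 79}
generalist_score = 80  # ChatGPT
lawyer_baseline = 71

legal_ai_avg = sum(legal_ai_scores.values()) / len(legal_ai_scores)
print(legal_ai_avg)                    # -> 80.0, matching the generalist score
print(legal_ai_avg - lawyer_baseline)  # -> 9.0 points above the lawyer baseline
```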
Significantly, for five of the question types, the generalist AI product provided, on average, a more accurate response than the legal AI products, with one question type where the accuracy was scored the same.
“Both legal AI and generalist AI can produce highly accurate answers to legal research questions,” the report concludes.
Even so, the report found several instances where the legal AI products were unable to provide a response, due either to technical issues or to a perceived lack of available source data.
“Pure technical issues only arose with Counsel Stack (4) and Midpage (3), where no response was provided at all. In other cases, the AI products acknowledged they were unable to locate the right documents to provide a response but still provided some form of response or explanation as to why the available sources did not support their ability to provide an answer.”
- Legal AI Leads in Authoritativeness
While ChatGPT matched its legal-AI rivals on accuracy, it lagged in authoritativeness, scoring 70% to the legal AIs' 76% average. The difference, Vals AI said, reflects access to proprietary legal databases and curated citation sources, which remain differentiators for legal-domain systems.
“The study results support a common assumption that access to proprietary databases, even when composed primarily of publicly available data, does result in differentiated products.”
- Jurisdictional Complexity Remains Hard for All
All systems struggled with multi-jurisdictional questions, which required synthesizing laws from multiple states. Performance dropped by 11 points on average compared to single-state questions.
Counsel Stack and Alexi tied for best performance on these, while ChatGPT trailed close behind.
- AI Excels at Certain Tasks Beyond Human Speed
The AI products outperformed the lawyer baseline on 15 of 21 question types, often by wide margins when tasks required summarizing holdings, identifying relevant statutes, or sourcing recent caselaw.
For example, AI responses were completed in seconds or minutes, compared to the lawyers' average response latency of 1,400 seconds (roughly 23 minutes).
And where the AI products outperformed the humans on individual questions, they did so by a wide margin: an average of 31 percentage points.
- Human Judgment Still Matters
Lawyers outperformed AI in roughly one-third of question categories, notably those requiring deep interpretive analysis or nuanced reasoning, such as distinguishing relevant precedents or reconciling conflicting authorities.
These areas underscore, as the report put it, “the enduring edge of human judgment in complex, multi-jurisdictional reasoning.”
Methodology
The study was conducted blind and independently evaluated by a consortium of law firms and academics.
Each participant answered identical research questions crafted to mirror real-world lawyer tasks. Evaluators graded every response using a detailed rubric (which the report includes).
The AI vendors represented were:
- Alexi – legal research automation startup (founded 2017).
- Counsel Stack – open-source legal knowledge platform.
- Midpage – AI research and brief-generation tool.
- ChatGPT – generalist large language model (GPT-4).
Vals AI cautioned that the benchmark covers general legal research only, not tasks such as drafting pleadings or producing formatted citations.
And, as the report notes, “Legal research encompasses a wide range of activities … but there is not always a single correct answer prepared in advance.”
Bottom Line
The VLAIR – Legal Research study reinforces what many in the legal tech industry have already observed: AI systems, both generalist and domain-trained, are rapidly closing the quality gap with human legal researchers, particularly in accuracy and efficiency.
Yet the edge remains with legal-specific AIs in trustworthiness and source citation, suggesting that proprietary data access is the next competitive frontier.
For law firms, corporate legal departments, and AI vendors alike, the study serves as a transparent benchmark (a rare apples-to-apples comparison) for understanding where today's models shine and where human expertise remains indispensable.
Even so, the study is weakened by the failure of the three largest AI legal research platforms to participate. This is not the fault of Vals AI, but it leaves one wondering why the big three all opted out.
