When this project began, the industry was shifting fast.
Robin was already a machine-learning-heavy company. We had invested years into proprietary models trained to identify clauses, labels, and structured legal concepts across contracts. AI was not new to us, but LLMs were.
The release of GPT-3 changed expectations almost overnight. Natural language interaction went from experimental to inevitable, and, like many companies operating in the AI space, we felt real pressure to demonstrate that we were not being left behind.
The challenge was not whether to use LLMs.
It was how to introduce them without compromising trust, accuracy, or privacy in a legal product.
The Problem We Were Actually Solving
Query was already powerful, but it felt technical.
Users had to:
- Memorise filters and labels
- Think in a SQL-like mental model
- Click through multiple steps to answer simple questions
At the same time, early LLM experiments made something very clear:
- Token limits were real
- Full-contract reasoning was unreliable
- Hallucinations were unacceptable in legal workflows
We were not just designing a feature. We were navigating a new class of constraints.
The Decision
Rather than shipping a chat-first or Copilot-style experience, I proposed a hybrid model:
- Users express intent in natural language.
- The LLM interprets that intent.
- Queries are executed through existing labels and filters.
- Results remain explainable, fast, and auditable.
This gave users the flexibility they wanted without asking the LLM to answer questions it could not reliably support.
It also meant we could modernise the experience without destabilising the product.
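To make the shape of this concrete, here is a minimal sketch of the pattern in Python. Everything in it is illustrative rather than Robin's actual implementation: the field names, the `interpret_intent` and `run_query` helpers, and the stubbed LLM call stand in for the real label schema, query engine, and model client.

```python
import json

# Hypothetical label schema: the kind of structured fields an existing ML
# pipeline already extracts from contracts. The LLM may only reference these.
ALLOWED_FIELDS = {
    "governing_law": "string",
    "renewal_term_months": "number",
    "has_indemnity_clause": "boolean",
}

PROMPT_TEMPLATE = (
    "Translate the user's question into a JSON filter.\n"
    "Use ONLY these fields: {fields}.\n"
    'Return JSON shaped like {{"field": ..., "operator": ..., "value": ...}}.\n'
    "Question: {question}"
)


def interpret_intent(question: str, call_llm) -> dict:
    """Ask the LLM to map natural language onto the existing label schema.

    `call_llm` is any callable taking a prompt string and returning the
    model's text; the real system would wrap its LLM client here.
    """
    prompt = PROMPT_TEMPLATE.format(fields=list(ALLOWED_FIELDS), question=question)
    query = json.loads(call_llm(prompt))

    # Guardrail: reject anything outside the deterministic schema, so results
    # stay explainable and auditable.
    if query["field"] not in ALLOWED_FIELDS:
        raise ValueError(f"Unknown field: {query['field']}")
    return query


def run_query(query: dict, contracts: list[dict]) -> list[dict]:
    """Execute the structured filter with the existing, non-LLM query engine."""
    ops = {"eq": lambda a, b: a == b, "gte": lambda a, b: a >= b}
    return [c for c in contracts if ops[query["operator"]](c[query["field"]], query["value"])]


if __name__ == "__main__":
    # Stubbed model response for illustration; in production this would be the
    # LLM interpreting the question below.
    fake_llm = lambda _prompt: '{"field": "renewal_term_months", "operator": "gte", "value": 12}'
    contracts = [
        {"id": "c1", "renewal_term_months": 24},
        {"id": "c2", "renewal_term_months": 6},
    ]
    query = interpret_intent("Which contracts renew for at least 12 months?", fake_llm)
    print(run_query(query, contracts))  # -> [{'id': 'c1', 'renewal_term_months': 24}]
```

The important property is that the model never generates an answer directly; it only produces a structured query that the deterministic system either accepts or rejects, which is what keeps the results fast, traceable, and free of invented content.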
Paths We Explicitly Did Not Take
We explored and intentionally rejected several alternatives.
Full contract ingestion
- Blocked by token limits
- High latency and cost
- Serious privacy and data isolation concerns
Chat-first or Copilot-style interface
- Impressive in demos
- Difficult to verify
- Misaligned with how legal professionals actually work
In a regulated environment, fluent answers without traceability are worse than no answers at all.
Validation and Real-World Constraints
Internally, the idea was validated quickly.
I presented the approach to our AI engineers and CTO, and within hours we had a working prototype that proved feasibility. The real challenge came after launch.
Some enterprise clients had extremely large clause and value datasets that still exceeded available context limits. Rather than shipping a degraded experience, we made the call to temporarily disable the feature for those accounts.
It was a conscious trade-off.
Consistency over partial coverage.
Trust over novelty.
As LLM context windows expanded, we were able to re-enable the feature universally without redesigning the system, which reinforced the long-term strength of the approach.
Outcome
- 65% faster query creation
- 92% user preference for natural language input
- Zero hallucinations when constrained to structured fields
More importantly, users reported higher confidence in results because they could see and refine how queries were constructed.
Broader Impact
This work did not just ship a feature.
It established a pattern for how Robin could responsibly adopt LLMs.
By integrating LLMs alongside an existing ML stack rather than replacing it, the business could:
- Roll out AI features without exposing users to early limitations
- Launch beta capabilities safely
- Build on familiar workflows instead of reinventing them
The same foundations later enabled:
- Contract summaries
- Clause explanations
- Clause library suggestions
- Translation and cross-language search
Features that previously felt risky or infeasible became achievable because the system was designed to evolve with the technology.
Reflection
This project reinforced a core belief of mine.
Good AI design is not about how impressive a model looks.
It is about how safely and clearly it fits into real workflows.
By grounding LLMs in deterministic systems, we delivered innovation without sacrificing trust and created a model that aged well as the AI landscape continued to change.
A Critical Realisation
I spent a lot of time following LLM releases and speaking with our engineers about how companies like Microsoft were approaching Copilot. One pattern stood out.
LLMs worked best when they did not operate alone.
Instead of forcing a language model to reason over massive bodies of text it barely understood, successful systems paired LLMs with deterministic, structured foundations.
Robin already had that foundation.
We had:
- A mature labeling system
- ML models trained on clause detection
- Structured metadata that represented contracts far more efficiently than raw text
The opportunity was not to replace our system with an LLM.
It was to let the LLM translate human intent into something our system already understood.
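As a rough illustration of why that foundation mattered (the record below is invented, not real Robin data): a contract reduced to its extracted labels is a few dozen tokens, while the raw document it represents would have overwhelmed early context windows.

```python
# Illustrative only: a contract as the structured labels an ML pipeline
# already extracts, rather than its raw text.
raw_contract_text = "THIS MASTER SERVICES AGREEMENT is made between ..."  # tens of thousands of tokens in practice

structured_record = {
    "id": "contract-4812",
    "counterparty": "Acme Ltd",
    "governing_law": "England and Wales",
    "renewal_term_months": 24,
    "has_indemnity_clause": True,
}

# The LLM never reasons over raw_contract_text. It only translates the user's
# question into references to fields like these, and the existing labeling
# and query systems do the rest.
```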