Revolutionising Query Experience with Natural Language

About Project

As part of our commitment to enhancing user experiences through innovative technology, we embarked on a project to transform the way users interact with our Query feature. The goal was to leverage Generative AI to enable natural language querying, simplifying the process while maintaining accuracy and efficiency. This case study delves into our journey, challenges, solutions, and the positive impact on our users.

Category: Product Design
Client: Robin AI
Release: June 2023
Role: Product Designer
Tools: Figma, FigJam
Duration: 3 Months

When this project began, the industry was shifting fast.

Robin was already a machine-learning-heavy company. We had invested years into proprietary models trained to identify clauses, labels, and structured legal concepts across contracts. AI was not new to us, but LLMs were.

The release of GPT-3 changed expectations almost overnight. Natural language interaction went from experimental to inevitable, and, like many companies operating in the AI space, we felt real pressure to demonstrate that we were not being left behind.

The challenge was not whether to use LLMs.

It was how to introduce them without compromising trust, accuracy, or privacy in a legal product.

The Problem We Were Actually Solving

Query was already powerful, but it felt technical.

Users had to:

  • Memorise filters and labels
  • Think in a SQL-like mental model
  • Click through multiple steps to answer simple questions

At the same time, early LLM experiments made something very clear:

  • Token limits were real
  • Full-contract reasoning was unreliable
  • Hallucinations were unacceptable in legal workflows

We were not just designing a feature. We were navigating a new class of constraints.

The Decision

Rather than shipping a chat-first or Copilot-style experience, I proposed a hybrid model.

  • Users express intent in natural language
  • The LLM interprets that intent
  • Queries are executed through existing labels and filters
  • Results remain explainable, fast, and auditable

This gave users the flexibility they wanted without asking the LLM to answer questions it could not reliably support.
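The pattern above can be sketched in a few lines. This is an illustrative sketch only: the type names, labels, and rule-based stand-in for the LLM step are hypothetical, not Robin AI's actual schema or prompts. The key idea is that the language model only translates intent into a structured query, while execution stays deterministic and auditable.

```python
from dataclasses import dataclass, field

# Hypothetical structured-query type; names are illustrative,
# not Robin AI's actual schema.
@dataclass
class StructuredQuery:
    label: str                      # an existing clause label, e.g. "termination"
    filters: dict = field(default_factory=dict)

def interpret_intent(utterance: str) -> StructuredQuery:
    """Stand-in for the LLM step: translate natural language into a
    query over known labels and filters. A real system would prompt
    an LLM to emit this structure; a rule-based stub keeps the
    sketch deterministic."""
    known_labels = {"termination", "liability", "renewal"}
    label = next((l for l in known_labels if l in utterance.lower()), None)
    if label is None:
        raise ValueError("intent does not map to a known label")
    filters = {}
    if "after 2022" in utterance.lower():
        filters["effective_date__gt"] = "2022-12-31"
    return StructuredQuery(label=label, filters=filters)

def run_query(q: StructuredQuery, contracts: list[dict]) -> list[dict]:
    """Deterministic execution against structured metadata, i.e. the
    existing, auditable query engine. The LLM never touches this step,
    so it cannot hallucinate results."""
    results = [c for c in contracts if q.label in c["labels"]]
    if "effective_date__gt" in q.filters:
        cutoff = q.filters["effective_date__gt"]
        results = [c for c in results if c["effective_date"] > cutoff]
    return results
```

Because the LLM's output is a structured query rather than an answer, every result can be traced back to the labels and filters that produced it, and a failed interpretation degrades to an error rather than a fabrication.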

It also meant we could modernise the experience without destabilising the product.

Paths We Explicitly Did Not Take

We explored and intentionally rejected several alternatives.

Full contract ingestion

  • Blocked by token limits
  • High latency and cost
  • Serious privacy and data isolation concerns

Chat-first or Copilot-style interface

  • Impressive in demos
  • Difficult to verify
  • Misaligned with how legal professionals actually work

In a regulated environment, fluent answers without traceability are worse than no answers at all.

Validation and Real-World Constraints

Internally, the idea was validated quickly.

I presented the approach to our AI engineers and CTO, and within hours we had a working prototype that proved feasibility. The real challenge came after launch.

Some enterprise clients had extremely large clause and value datasets that still exceeded available context limits. Rather than shipping a degraded experience, we made the call to temporarily disable the feature for those accounts.

It was a conscious trade-off.

Consistency over partial coverage.

Trust over novelty.

As LLM context windows expanded, we were able to re-enable the feature universally without redesigning the system, which reinforced the long-term strength of the approach.
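The gating decision described above can be expressed as a simple budget check. This is a minimal sketch under stated assumptions: the function names, the ~4-characters-per-token heuristic, and the token figures are hypothetical, not Robin AI's actual limits or tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token. A real system
    would count with the model's own tokenizer."""
    return max(1, len(text) // 4)

def feature_enabled(labels: list[str], context_limit: int,
                    reserved_for_answer: int = 512) -> bool:
    """Enable natural-language querying only when the account's label
    vocabulary fits in the prompt budget; otherwise fall back to the
    classic filter UI rather than ship a degraded experience."""
    prompt_tokens = sum(estimate_tokens(label) for label in labels)
    return prompt_tokens + reserved_for_answer <= context_limit
```

Framing the cutoff as a single budget check is what let the feature come back without a redesign: as context windows grew, the same gate simply started passing for the larger accounts.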

Outcome

  • 65% faster query creation
  • 92% user preference for natural language input
  • Zero hallucinations when constrained to structured fields

More importantly, users reported higher confidence in results because they could see and refine how queries were constructed.

Broader Impact

This work did not just ship a feature.

It established a pattern for how Robin could responsibly adopt LLMs.

By integrating LLMs alongside an existing ML stack rather than replacing it, the business could:

  • Roll out AI features without exposing users to early limitations
  • Launch beta capabilities safely
  • Build on familiar workflows instead of reinventing them

The same foundations later enabled:

  • Contract summaries
  • Clause explanations
  • Clause library suggestions
  • Translation and cross-language search

Features that previously felt risky or infeasible became achievable because the system was designed to evolve with the technology.

Reflection

This project reinforced a core belief of mine.

Good AI design is not about how impressive a model looks.

It is about how safely and clearly it fits into real workflows.

By grounding LLMs in deterministic systems, we delivered innovation without sacrificing trust and created a model that aged well as the AI landscape continued to change.

A Critical Realisation

I spent a lot of time following LLM releases and speaking with our engineers about how companies like Microsoft were approaching Copilot. One pattern stood out.

LLMs worked best when they did not operate alone.

Instead of forcing a language model to reason over massive bodies of text it barely understood, successful systems paired LLMs with deterministic, structured foundations.

Robin already had that foundation.

We had:

  • A mature labeling system
  • ML models trained on clause detection
  • Structured metadata that represented contracts far more efficiently than raw text

The opportunity was not to replace our system with an LLM.

It was to let the LLM translate human intent into something our system already understood.
