
Artificial Intelligence

Chatbots.

We take you from a chatbot that recites FAQs and frustrates customers to an agent that actually resolves issues, whether customer-facing or employee-facing. That means a clean knowledge base, answers anchored in official sources, handoff to a human when appropriate, and a measured resolution rate.

The context

Why it matters today more than ever.

For ten years, chatbots disappointed with closed menus and useless answers. What has changed is the models behind them: Claude, ChatGPT and Gemini reason, understand context and follow instructions. A well-built chatbot today is an agent capable of resolving, not just responding. And ROI has gone from promise to metric.

Trend · 01

From the chatbot that recites FAQs to AI that acts

Modern chatbots don't just retrieve text: they reason, execute actions (returns, identity verification, system queries) and decide when to hand off to a human. Leading platforms converge on this model.

Trend · 02

RAG and source anchoring lower hallucinations 3×

With a knowledge base structured as a single source of truth, hallucinations drop from 0.34% to 0.11%. The difference between a satisfied customer and a customer who abandons the brand.

Trend · 03

No-code construction with Custom GPTs and Claude Projects

A functional internal chatbot is now built in hours, not months. That lowers the cost of entry and opens up the use case to small and medium-sized companies.

The problem

Where your system always breaks.

Symptoms vary from company to company, but the patterns repeat. These are the four structural pains we find in practically every chatbot project we audit.

01

Chatbot that recites FAQs without reasoning

Keyword matching against FAQs and canned responses, without understanding context, combining information or distinguishing intent. The customer asks the same thing in four different ways before giving up and asking for a human agent.

Impact

Resolution rate below 20%, CSAT below 3.5/5, and a customer support team that is just as saturated as before.

02

Hallucinations without source anchoring

The bot invents policies, prices or processes that don't exist. Without RAG or anchoring against an official source, 0.34% of responses are hallucinations. With sensitive or financial data, the risk is legal, not just reputational.

Impact

71% of customer support leaders identify hallucinations as one of the three main governance risks. Invented refunds are a real thing.

03

No handoff to human when it fails

The bot insists on answering even when it doesn't know. The customer goes in circles until they give up. And when they finally reach a human, no context is handed over, so the customer repeats the same thing three times.

Impact

Customer churn, and an NPS of −3 points with pure AI versus +1 with a well-designed hybrid flow.

04

Eternal pilots that never reach production

64% of companies piloted a chatbot with the ability to act in 2026, but only 27% have a real production channel, and only 10% operate it maturely. The bot lives in a test environment indefinitely, with no decision to scale it or shut it down.

Impact

Frozen investment, demotivated team and no learning applicable to the next attempt.

We launched a chatbot, customers complain and in the end they ask for a human agent. And when we ask what percentage it resolves, nobody has looked.

What we hear in discovery calls

The cost

What it costs to leave it unfixed.

85%

of customers leave the brand after a first contact that is not properly resolved. A badly built chatbot is exactly that.

Source · Zendesk CX Trends 2026

An uncomfortable conclusion

A badly built chatbot costs more than not having one. The companies that win don't just have "a bot"; they have a bot that customers prefer to the human agent in routine cases. The gap between $7.40 and $0.62 per resolution compounds with every ticket the bot leaves unresolved.
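To make that arithmetic concrete, here is a minimal sketch. It reads $7.40 as the cost of a human resolution and $0.62 as the cost of a bot resolution, and it assumes that every ticket reaches the bot first and that unresolved tickets then escalate to a human. Both readings are our assumptions for illustration, and `blended_cost_per_ticket` is a hypothetical helper name.

```python
HUMAN_COST = 7.40  # assumed: cost of a human-resolved ticket (figure from the text)
BOT_COST = 0.62    # assumed: cost of a bot-resolved ticket (figure from the text)


def blended_cost_per_ticket(resolution_rate: float) -> float:
    """Average cost per ticket when unresolved bot tickets escalate to a human.

    Assumption (not from the source): every ticket pays BOT_COST, and the
    share the bot fails to resolve pays HUMAN_COST on top of it.
    """
    if not 0.0 <= resolution_rate <= 1.0:
        raise ValueError("resolution_rate must be between 0 and 1")
    return BOT_COST + (1.0 - resolution_rate) * HUMAN_COST


# Compare a badly designed bot (15% resolved) with a well designed one (60%).
for rate in (0.15, 0.60):
    print(f"{rate:.0%} resolved: ${blended_cost_per_ticket(rate):.2f} per ticket")
```

Under these assumptions, a bot that resolves 15% of tickets averages about $6.91 per ticket, while one that resolves 60% averages about $3.58.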

The solution

A system, not a tool.

The difference between a chatbot that delivers and one that frustrates isn't the model (Claude, ChatGPT, Gemini). It's the six construction pillars we apply in every project. Well designed, a bot resolves 60% of level 1 tickets with CSAT equal to or higher than a human agent's. Badly designed, it resolves 15% and annoys the rest.

  1. Use cases prioritized with ROI

    Which questions the bot resolves, and which it doesn't. Simple FAQs (password reset, order status) are resolved 70%+ of the time. Cases with emotional load (complaint, dispute) go straight to a human. Well-defined scope is 50% of success.

  2. Clean and living knowledge base

    Single source of truth: policies, products, processes, FAQs. Structured in Notion, Confluence or a help center and maintained by an assigned owner. Without this, the bot invents. With this, hallucinations drop from 0.34% to 0.11%.

  3. Base model + RAG / source anchoring

    Claude, ChatGPT or Gemini as the reasoning engine. On top, RAG against the knowledge base: the model only answers with what's documented, and cites the source when it can. Source anchoring = far fewer hallucinations.

  4. Human handoff flow

    When the bot doesn't know, it hands off with context, so the customer doesn't have to repeat anything. Automatic triggers: low confidence, negative sentiment, sensitive intent. Intelligent handoff turns pure AI (NPS −3) into a hybrid flow (NPS +1).

  5. Brand voice and experience

    System instruction with tone, vocabulary and limitations. Interface integrated with the site, not a generic widget. The bot extends the brand rather than breaking it. 72% of customer support leaders identify it as critical, and most don't work on it.

  6. Metrics and continuous improvement

    Resolution rate by intent, CSAT, handoff percentage, hallucination rate and main unresolved questions. Weekly dashboard. Monthly iteration: update the knowledge base, adjust prompts, expand resolved scenarios.
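The automatic handoff triggers from pillar 4 (low confidence, negative sentiment, sensitive intent) can be sketched as a small rule check. The thresholds, intent labels and the `Turn` structure below are hypothetical placeholders, and the confidence and sentiment scores are assumed to come from classifiers you already run. Treat it as a starting point to tune against real transcripts, not a finished policy.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds and intent labels; tune them on real conversations.
MIN_CONFIDENCE = 0.6
MIN_SENTIMENT = -0.3
SENSITIVE_INTENTS = {"complaint", "dispute"}


@dataclass
class Turn:
    intent: str        # label from an intent classifier (assumed to exist)
    confidence: float  # 0..1 confidence in the bot's answer
    sentiment: float   # -1 (very negative) .. 1 (very positive)


def handoff_reason(turn: Turn) -> Optional[str]:
    """Return the name of the trigger that fired, or None if the bot may answer."""
    if turn.intent in SENSITIVE_INTENTS:
        return "sensitive_intent"
    if turn.confidence < MIN_CONFIDENCE:
        return "low_confidence"
    if turn.sentiment < MIN_SENTIMENT:
        return "negative_sentiment"
    return None


def build_handoff(turn: Turn, transcript: list) -> Optional[dict]:
    """Package the conversation so the human agent receives full context."""
    reason = handoff_reason(turn)
    if reason is None:
        return None
    # Copy the transcript so the customer does not have to repeat anything.
    return {"reason": reason, "intent": turn.intent, "transcript": list(transcript)}
```

Logging the `reason` of every handoff also feeds the handoff percentage and the list of unresolved questions tracked in pillar 6.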

The tools

4 platforms, one technical decision.

To build a chatbot there are four layers to choose: the base model as reasoning engine, the dedicated platform that packages it, the channel where it's deployed, and the set of metrics. These are the four tools we work with most to cover the first two layers: the foundation of the solution.

Claude

Best instruction-following: the bot respects the system instruction in long conversations. Fewer hallucinations, an extensive context window and MCP for integrations. Ideal when precision and consistency matter more than public distribution.

Ideal for

Internal chatbots with extensive knowledge base, regulated sectors (finance, health, legal) or bots where consistent voice and precision are critical. The first choice when building on API + RAG.

ChatGPT

Custom GPTs with Actions (API calls), native web browsing and code execution. Public distribution (GPT Store) or private to the team. Multimodal out of the box. Widest ecosystem and best at letting the bot execute actions.

Ideal for

Chatbots that need to execute actions (check an order, process a return), bots distributable to the team or the public, cases where the bot must call external APIs. Best option for "chatbots other humans use".

Gemini

Native integration with Google Workspace (Gmail, Drive, Calendar, Docs). Truly multimodal from day one, with extremely long context. Useful when the knowledge base lives in Drive or when the bot must process images and video in addition to text.

Ideal for

Teams on Google Workspace, internal chatbots whose knowledge base is already in Drive and Docs, and cases with multimodal input (incident photos, product videos).

WhatsApp Business API

The channel where the B2C customer already is. Higher open and response rates than email or web chat. Integration with any model (Claude, ChatGPT, Gemini) through providers like Twilio. Essential for customer-facing chatbots in Spanish-speaking markets.

Ideal for

B2C customer support chatbots, lead capture, conversational commerce and after-sales. When the customer prefers to chat from WhatsApp rather than open your website.

Our methodology

The process.

A sequence proven in 200+ companies. Each phase has deliverables before moving to the next, and is developed in collaboration with your internal team.

01

Diagnostic

We audit existing processes and the current stack. We map bottlenecks and optimization opportunities to ensure the success of the following phases.

02

Planning

We define target architecture, rollout plan, roles, and metrics before getting into the weeds.

03

Build

We execute in short iterations with your team. We create, adapt, and integrate with your existing tools.

04

Rollout

We start with a test and expand after validation. We train your team so adoption feels natural.

05

Follow-through

We measure and listen to feedback throughout so the result truly becomes yours.

Results

What changes when it works.

A well-designed chatbot shows up in three dimensions: the customer resolves doubts without waiting for a human agent, the support team works only on complex cases where it adds real value, and leadership sees measurable operational metrics: resolution rate, CSAT and cost per resolution.

41–58%

Level 1 resolution

Median of 41.2% and upper quartile of 58.7% (Zendesk CX Trends 2026). In simple intents (password, order status) it reaches 70%+. The difference: clean knowledge base and well-configured source anchoring.

3.5–8×

ROI per euro invested

Sector average of 3.5× for every euro invested in a chatbot; the best deployments reach 8× (McKinsey / MIT Sloan). The difference: mature deployments improve metrics in 87% of cases versus 62% of the rest.

4.10/5

Bot CSAT

Pure AI 4.10/5 versus 4.3/5 for a human agent (Intercom 2026). With a well-designed hybrid flow, the difference shrinks to 0.05 points. The customer barely notices the difference when the human handoff works.

0.11%

Hallucinations with source anchoring

Versus 0.34% without RAG or anchoring. 3× fewer errors when the knowledge base is the single source of truth. Critical in regulated sectors and in any answer that includes numbers.

The bot resolves before the customer even thinks of opening a ticket. And when it doesn't know, it hands off with context to a human. Before, the customer repeated the same thing three times. Not anymore.

Customer Support Operations Lead, B2C e-commerce

Let's talk.

Book a free intro session so we can understand where you stand and how we can help. No strings attached.

FAQ

Frequently asked questions

How much does it cost to deploy an enterprise AI chatbot?

Three ranges by ambition. (1) Simple FAQ bot over static documentation: 3,000-8,000 EUR deployment + 50-200 EUR/month variable OpenAI/Claude costs. (2) RAG over live documentation (products, knowledge base): 8,000-20,000 EUR + 200-800 EUR/month. (3) Agent with tools (queries databases, executes actions, integrates with CRM/ticketing): 15,000-40,000 EUR + 500-2,500 EUR/month depending on volume. What impacts cost most: number of integrations with existing systems, not model sophistication. To validate a use case before major investment, we always recommend a 4-week pilot on a bounded sub-domain.

How long does it take to deploy a generative AI chatbot?

Functional MVP in 3-4 weeks. Robust production in 8-12 weeks. The MVP covers the happy-path use case with basic prompt engineering and RAG over a single source. Robust production adds: automated evals (so the bot doesn't hallucinate on real client questions), quality monitoring, fallbacks when it doesn't know, integration with your stack (CRM, ticketing, Slack), response governance, and compliance. What lengthens the timeline most is not the AI itself, but connecting the bot to real company systems with correct permissions.

What's the difference between a traditional chatbot and a generative AI one?

A traditional chatbot follows a predefined decision tree: if the user steps off the flow, the bot fails to understand and frustrates them. A generative AI one understands natural language, maintains context across turns, reasons about unanticipated cases, and can consult proprietary documents via RAG (retrieval-augmented generation). The traditional one scales poorly as complexity rises: each new variant requires rewriting the tree. The AI one scales with content: add documents and it gets better. Trade-off: AI can hallucinate and costs are variable, so it demands active governance that traditional bots don't need.

Can I train a chatbot with my company's documents?

Yes, via RAG (retrieval-augmented generation), with no need to retrain the model. The flow: we index your documents (PDFs, Confluence, Notion, Drive) into a vector database; when a user asks, the bot searches for relevant passages and passes them to the model as context before generating the answer. It is far cheaper than fine-tuning, updates in real time when docs change, and is traceable: the bot can cite the source. It works well for internal FAQs, product support, employee onboarding and knowledge base search. It doesn't work for real-time reasoning over numeric data; for that you need an agent with tools.
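The retrieve-then-generate flow can be illustrated with a toy sketch. To stay self-contained it swaps the embeddings and vector database for a plain bag-of-words cosine similarity, so it shows the shape of the flow rather than a production retriever. The passages in `DOCS`, the source names and the prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge-base passages, keyed by source name.
DOCS = {
    "returns-policy": "Returns are accepted within 30 days of delivery with the original receipt.",
    "shipping-faq": "Standard shipping takes 3 to 5 business days.",
    "password-help": "To reset your password, use the link on the login page.",
}


def tokenize(text: str) -> Counter:
    """Lowercase word counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 2) -> list:
    """Return up to k (source, passage) pairs that overlap with the question."""
    q = tokenize(question)
    scored = [(cosine(q, tokenize(text)), src, text) for src, text in DOCS.items()]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(src, text) for _, src, text in scored[:k]]


def build_prompt(question: str) -> str:
    """Assemble the context-plus-question prompt that would be sent to the model."""
    context = "\n".join(f"[{src}] {text}" for src, text in retrieve(question))
    return (
        "Answer only from the sources below and cite the source name. "
        "If the answer is not there, say you don't know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


print(build_prompt("How do I reset my password?"))
```

The source names in brackets are what lets the bot cite where an answer came from. An empty retrieval result is a natural signal to hand off rather than guess.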

OpenAI, Claude, or Gemini: which one for my enterprise chatbot?

It depends on the use case, not the hype. Claude (Anthropic): complex reasoning, long documents (200k+ token context), best compliance/safety. GPT-4 / GPT-5 (OpenAI): most mature tool ecosystem, best function calling, plugins. Gemini (Google): multimodal cases (image, audio, video) and competitive price-performance. For RAG over B2B corporate documentation, Claude usually wins on reasoning quality over sources. For automation with tools/APIs, GPT maintains an edge. Best practice: a model-agnostic architecture (LangChain, LlamaIndex) so you can switch providers based on pricing or functionality without rewriting. Locking yourself into one provider in 2026 is technical debt.
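The idea behind a model-agnostic architecture can be shown with a hand-rolled interface. This is not the LangChain or LlamaIndex API, only an illustration of the principle: `ChatModel`, `StubModel` and `SupportBot` are hypothetical names, and the stub returns a canned reply where a real adapter would call a provider SDK.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Anything that can turn a system prompt and a user message into a reply."""

    def complete(self, system: str, user: str) -> str: ...


class StubModel:
    """Stand-in for a provider adapter; a real one would call the vendor's SDK."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, system: str, user: str) -> str:
        return f"[{self.name}] {user}"


class SupportBot:
    """Bot logic depends only on the ChatModel interface, never on a vendor."""

    def __init__(self, model: ChatModel, system_prompt: str) -> None:
        self.model = model
        self.system_prompt = system_prompt

    def answer(self, question: str) -> str:
        return self.model.complete(self.system_prompt, question)


bot = SupportBot(StubModel("provider-a"), "Answer from the knowledge base only.")
print(bot.answer("Where is my order?"))

# Switching providers is a one-line change; SupportBot itself is untouched.
bot.model = StubModel("provider-b")
print(bot.answer("Where is my order?"))
```

Keeping prompts, retrieval and handoff logic behind an interface like this is what makes a later switch on pricing or functionality cheap.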

How do you measure the ROI of an enterprise chatbot?

Three concrete metrics with a pre-deployment baseline. (1) Tickets deflected from human support: % of queries resolved without escalation, translated to hours saved × agent cost per hour. (2) Sales response time: speed-to-lead typically drops from hours to seconds, with measurable conversion impact (sector studies: +30-50% qualified lead rate). (3) Post-interaction CSAT: a 1-question survey after each conversation. If it drops below 70%, the bot is hurting more than helping. Calculate ROI at 30, 60 and 90 days against the baseline. If at 90 days it doesn't show positive ROI on at least one of the three, rethink the use case.
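Metric (1) reduces to simple arithmetic, sketched below. The function names are hypothetical, and the example inputs (ticket volumes, minutes per ticket, agent cost, monthly bot cost) are made-up figures for illustration, not benchmarks.

```python
def deflection_rate(resolved_without_escalation: int, total_queries: int) -> float:
    """Share of queries the bot resolved without escalating to a human."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    return resolved_without_escalation / total_queries


def monthly_savings(deflected_tickets: int, minutes_per_ticket: float,
                    agent_cost_per_hour: float) -> float:
    """Hours saved multiplied by agent cost per hour."""
    hours_saved = deflected_tickets * minutes_per_ticket / 60
    return hours_saved * agent_cost_per_hour


def roi(savings: float, cost: float) -> float:
    """Net return per unit of currency spent on the bot."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (savings - cost) / cost


# Hypothetical month: 3,000 queries, 1,200 resolved by the bot,
# 8 agent-minutes per ticket, 25 EUR/hour agent cost, 800 EUR bot cost.
rate = deflection_rate(1200, 3000)
savings = monthly_savings(1200, 8, 25.0)
print(f"deflection {rate:.0%}, savings {savings:.0f} EUR, ROI {roi(savings, 800.0):.1f}x")
```

Running the same calculation at 30, 60 and 90 days against the pre-deployment baseline gives the trend the text recommends tracking.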

Definition

An enterprise chatbot is a conversational system that automates customer or employee interactions using generative AI (Claude, GPT, Gemini) over proprietary data. We deploy RAG chatbots, agents with tools, and traditional bots depending on the use case, from 4-week use-case validation to robust 12-week production deployments.