Building an ERP Chatbot With Claude, No RAG

Every AI chatbot demo looks the same. Someone types a question. The system searches a document store. RAG (Retrieval-Augmented Generation) pulls relevant chunks. The LLM generates an answer from those chunks.

We looked at this approach for SimpleGrid and decided it was fundamentally wrong for our use case.

Here is why we skipped RAG, what we built instead, and why it works better.

Why RAG is wrong for structured operational data

RAG solves a specific problem: giving an LLM access to information it was not trained on. You take your documents, break them into chunks, embed them into vectors, and retrieve the most relevant chunks when a user asks a question.

This works beautifully for unstructured content. Internal wikis. Policy documents. Knowledge bases. If someone asks "What is our return policy?" and the answer lives in a PDF somewhere, RAG finds it.

But SimpleGrid's data is not unstructured. It is highly structured. It lives in PostgreSQL tables with defined schemas, typed fields, relational integrity, and state machines. Your inventory is not a paragraph in a document. It is a number in a column, tied to a specific entity, at a specific point in its lifecycle.

When your warehouse manager asks "How much 304 stainless do we have in stock?" the answer is not in a chunk of text. It is the result of a SQL query against the inventory projection table. RAG would try to find a text fragment that mentions stainless steel inventory. It might hallucinate a number. It might pull from a month-old report. It would almost certainly be wrong.

What we built instead: Tool Use

Claude, the AI model we use, supports a feature called Tool Use (also called Function Calling). Instead of searching for text, the model picks one of a set of pre-built functions and returns a structured request naming that function and its parameters. The backend, not the model, runs it.
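To make "pre-built function" concrete, here is a minimal sketch of what a tool definition could look like. The name/description/input_schema layout follows Anthropic's tool-use format; the description text, the single `material_type` parameter, and the `validate_tool_input` helper are assumptions for illustration, not SimpleGrid's actual definitions.

```python
# Hypothetical definition for the get_inventory tool.
# Layout (name, description, input_schema) follows Anthropic's tool-use format;
# the contents are illustrative assumptions.
GET_INVENTORY_TOOL = {
    "name": "get_inventory",
    "description": "Return on-hand quantity and location for a material type.",
    "input_schema": {
        "type": "object",
        "properties": {
            "material_type": {
                "type": "string",
                "description": "Material name, e.g. '304 stainless'.",
            },
        },
        "required": ["material_type"],
    },
}


def validate_tool_input(tool: dict, tool_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input is acceptable."""
    schema = tool["input_schema"]
    problems = [
        f"missing required field: {key}"
        for key in schema["required"]
        if key not in tool_input
    ]
    problems += [
        f"unexpected field: {key}"
        for key in tool_input
        if key not in schema["properties"]
    ]
    return problems
```

Checking the model's arguments against the schema before running anything is a cheap guard: a malformed call is rejected before it reaches the database.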

Here is what happens when someone types "How much 304 stainless do we have in stock?"

1. The message hits Claude with a system prompt that includes the client's SG Schema: what types of things exist, what fields they have, and what queries are available.

2. Claude does not search for text. It recognizes this is an inventory query. It selects the `get_inventory` function. It passes parameters: material_type = "304 stainless".

3. The backend receives the function call, executes a parameterized SQL query against the client's isolated database, and returns the result.

4. Claude receives the structured result (quantity: 2,400 sheets, location: Warehouse B, last updated: 2 hours ago) and formats it as a natural language response.

The user sees: "You have 2,400 sheets of 304 stainless in Warehouse B. Last updated 2 hours ago."

No vector search. No chunk retrieval. No hallucinated numbers. A real query against real data, with a real answer.
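Step 3 is where the safety comes from, so it is worth sketching. The example below is a minimal illustration, not our production code: it uses an in-memory SQLite database as a stand-in for the client's PostgreSQL instance, and the `inventory_projection` table, its columns, and the `dispatch` helper are all assumed names. With a PostgreSQL driver such as psycopg the placeholder would be `%s` rather than `?`, but the idea is the same.

```python
import sqlite3


def get_inventory(conn, material_type):
    """Look up one material in the (hypothetical) inventory projection table."""
    # Parameterized query: the model-supplied value is bound by the driver,
    # never spliced into the SQL string.
    row = conn.execute(
        "SELECT quantity, unit, location FROM inventory_projection "
        "WHERE material_type = ?",
        (material_type,),
    ).fetchone()
    if row is None:
        return {"found": False, "material_type": material_type}
    quantity, unit, location = row
    return {
        "found": True,
        "material_type": material_type,
        "quantity": quantity,
        "unit": unit,
        "location": location,
    }


# Only functions registered here can be invoked by a tool call.
TOOL_HANDLERS = {"get_inventory": get_inventory}


def dispatch(conn, tool_name, tool_input):
    """Route a tool call from the model to its pre-built handler."""
    handler = TOOL_HANDLERS.get(tool_name)
    if handler is None:
        raise ValueError(f"unknown tool: {tool_name}")
    return handler(conn, **tool_input)


def demo_connection():
    """Build a throwaway database with one illustrative row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory_projection ("
        "material_type TEXT PRIMARY KEY, quantity INTEGER, "
        "unit TEXT, location TEXT)"
    )
    conn.execute(
        "INSERT INTO inventory_projection VALUES (?, ?, ?, ?)",
        ("304 stainless", 2400, "sheets", "Warehouse B"),
    )
    return conn
```

The model never writes SQL here. It can only name a registered function and supply values, and those values are bound as parameters, so an odd or hostile input simply matches no rows.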

How context works without RAG

The question is always: how does Claude know what functions to call?

The answer is the Knowledge Level. Every SimpleGrid client has a configuration that describes their business. What types of things exist. What fields each type has. What queries are available. What write operations are permitted.

When a user starts a session, Claude receives the relevant slice of that client's Knowledge Level as context. Not the entire database. Just the schema: what exists, what can be queried, what can be written.

This is far more efficient than RAG. Instead of searching through thousands of document chunks for maybe-relevant context, Claude has a precise, structured map of the client's operation. It knows exactly what functions exist and what parameters they expect.
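As a rough sketch of the idea, a Knowledge Level could be a plain configuration object from which both the system prompt and the tool list are derived. Everything in this example is hypothetical: the dictionary layout, the entity and query names other than `get_inventory`, and the choice to slice by user role are assumptions made for illustration.

```python
import json

# Hypothetical Knowledge Level for one client; real configurations will differ.
KNOWLEDGE_LEVEL = {
    "entity_types": {
        "inventory_item": {
            "fields": {"material_type": "string", "quantity": "integer"},
        },
        "purchase_order": {
            "fields": {"vendor": "string", "status": "string"},
        },
    },
    "queries": {
        "get_inventory": {
            "entity": "inventory_item",
            "params": {"material_type": "string"},
        },
        "get_po_status": {
            "entity": "purchase_order",
            "params": {"po_number": "string"},
        },
    },
    "writes": {
        "receive_material": {"entity": "purchase_order", "roles": ["warehouse"]},
    },
}


def slice_for_role(knowledge_level, role):
    """Keep schema and queries, but only the writes this role may perform."""
    writes = {
        name: spec
        for name, spec in knowledge_level["writes"].items()
        if role in spec["roles"]
    }
    return {
        "entity_types": knowledge_level["entity_types"],
        "queries": knowledge_level["queries"],
        "writes": writes,
    }


def tools_from_queries(queries):
    """Turn each query entry into a tool definition for the model."""
    return [
        {
            "name": name,
            "description": f"Query {spec['entity']} records.",
            "input_schema": {
                "type": "object",
                "properties": {p: {"type": t} for p, t in spec["params"].items()},
                "required": sorted(spec["params"]),
            },
        }
        for name, spec in queries.items()
    ]


def build_system_prompt(kl_slice):
    """Serialize the slice so the model sees the schema, never the data."""
    return (
        "You answer questions about this client's operation.\nSchema:\n"
        + json.dumps(kl_slice, indent=2, sort_keys=True)
    )
```

The context holds a description of the data, not the data itself, so its size depends on how many entity types and queries a client has rather than on how many rows are in the database.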

How write operations work

Read queries are straightforward. Write operations are where it gets interesting.

When the warehouse manager types "Received 200 sheets of 16-gauge steel from Midwest Supply," Claude needs to:

1. Identify this as a write operation (material receipt).

2. Find the matching PO (open PO from Midwest Supply for 16-gauge steel).

3. Generate a structured form: { entity: PO-4521, action: receive, quantity: 200 }.

4. Present the form to the user for confirmation.

5. On confirmation, the SG Engine processes the action: checks rules, validates permissions, writes the event, fires triggers.

Claude does not write directly to the database. It fills in a structured form. The user confirms. The SG Engine handles the rest, including all the rule checking and event writing.

This is the critical safety layer. The AI suggests. The human confirms. The engine enforces.
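The suggest, confirm, enforce split can be sketched in a few lines. This is an illustration under stated assumptions, not the SG Engine itself: the `ProposedAction` and `Engine` classes, the "warehouse" role, and the rule that a receipt cannot exceed the open quantity on the PO are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProposedAction:
    """The structured form the model fills in. It is data, not a database write."""
    entity: str
    action: str
    quantity: int


@dataclass
class Engine:
    """Stand-in for the engine: the only component allowed to write events."""
    open_pos: dict  # PO id -> quantity still expected (hypothetical rule input)
    events: list = field(default_factory=list)

    def apply(self, form, user_role, confirmed):
        # The human confirms: nothing is written without explicit confirmation.
        if not confirmed:
            raise PermissionError("action was not confirmed by the user")
        # Permission check (the role name is an assumption).
        if user_role != "warehouse":
            raise PermissionError(f"role {user_role!r} may not receive material")
        # Rule checks (hypothetical): PO must be open, quantity within what is due.
        remaining = self.open_pos.get(form.entity)
        if remaining is None:
            raise ValueError(f"no open PO {form.entity}")
        if form.action != "receive" or not 0 < form.quantity <= remaining:
            raise ValueError("rule check failed")
        # Write the event and update state only after every check has passed.
        self.open_pos[form.entity] = remaining - form.quantity
        event = {
            "entity": form.entity,
            "action": form.action,
            "quantity": form.quantity,
        }
        self.events.append(event)
        return event
```

Because the form is an immutable value, what the user confirmed is exactly what the engine validates. A wrong guess by the model costs one rejected form, not a bad row in the database.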

Model routing

Not every interaction needs the same level of AI capability. We use three tiers:

Fast tier (Haiku): Simple status checks, quick lookups, intent classification. "What is the status of PO-4521?" Haiku handles this in under a second.

Standard tier (Sonnet): Complex queries, analytical questions, multi-step reasoning. "Which vendor has the highest rejection rate this quarter?" Sonnet handles the query construction and result interpretation.

Heavy tier (Opus): Multi-variable planning, cross-entity analysis, capacity optimization. "Based on current orders and inventory, what should we procure this week?" Opus handles the reasoning.

The classifier (Haiku) reads the input and routes the request to the appropriate tier. Most interactions hit the fast tier. Cost stays low. Response time stays fast.
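A minimal sketch of the routing step follows. The real classifier is a Haiku call; here a keyword function stands in for it so the example runs on its own. The tier labels, the keyword lists, the short model names, and the fallback to the standard tier on an unrecognized label are all assumptions for illustration.

```python
from enum import Enum


class Tier(Enum):
    FAST = "fast"
    STANDARD = "standard"
    HEAVY = "heavy"


# Short family names, not real model identifiers.
MODEL_FOR_TIER = {
    Tier.FAST: "haiku",
    Tier.STANDARD: "sonnet",
    Tier.HEAVY: "opus",
}


def keyword_classifier(message: str) -> str:
    """Stand-in for the Haiku classifier call; returns a tier label."""
    text = message.lower()
    if any(word in text for word in ("procure", "plan", "optimi")):
        return "heavy"
    if any(word in text for word in ("which", "highest", "rate", "compare")):
        return "standard"
    return "fast"


def route(message: str, classify=keyword_classifier) -> str:
    """Map a user message to the model family that should handle it."""
    label = classify(message)
    try:
        tier = Tier(label)
    except ValueError:
        # Unrecognized label: fall back to the middle tier (an assumption).
        tier = Tier.STANDARD
    return MODEL_FOR_TIER[tier]
```

Passing the classifier in as a function keeps the routing table testable without a network call, and swapping the keyword stub for a real model call does not change `route`.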

Why this matters for adoption

The end result is a chatbot that feels like talking to the sharpest person on the floor. Not because it is searching documents. Because it has a precise map of your operation and direct access to your live data through well-defined functions.

The warehouse manager does not need to learn an interface. They type what happened. The system understands, executes, and confirms. It works like sending a WhatsApp message, something your team already does 50 times a day, except now the data goes into the system instead of a chat that nobody reads.

That is the adoption unlock. Not a better interface. The same behavior, connected to a real system.

SimpleGrid's AI chatbot uses Claude's Tool Use, not RAG. Structured queries against live data. No hallucinated numbers. No stale documents. Real answers from your real operation.