Your AI Agent Is Lying Because Your Data Is Broken

Most founders discover that their data was never ready for an AI agent only after the agent ships. Here is how to check before you build.

You built the agent. It answers confidently. Then a user notices it merged two different customers into one record and quoted the wrong contract terms. You dig in and find the same company listed under four slightly different names in your CRM. The agent was not broken. Your data was.

The audit most founders skip

Salesforce's data team put it plainly in 2025: clean only the data the agent will use, not everything. Scoping matters. A full data cleanup before an AI project is a months-long distraction. A narrow audit of the specific tables and fields your agent will query takes a few days and catches the failures that would otherwise surface in production.

Four failure modes show up repeatedly across deployment post-mortems documented by Workday and Alation. Each one has a diagnostic you can run without a data background.

Incomplete records

Pull a sample of 50 to 100 rows from the primary table your agent will read. Count the rows where a critical field, such as customer email or deal stage, is blank. If more than 10 percent of rows are missing a field the agent needs to answer a question, the agent will either guess or refuse. Both outcomes erode trust faster than a slow rollout would.
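If the table exports to CSV or a list of dictionaries, the count takes a few lines of Python. The sketch below assumes rows arrive as dictionaries; the field names `email` and `deal_stage` are hypothetical stand-ins for whatever your agent actually needs, and the 10 percent cutoff is the rule of thumb above.

```python
import random

# Hypothetical field names: replace with the fields your agent actually needs.
CRITICAL_FIELDS = ["email", "deal_stage"]
MISSING_THRESHOLD = 0.10  # the 10 percent cutoff from the text


def is_blank(value):
    """Treat None and whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def missing_rates(rows, fields, sample_size=100, seed=0):
    """Return the share of sampled rows where each field is blank or absent."""
    rng = random.Random(seed)
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)
    if not sample:
        return {field: 0.0 for field in fields}
    return {
        field: sum(is_blank(row.get(field)) for row in sample) / len(sample)
        for field in fields
    }


def fields_over_threshold(rates, threshold=MISSING_THRESHOLD):
    """List the fields whose missing rate exceeds the threshold."""
    return sorted(field for field, rate in rates.items() if rate > threshold)
```

A field that shows up in `fields_over_threshold` is one the agent will regularly have to guess about or refuse on.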

IBM's 2024 data cleaning guidance names missing values as one of the most common causes of degraded data usability. The fix is not always to fill the gaps. Often it is enough to tell the agent explicitly which fields are optional and which are required for a valid response, so it declines instead of guessing when a required field is empty.
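One way to make that explicit is a small field spec checked per record, with the result passed to the agent alongside the record. This is a sketch, not a prescribed format: the spec, the field names, and the wording of the note are all hypothetical.

```python
# Hypothetical spec: which fields a response needs versus which are nice to have.
FIELD_SPEC = {
    "required": ["customer_name", "contract_terms"],
    "optional": ["phone", "industry"],
}


def is_blank(value):
    """Treat None and whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def check_record(record, spec=FIELD_SPEC):
    """Split missing fields into blocking (required) and non-blocking (optional)."""
    missing_required = [f for f in spec["required"] if is_blank(record.get(f))]
    missing_optional = [f for f in spec["optional"] if is_blank(record.get(f))]
    return {
        "can_answer": not missing_required,
        "missing_required": missing_required,
        "missing_optional": missing_optional,
    }


def context_note(record, spec=FIELD_SPEC):
    """Build a short note for the agent's context so it does not have to guess."""
    result = check_record(record, spec)
    if not result["can_answer"]:
        return "Cannot answer: missing required fields " + ", ".join(result["missing_required"])
    if result["missing_optional"]:
        return "Optional fields unavailable: " + ", ".join(result["missing_optional"])
    return "All fields present"
```

The design choice is to let a gap in an optional field pass through with a warning, while a gap in a required field stops the answer outright.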

Duplicate entities

Search your customer or contact table for the same company name with minor variations: "Acme Inc," "Acme, Inc.," "ACME Inc." Workday's 2025 guidance on AI data readiness is specific here: duplicate records break AI work, and the fix requires merging them into a single linked record before the agent runs, not after. An agent that sees four versions of the same customer will treat them as four customers. From that point on, every aggregate it produces is wrong.
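Exact-match searches miss these variants, so normalize names before grouping them. The sketch below lowercases names, strips punctuation, and drops a hypothetical list of legal suffixes. It only catches formatting variants; it will not find the near-matches discussed next.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes to ignore; extend it for your own data.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "corp", "corporation", "co"}


def normalize_name(name):
    """Lowercase, replace punctuation with spaces, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def duplicate_groups(names):
    """Group raw names by normalized key, keeping only keys with two or more variants."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}
```

Counting distinct names before and after normalization shows the aggregate problem directly: the list `["Acme Inc", "Acme, Inc.", "ACME Inc.", "Globex LLC"]` holds four raw strings but only two customers.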

I have a bias here: I think most CRM deduplication tools are theater. They surface the obvious duplicates and miss the ones that cause problems: the near-matches with slightly different addresses or phone numbers. Do the merge manually for your top 20 percent of records. This cohort is where the agent will spend most of its time anyway.
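If you want help building the manual review list, a script can rank records and surface candidate pairs for a human to confirm. This sketch assumes each record has hypothetical `id`, `name`, `address`, `phone`, and `annual_revenue` fields, uses annual revenue as a stand-in for "top 20 percent", and scores similarity with Python's standard `difflib.SequenceMatcher`. The 0.8 threshold is an arbitrary starting point. The script merges nothing; the decision stays with you.

```python
from difflib import SequenceMatcher


def top_share(records, key, share=0.20):
    """Return the top `share` of records ranked by a numeric field (at least one)."""
    ranked = sorted(records, key=lambda r: r[key], reverse=True)
    return ranked[: max(1, round(len(ranked) * share))]


def similarity(a, b):
    """Character-level similarity between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def review_queue(priority, everyone, fields=("name", "address", "phone"), threshold=0.8):
    """Pair each priority record with any lookalike in the full table.

    Returns (priority_id, other_id, score) tuples, highest score first,
    for a human to confirm or reject.
    """
    seen, pairs = set(), []
    for left in priority:
        for right in everyone:
            pair = frozenset((left["id"], right["id"]))
            if left["id"] == right["id"] or pair in seen:
                continue
            seen.add(pair)
            scores = [similarity(str(left[f]), str(right[f])) for f in fields]
            average = sum(scores) / len(scores)
            if average >= threshold:
                pairs.append((left["id"], right["id"], round(average, 2)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

The queue compares each top record against the whole table rather than only against other top records, since the duplicate of a high-value customer may not itself rank in the top 20 percent.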

Disconnected systems

Draw a quick map of every data source your agent will touch. Then check whether those sources share a common identifier, such as a customer ID or an order number, that links a row in system A to the right row in system B. Alation's 2024 guidance on AI-ready data lists unique identifiers as a first step, not an optimization. Without them, the agent cannot join information across systems reliably. It will either hallucinate a connection or silently drop data from one source.
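Once you have the map, you can test each pair of systems by exporting their identifier columns and comparing them. A minimal sketch, assuming the IDs are already in a common format (that assumption is itself worth checking):

```python
from collections import Counter


def id_coverage(system_a_ids, system_b_ids):
    """Compare identifier sets from two systems and report how many link up."""
    a, b = set(system_a_ids), set(system_b_ids)
    matched = a & b
    return {
        "matched": len(matched),
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
        "match_rate_a": len(matched) / len(a) if a else 0.0,
    }


def duplicated_ids(ids):
    """IDs that appear more than once; a repeated ID cannot point at exactly one row."""
    return sorted(i for i, n in Counter(ids).items() if n > 1)
```

A low match rate or a non-empty duplicate list means the join will be unreliable before the agent ever attempts it.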

If your CRM and your billing system use different customer IDs with no mapping table between them, that is a broken link. Fix it before the agent touches either system.
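When the IDs differ, a mapping table is the link. The sketch below joins through one and returns unlinked rows instead of dropping them, which avoids the silent data loss described above. The field names `crm_id` and `billing_id` and the dictionary-shaped mapping are hypothetical.

```python
def join_via_mapping(crm_rows, billing_rows, mapping):
    """Join CRM rows to billing rows through a crm_id -> billing_id mapping.

    Rows that cannot be linked are returned separately instead of being
    dropped, so the gap is visible before an agent ever sees the data.
    """
    # If billing_id is not unique, the last row wins; check uniqueness first.
    billing_by_id = {row["billing_id"]: row for row in billing_rows}
    joined, unlinked = [], []
    for crm in crm_rows:
        billing_id = mapping.get(crm["crm_id"])
        billing = billing_by_id.get(billing_id) if billing_id is not None else None
        if billing is None:
            unlinked.append(crm["crm_id"])
        else:
            joined.append({**crm, **billing})
    return joined, unlinked
```

A non-empty `unlinked` list is the broken link made visible: each entry is a customer the agent would otherwise describe with only half its data.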

Missing event history

Agents that reason about sequences, such as "what happened before this customer churned?" or "which actions preceded a successful close?", need timestamped event logs, not current-state records. If your database stores only the latest status of a record and overwrites the previous one, the agent has no sequence to reason from.
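The difference is easy to see in code. A current-state table for a churned customer holds one value, `churned`. An append-only event log holds every step that led there. This sketch assumes a hypothetical `Event` record with a customer ID, an event kind, and a timestamp:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    customer_id: str
    kind: str
    at: datetime


def events_before(log, customer_id, kind):
    """Return the customer's events that precede the first event of `kind`.

    Returns None if the target event never happened, so "no history" and
    "no such event" stay distinguishable.
    """
    history = sorted((e for e in log if e.customer_id == customer_id), key=lambda e: e.at)
    for index, event in enumerate(history):
        if event.kind == kind:
            return history[:index]
    return None
```

With only a current-state record, there is nothing for a function like this to sort through.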

Check whether your primary tables have a created-at and updated-at timestamp on every row. Then check whether status changes are logged as new rows or overwrite the old ones. If they overwrite, you are missing the history the agent needs to do anything beyond answering questions about right now.
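Both checks can be scripted. The overwrite check below is a heuristic, not a proof: it flags records that have a single row whose `updated_at` is later than its `created_at`, meaning the row was edited without a new row being added. Edits to unrelated fields will also trigger it. The column names are assumptions, and timestamps need to be comparable values such as `datetime` objects or ISO-8601 strings.

```python
from collections import Counter

# Assumed column names; adjust to your schema.
REQUIRED_TIMESTAMPS = ("created_at", "updated_at")


def rows_missing_timestamps(rows, columns=REQUIRED_TIMESTAMPS):
    """Return the indexes of rows that lack any of the timestamp columns."""
    return [i for i, row in enumerate(rows) if any(row.get(c) is None for c in columns)]


def overwritten_records(rows, id_field="record_id"):
    """Records with exactly one row that was edited after creation.

    An edited row with no sibling rows suggests the earlier status was
    overwritten rather than logged as a new row. Rows missing timestamps
    are skipped; catch those with rows_missing_timestamps first.
    """
    counts = Counter(row[id_field] for row in rows)
    return sorted(
        row[id_field]
        for row in rows
        if counts[row[id_field]] == 1
        and row.get("created_at") is not None
        and row.get("updated_at") is not None
        and row["updated_at"] > row["created_at"]
    )
```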

Where the tooling argument runs out

Datagrid and AgentMelt both argued in 2025 that AI agents can handle data cleaning and integration on the fly, which reduces the need for a pre-launch audit. This is true for some categories of cleanup, such as format standardization and basic deduplication at ingestion. It is not true for the four failure modes above. A duplicate entity that enters the agent's context window looks identical to a real entity. The agent has no way to flag it as a problem, and nothing in the pipeline signals that the output is wrong.

Run the four checks above before you write a line of agent code. The diagnostic takes less time than the first sprint of development. It tells you whether you are building on ground that will hold.