The Time I Thought 'More Data' Would Fix My AI
A story about why AI products sometimes fail in surprising ways — and what "helping the AI" looks like.
I spent a stretch of weeks on a research prototype — a push-to-talk voice assistant you could ask questions about the game you were playing. "Where's the next boss fight?" "What's the best weapon for this area?" That kind of thing.
The AI I use is smart, but it doesn't know everything. For popular games like Elden Ring, it knows the world inside and out — Elden Ring was one of only three games out of thirty in my test pipeline that cleared every quality check on the first try, with no retries needed. For a newer or more obscure game, it starts guessing — confidently inventing things that sound right but aren't. In my research, asking about Returnal — which sits in the "doesn't-know" bucket at 0.35 confidence — produced a boss named "Archonact," which isn't a real character in that game. The AI just made it up because it felt like the kind of name that game would have.
The fix, on paper, is simple: give the AI a list of real names from the game, and tell it to only use those. This is called "grounding." In my pipeline, the "list" came from external structured sources like Wikidata and the "only use those" rule was a single line in the prompt: if you're not certain a name is real, omit it rather than invent. It's a common technique for reducing hallucinations. More grounding, more accuracy. Right?
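To make that concrete, here's a minimal sketch of what that grounding step can look like: a verified-name list goes into the prompt alongside the "omit rather than invent" rule. The function name and wording are illustrative, not the actual pipeline code.

```python
# Sketch of "grounding": hand the model a list of verified names and
# instruct it to use only those. All names here are illustrative.
def build_grounded_prompt(question: str, known_entities: list[str]) -> str:
    entity_block = "\n".join(f"- {name}" for name in sorted(known_entities))
    return (
        "Answer the question about this game.\n"
        "Verified entity names (from Wikidata):\n"
        f"{entity_block}\n"
        "If you're not certain a name is real, omit it rather than invent.\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "Where's the next boss fight?",
    ["Margit the Fell Omen", "Stormveil Castle"],
)
```

The key design point is that the entity list and the refusal rule travel together: the list without the rule still lets the model invent, and the rule without the list gives it nothing to fall back on.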
Not exactly. Here's what happened when I tried.
## The experiment
I wanted to make the AI more accurate across a lot of games — not just the ones it already knows. So I needed a source of game-specific names (bosses, locations, items) to feed it.
I picked two sources:
- Wikidata. A community-edited knowledge graph of facts about everything. When a game has enthusiasts curating its page, you can pull a clean list of characters, locations, and objects from it. The downside: if nobody's bothered to curate the game, Wikidata has nothing.
- Steam achievements. Most Steam games have achievements, and many reference specific bosses or locations by name ("Defeat Margit," "Cleared Stormveil"). Hypothesis: I could extrapolate entity names out of achievement titles, which would give me a near-universal-coverage backup for the games Wikidata didn't cover.
My reasoning: Wikidata where editors had written it; Steam achievements to fill the gap everywhere else. Between the two, I'd always have something to feed the AI.
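The extraction step I had in mind for the Steam side can be sketched like this. The verb list and regex are my illustrative guesses at the approach, not the production pipeline:

```python
import re

# Hypothesis sketch: pull candidate entity names out of achievement
# titles like "Defeat Margit" or "Cleared Stormveil". The verb list
# and pattern here are illustrative, not the real pipeline's.
VERB_PATTERN = re.compile(
    r"^(?:Defeat|Defeated|Clear|Cleared|Reach|Reached|Find|Found)\s+(.+)$",
    re.IGNORECASE,
)

def candidate_entities(achievement_titles: list[str]) -> list[str]:
    candidates = []
    for title in achievement_titles:
        match = VERB_PATTERN.match(title.strip())
        if match:
            candidates.append(match.group(1))
    return candidates

names = candidate_entities(
    ["Defeat Margit", "Cleared Stormveil", "Victory Royale", "100% Completionist"]
)
# Only the first two titles name entities; "Victory Royale" and
# "100% Completionist" describe accomplishments, so the yield is small.
```

Even in this toy version you can see the problem that shows up later: half the titles yield nothing, and the extracted strings carry no type information.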
I ran 30 games through the pipeline with both sources available and measured how often the AI's output passed a strict quality check.
The results surprised me:
| What the AI had access to | Games that passed | % |
|---|---|---|
| Just the AI's own knowledge (separate baseline run) | 8 of 24 | 33% |
| Wikidata only | 15 of 30 | 50% |
| Wikidata + Steam achievements | 16 of 30 | 53% |
Wikidata helped on the games it covered — 33% to 50% is a real, meaningful improvement. Adding Steam achievements on top barely moved the needle: one extra game out of 30, a difference too small to mean anything.
The real test was whether Steam would fill in on the games Wikidata didn't cover. It didn't. On the handful of games where achievements were the only grounding the AI had, the pass rate stayed well below what Wikidata games were getting — Steam wasn't filling the gap on the games that actually needed help.
## The hypothesis didn't hold
My hypothesis going in was that achievement titles named enough in-game entities to be useful as a backup grounding source. That turned out to be wrong for reasons I hadn't thought through.
When my pipeline asks the AI to generate information about a game, it fills in a specific schema: a list of bosses, a list of zones, a list of items. Each slot is asking for a specific kind of thing — a boss name, a location name, an item name.
Wikidata has exactly that structure. When a game has a Wikidata page, it has entries labeled as characters, places, and objects. The AI can pull those and slot them in directly. "Margit the Fell Omen" goes in the boss slot; "Stormveil Castle" goes in the zone slot. Clean match.
Steam achievements are a different animal. Most of them aren't entity-naming at all — they're titles like "Victory Royale," "100% Completionist," "First Blood." These are marketing-written names for player accomplishments, not for things in the game world. The minority that do reference entities ("Defeat Margit") arrive as a single string with no label telling the AI which schema slot the entity belongs in — "Margit" could plausibly be a boss, a character, or a location depending on the game.
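The structural difference is easiest to see in code. Here's a toy version of the schema-filling step; the field names and rows are illustrative, not the actual production schema:

```python
from dataclasses import dataclass, field

# Toy version of the schema the pipeline asks the model to fill.
@dataclass
class GameEntities:
    bosses: list[str] = field(default_factory=list)
    zones: list[str] = field(default_factory=list)
    items: list[str] = field(default_factory=list)

# Wikidata-style input arrives already typed, so it slots in directly.
wikidata_rows = [
    ("Margit the Fell Omen", "character"),
    ("Stormveil Castle", "place"),
]
TYPE_TO_SLOT = {"character": "bosses", "place": "zones", "object": "items"}

entities = GameEntities()
for name, wd_type in wikidata_rows:
    getattr(entities, TYPE_TO_SLOT[wd_type]).append(name)

# A Steam achievement, by contrast, arrives as one untyped string:
steam_row = "Defeat Margit"
# Nothing in that string says whether "Margit" belongs in bosses,
# zones, or items -- there's no wd_type to route on.
```

The Wikidata rows route cleanly because each carries a type label; the Steam string has no label, so there's no mechanical way to pick a slot.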
So the AI saw a pile of hundreds of achievement titles per game, most of them irrelevant to the schema, and couldn't reliably extract the minority that did name real entities. Steam achievements didn't align with the entity slots the schema was asking for, and that's why they didn't move the needle as hypothesized — even though they were broadly available.
The premise I hadn't questioned was that broad availability was the important property. Most Steam games have achievements, so as a coverage strategy they looked near-universal. But coverage alone isn't useful if the content inside isn't shaped like what you need. Universal ≠ useful.
## The broader lesson
I kept testing for a while, because one experiment isn't enough. I tried three other variations:
- A wider net on Wikidata. I loosened my queries, tried to extract more entries from each game's page. Result: the games that had rich pages still had rich pages; the games with empty pages still had empty pages. No number of clever queries manifested data that wasn't there.
- A different database (IGDB). Games industry catalog, comprehensive coverage. Result: IGDB is great for cataloging games but doesn't have the in-game entity detail I needed. Near-zero character entries across every game I tested.
- Web search. Anthropic's AI tools support live web search. Result: I got roughly 80× more names than any database — great — but the AI still couldn't clear the quality bar, because now it was hallucinating relationships between the real names instead of the names themselves.
Each experiment showed a different flavor of the same underlying problem. The question I was asking — "what's the most comprehensive source I can feed this AI?" — was the wrong question. The right question was "what shape does the AI actually need, and which source produces data of that exact shape?"
## Why this matters
This experience changed how I think about grounding data for AI products.
My instinct going in was the common one: more reference data should mean more accurate answers. The experiment above doesn't prove that instinct wrong in general — it shows that in this pipeline, on these games, with this schema, the gains from adding a second source were too small to call real. What I took away wasn't "big datasets are bad." It was a narrower lesson: the shape of the reference data has to match the shape of the slots the AI is trying to fill, or the extra data does nothing useful.
That reframed the question I was asking. "What's the most comprehensive source I can feed this AI?" turned out to be the wrong question for my problem. The better one was "which source natively produces data of the exact shape the schema needs?" — and in my pipeline the answer was one carefully-picked source, not a pile of everything.
## What I did instead
The lesson I ended up taking from all this: for the hard cases, harvest-time grounding — pre-loading reference data ahead of time so the AI can lean on it later — wasn't going to save me. No matter which database I reached for, the games my AI didn't know weren't going to be rescued by better lookups. (The three-source negative-result sequence is written up formally in the grounding schema-alignment research post; the broader architectural case is in runtime vs. harvest.)
So I went the other direction. Instead of pre-loading lots of data in advance, I built a system that looks things up at the moment you ask a question. If the AI is confident about Elden Ring, it answers immediately from what it already knows. If someone asks about a game or a boss it's not confident on, it pauses and searches the web right then, reads the results, and answers — citing the source directly in its response so you can click through and see where the information came from.
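The decision logic is simple enough to sketch. The threshold, the confidence scores, and the search function are all assumptions here, not the real system:

```python
# Minimal sketch of the confidence-gated flow: answer from memory when
# confident, search the web at ask-time when not. The threshold and
# score values are assumptions, not the real system's.
CONFIDENCE_THRESHOLD = 0.7

def answer(question: str, game: str, confidence: dict[str, float],
           search_web, answer_from_memory) -> str:
    if confidence.get(game, 0.0) >= CONFIDENCE_THRESHOLD:
        # Well-known game: answer immediately from what the model knows.
        return answer_from_memory(question)
    # Low confidence: search right now, read the result, and answer
    # with the source cited inline so the user can click through.
    result = search_web(f"{game} {question}")
    return f"{result['answer']} (source: {result['url']})"
```

Passing `search_web` and `answer_from_memory` in as functions keeps the gate itself trivial to test: you can stub both and check that a 0.35-confidence game takes the search path while a 0.95-confidence game doesn't.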
That approach — cache what you author yourself, and go to the source for the rest only when someone asks — is slower. It costs a few cents per search (one to ten cents, depending on which search provider you use). But it's honest in a way that "pre-load everything into a giant cache" isn't, because it sends you back to the sources that actually have the information, instead of trying to be a replacement for them.
That second property — the part that sends you back to the sources — turned out to be the most important property of the whole system. It's what makes the AI a guide to the good stuff on the web, instead of a replacement for it.
See also the two kinds of things an AI gets wrong and AI That Respects Creators, or the deeper research notes: hallucination taxonomy, grounding schema alignment, runtime vs. harvest, and the data appendix.