Research notes
Longer-form writeups — methodology, data, and results behind the shorter posts.
- Data Appendix: Runs, Cohorts, and Methodology (18 min read)
The run-by-run A/B structure behind the research posts — ungrounded baseline runs, grounded runs, per-harvest verdicts for the 31-harvest snapshot (30 distinct games), and the exact metrics each post cites.
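For concreteness, here is a minimal Python sketch of the pairing the appendix describes: each harvest gets one verdict from an ungrounded baseline run and one from a grounded run, and the cited metrics come from comparing the pairs. All names here (`HarvestResult`, `pair_verdicts`, the pass/fail verdict set) are illustrative assumptions, not the appendix's actual code.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"


@dataclass
class HarvestResult:
    game: str          # 30 distinct games; one game appears in two harvests
    harvest_id: str    # 31 harvests in the snapshot
    grounded: bool     # False = ungrounded baseline arm, True = grounded arm
    verdict: Verdict


def pair_verdicts(results: list[HarvestResult]) -> dict[str, tuple[Verdict, Verdict]]:
    """Map harvest_id -> (baseline verdict, grounded verdict)."""
    baseline = {r.harvest_id: r.verdict for r in results if not r.grounded}
    grounded = {r.harvest_id: r.verdict for r in results if r.grounded}
    return {h: (baseline[h], grounded[h]) for h in baseline.keys() & grounded.keys()}


def flips(pairs: dict[str, tuple[Verdict, Verdict]]) -> dict[str, int]:
    """Count how many harvests the grounded arm fixed, regressed, or left alone."""
    counts = {"fixed": 0, "regressed": 0, "unchanged": 0}
    for before, after in pairs.values():
        if before == after:
            counts["unchanged"] += 1
        elif after == Verdict.PASS:
            counts["fixed"] += 1
        else:
            counts["regressed"] += 1
    return counts
```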
- Universal Coverage Isn't Enough: Grounding Sources Must Align With Schema Slots (10 min read)
A counterintuitive result from trying to make an LLM-authored synthetic game wiki more accurate: the source with the widest coverage didn't help on the games that needed it. Three consecutive negative results, across Steam, Wikidata, and IGDB, pointed at the same diagnosis, and the implication for retrieval-augmented generation over a structured schema, at least in my setup, is that fit matters more than size.
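The "fit matters more than size" diagnosis reduces to a small check, sketched below with invented data: coverage asks how many games a source knows at all, while slot fit asks how many of the schema's slots it can actually populate. The slot names and source field sets are placeholders, not measurements from the post.

```python
# Hypothetical schema slots for a game-wiki article.
SCHEMA_SLOTS = {"developer", "publisher", "release_date", "platforms", "genres"}


def slot_fit(source_fields: set[str]) -> float:
    """Share of schema slots this source can populate,
    independent of how many games it covers."""
    return len(source_fields & SCHEMA_SLOTS) / len(SCHEMA_SLOTS)


# A source with near-universal game coverage but the wrong fields
# scores low on fit; a narrower source with the right fields scores high.
broad_source = {"price", "review_score", "release_date"}
narrow_source = {"developer", "publisher", "release_date", "genres"}

assert slot_fit(broad_source) < slot_fit(narrow_source)  # 0.2 < 0.8
```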
- Two Kinds of Hallucination, a Discrete Confidence Cliff, and What It Costs (10 min read)
Empirical data from 200+ harvests across 30 distinct games, run through an LLM-generated reference-content pipeline, splits the fabrications I observed cleanly into two buckets that respond differently to intervention. Beneath both sits a judge-side confidence distribution that clusters at a handful of discrete values.
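To make the "discrete confidence cliff" concrete, the sketch below tallies distinct judge confidence values. The scores are invented; only the shape (a handful of discrete values rather than a continuum) reflects the post's claim.

```python
from collections import Counter

# Invented judge outputs, stand-ins for the real distribution.
judge_confidences = [0.95, 0.95, 0.75, 0.95, 0.5, 0.75, 0.95, 0.5, 0.95]

histogram = Counter(judge_confidences)
print(f"{len(histogram)} distinct values across {len(judge_confidences)} judgments")
for value, count in sorted(histogram.items(), reverse=True):
    print(f"  confidence {value}: {count}x")
```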
- Runtime vs. Harvest: A Design Pattern for RAG Over Third-Party Content (11 min read)
Where you fetch and when you fetch can be two of the most consequential architectural decisions in retrieval-augmented generation. Latency, cost, and infra are the easier axes to reason about; the attribution axis is the one this post focuses on, because it turns on a single binary architectural property (does the pipeline persist third-party content?) while the others reduce to parameter tuning.
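That binary property reads clearly in code. The sketch below contrasts the two ends under placeholder helpers (fetch_source and generate are hypothetical stand-ins, not a real API): a harvest-time pipeline persists the third-party content, while a runtime pipeline lets it die with the request.

```python
def fetch_source(game_id: str) -> str:
    """Placeholder for a third-party fetch (a wiki page, an API response)."""
    return f"<third-party facts for {game_id}>"


def generate(source_text: str) -> str:
    """Placeholder for the LLM generation step."""
    return f"article grounded in: {source_text}"


def harvest_time(game_id: str, store: dict[str, str]) -> str:
    """Fetch once, persist the third-party content, generate from the copy."""
    if game_id not in store:
        store[game_id] = fetch_source(game_id)  # third-party content persisted
    return generate(store[game_id])


def runtime(game_id: str) -> str:
    """Fetch per request; the third-party content never outlives the call."""
    source = fetch_source(game_id)  # nothing persisted
    return generate(source)


cache: dict[str, str] = {}
print(harvest_time("some-game", cache))  # cache now holds third-party content
print(runtime("some-game"))              # no stored copy to attribute later
```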