AI That Respects Creators#
Some AI tools feel like they're quietly siphoning from you; others send real viewers and subscribers back to you. The difference isn't a feature you add later — it's a shape baked into the architecture long before anyone writes a privacy policy. Here's what that shape could look like.
The setup#
I spent weeks on a research prototype — a voice assistant for gamers: push-to-talk, "hey, what am I missing on this boss?", answered through your headset in under ten seconds. The prototype's usefulness turned on whether it could accurately answer a wide range of questions across hundreds of games. The model didn't know everything I needed it to know, so I had to fill the gap somehow.
The obvious options for filling that gap split into two buckets:
Harvest-time: go fetch knowledge ahead of time, cache it locally, serve from cache at query time. Classic pre-indexing. Fast at query time because you've done the work; high up-front cost; the cache is yours to own and manage.
Runtime: at the moment a question comes in, fetch what you need, answer, and throw the fetched content away. No local cache; the source is canonical; every answer is freshly grounded in whatever's live.
Both have obvious tradeoffs on latency, cost, and freshness. What's less obvious — and what this post is about — is that they have different ethical and legal shapes when the knowledge you need belongs to someone else.
The harvest landmine#
Harvest-time caching of third-party content has a structural problem that no amount of careful implementation fixes.
You think you're doing something clean: "I'll just download the wiki pages for the games I care about, split them into short snippets, build a local search index, and pull up the matching snippets whenever a user asks a question." It's the instinctive reach because it's fast at query time and it matches the standard RAG tutorial.
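To make the shape concrete, here's a minimal stdlib-only sketch of that instinctive pipeline (the URL is a placeholder, and a real version would strip markup and use an embedding store, but the shape is identical):

```python
import urllib.request

# Hypothetical wiki URLs; stand-ins for whatever pages you'd be tempted to mirror.
WIKI_PAGES = ["https://example-wiki.example/wiki/Boss_Strategies"]

def harvest(urls, chunk_size=500):
    """Fetch third-party pages once and keep the chunks locally: the harvest shape."""
    index = []
    for url in urls:
        html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
        # Split into fixed-size snippets; a real pipeline would clean and embed them.
        for i in range(0, len(html), chunk_size):
            index.append({"source": url, "text": html[i:i + chunk_size]})
    return index  # this list is the local copy the rest of the post worries about

def retrieve(index, query, k=3):
    """Keyword retrieval from the local cache; no fetch happens at query time."""
    words = query.lower().split()
    scored = sorted(index, key=lambda c: -sum(w in c["text"].lower() for w in words))
    return scored[:k]  # snippets go straight into the prompt; the source trail ends here
```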
Here's the thing the standard tutorial doesn't tell you: by the time you've got, say, tens of thousands of pages sitting in a local searchable copy, you've built an unlicensed copy of someone else's work. That copy is now a rival wiki. The fact that you're only using it "internally" doesn't change what it is — you have the same content the wiki has, in a structure you control, being served to answer questions that would otherwise go to the wiki. If the wiki's terms say "no automated scraping" — Fandom's Terms of Use do exactly that — you've broken them. If the wiki is CC-BY-SA licensed (some game wikis are), you owe attribution every time you use a snippet — which you structurally can't do, because the snippets get pasted into the AI's prompt as context and the trail back to the original source gets lost before the model answers.
The copyright and terms-of-service analysis has been done at length elsewhere. What I want to focus on is the architectural diagnosis: keeping a local copy of third-party content is a design that makes honest attribution structurally impossible. You can bolt attribution on top after the fact, but every snippet the AI used to form its answer is a snippet whose source the user never sees, and every answer you render is an answer whose origin is lost in the prompt-building step.
That's not a bug you can patch. It's a structural property of the architecture, and it only goes away when the architecture changes.
The runtime alternative#
Runtime fetching flips the shape.
At the moment a question arrives, you go get what you need, you use it to answer, and you throw the fetched content away. Nothing persists. Your local database contains your own work — your schema, your logic, your synthesized answers — not chunks of someone else's. The source stays canonical; your product is an interface to it, not a replica of it.
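A minimal sketch of the runtime shape, with `web_search` and `llm` as hypothetical stand-ins for whatever search API and model you're calling:

```python
# Stand-ins for whatever search API and model you're calling; both hypothetical.
def web_search(query: str) -> list[dict]: ...
def llm(prompt: str) -> str: ...

def answer_at_runtime(question: str) -> dict:
    """Fetch, answer, discard: the runtime shape. Nothing is written to disk."""
    results = web_search(question)
    context = "\n\n".join(r["snippet"] for r in results)
    answer = llm(f"Answer only from this context:\n{context}\n\nQ: {question}")
    # The fetched content goes out of scope here; what survives is the answer
    # and the pointers back to the sources, so the citation is structural.
    return {"answer": answer, "sources": [r["url"] for r in results]}
```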
That changes everything downstream:
- Attribution becomes structural. You can't answer without going somewhere to get the answer, and "somewhere" is where the user should be sent if they want more. The citation isn't bolted on; it's the very thing that made the answer possible.
- Staleness becomes someone else's problem. If the wiki updated, your next fetch gets the update. No cache invalidation logic; no "oh wait, this game patched three days ago and our chunk is from April."
- Copyright/redistribution exposure drops sharply. You're not redistributing, not storing, not indexing. A lot of the classic infringement analysis that attaches to a local mirror of someone else's corpus simply doesn't apply when nothing persists. (This doesn't erase terms-of-service (ToS) risk — automated access still has to be permitted by the source; see the caveat in When to use each.)
The costs are real — fetching at query time is slower than reading from cache, and every query has a per-call dollar cost — but those are latency and money costs, which are problems you can optimize. The harvest-time approach has ethical costs, which you can't optimize away without changing the architecture.
Where I actually landed#
For the prototype, I spent weeks trying to make the harvest approach work. I built a pipeline that generated synthetic knowledge by asking the model what it already knew about each game, then had a judge model validate accuracy, then cached the validated output. The persistent side was clean on copyright/redistribution — no third-party content in the cache, all of it the model's own synthesis — and it worked well enough for some games. (Whether the model's training data is itself legally settled is a separate, ongoing fight; see the NYT and Getty cases later in this post.)
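In outline, that pipeline looked something like this (a sketch; `llm` and `judge` are stand-ins for the generator and validator models, and the prompts are illustrative):

```python
# `llm` generates, `judge` validates; both are hypothetical stand-ins for model calls.
def llm(prompt: str) -> str: ...
def judge(prompt: str) -> str: ...

def build_synthetic_cache(games: list[str]) -> dict[str, str]:
    """Harvest-time caching where the only thing cached is the model's own synthesis."""
    cache = {}
    for game in games:
        draft = llm(f"Write what you reliably know about {game}: bosses, items, mechanics.")
        verdict = judge(f"Is this accurate, with no guesses? Answer pass/fail.\n\n{draft}")
        if verdict == "pass":
            cache[game] = draft  # no third-party content ever enters the cache
    return cache
```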
But the floor it hit — the roughly 30% tail where the model just didn't have the knowledge, the same "doesn't-know" bucket the hallucination taxonomy post measures — was a floor I couldn't move with any amount of prompt engineering or judge tuning. I did the experiments. I know.
So for the hard tail, I switched to runtime. When a query comes in, the system checks whether the pre-generated synthetic knowledge is confident about this topic; if it isn't, it fires a transient web search at that exact moment, reads what comes back, and answers from it. The search results never touch my database. The only thing that persists on my side is the synthesized answer I gave the user, which is my own work.
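Stitched together, the routing logic is only a few lines (again a sketch; the threshold and all four helpers are hypothetical stand-ins):

```python
CONFIDENCE_FLOOR = 0.7  # illustrative threshold, not the prototype's actual value

# Hypothetical stand-ins: topic routing, a confidence score, search, and the model.
def classify_topic(query: str) -> str: ...
def confidence(entry: str, query: str) -> float: ...
def web_search(query: str) -> list[dict]: ...
def llm(prompt: str) -> str: ...

def answer(query: str, cache: dict[str, str]) -> dict:
    entry = cache.get(classify_topic(query))
    if entry and confidence(entry, query) >= CONFIDENCE_FLOOR:
        # Easy middle: answer from my own pre-generated synthesis.
        return {"answer": llm(f"Context:\n{entry}\n\nQ: {query}"), "sources": []}
    # Hard tail: transient web search; results are read, used, and discarded.
    results = web_search(query)
    context = "\n\n".join(r["snippet"] for r in results)
    return {
        "answer": llm(f"Context:\n{context}\n\nQ: {query}"),  # my synthesis persists
        "sources": [r["url"] for r in results],               # raw results never touch the DB
    }
```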
That's the pattern: synthetic knowledge for the easy middle, runtime fetch for the hard tail, nothing persisted from third parties either way. The cache is only for things I authored.
Why I built the harvest version first#
I started with the harvest path because it's the shape standard RAG tutorials teach (LangChain's retrieval-augmented generation tutorial is the canonical example: load, split, embed, store, retrieve, generate) and the latency math points there at query time — reading from a cache is 10–30× faster than fetching live, and one to ten cents per call cheaper. The runtime alternative requires a more complicated pipeline (intent classifier, citation rendering, discard-after-generation auditing, rate-limit handling), and the cost of not doing it is an abstract "future legal exposure" that doesn't show up on this quarter's metrics. So the math at engineering-decision time favors caching, and the legal exposure stays theoretical until a court forces it to be concrete. The lawsuits currently in progress — NYT v. OpenAI, Getty v. Stability AI — are the pressure points that may eventually move that calculus. If it moves, it'll likely move from the architecture layer up.
Why this composes with creator attribution#
The deeper principle here — and this is the part I want creators to understand — is that an architecture that has to go somewhere fresh every time to answer hard questions is an architecture that has a structural incentive to send users back to those sources.
If I cached your wiki, your YouTube channel, your guide, your strategy writeup, I'd have no reason to link to you. My answers would come out of my cache; the user would never know you existed. That's the dominant pattern in AI content products right now, and it's the structural reason creators and publishers have increasingly taken to the courts (the NYT and Getty cases cited above are the most visible examples).
If I have to fetch fresh every time, you're part of the answer loop by construction. A citation chip appears alongside the answer. The user clicks it and you get a real, counted view. If they like what they see, they subscribe, and the subscribe button is one voice command away. Your content isn't a replica sitting on my server — it's the canonical source I'm routing users to.
A version of this principle is what the prototype pointed at: short structured facts extracted at runtime, cached only as pointers-back-to-source (video ID, timestamp, channel ID), never the creator's words themselves, rendered alongside the answer with a prominent citation chip and a click-through that opens the original at the exact moment of the claim. The creator was the canonical source; the tool was a discovery funnel that sent players who were stuck on gameplay problems to the creators who solved those problems. Not a replacement for the creators. A way of finding them. (Implementing this against a platform like YouTube means going through its official Data API under its terms, not scraping — the architecture describes what the product does; the ToS dictates how the fetch is allowed to happen.)
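As a data shape, a pointer cache entry might look like this (field names are illustrative; the point is that only identifiers and my own paraphrase persist, never the creator's words):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourcePointer:
    """Everything needed to route a user back; none of the creator's prose."""
    video_id: str    # e.g. a YouTube video ID, obtained via the official Data API
    channel_id: str  # whose content answered the question, i.e. who gets the credit
    timestamp_s: int # the exact moment of the claim, for a deep-link click-through
    claim: str       # my own one-line paraphrase, never a quote

def watch_url(p: SourcePointer) -> str:
    # Deep link that opens the original at the cited moment.
    return f"https://www.youtube.com/watch?v={p.video_id}&t={p.timestamp_s}s"
```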
Runtime fetch makes it possible to build AI that routes users back to original creators. Pre-caching their content into your own index shuts that door.
When to use each#
Some concrete guidance, because I know how easy it is to take "ethics" as an excuse to never cache anything:
Harvest-time is fine when:
- The data is your own (user preferences, local game library, save files).
- The data is licensed for redistribution (open-source datasets, CC0 facts tables, your own model's synthesized outputs).
- The data is factual-only with no expressive content (Wikidata-style knowledge graphs, prices, dates).
- You have explicit written permission from the source.
Runtime is the right call when:
- The source is a third party whose content you don't own, and either the ToS permits automated reads or you're going through an official API on its own terms.
- The source is rapidly changing (live-service games, news, prices).
- Attribution to the source is part of the product's value (AI that points users to original creators).
- You want to reduce redistribution/copyright exposure — runtime fetch doesn't leave a local copy of someone else's work sitting on your server.
One caveat worth being honest about: runtime fetch is not a ToS workaround. Many platforms (YouTube, Reddit, Twitter/X, Fandom) restrict automated or "indirect" use of their content regardless of whether you cache. If the ToS forbids automated access, runtime is just as prohibited as harvest — the fix there is an official API with its own terms, a licensing deal, or picking a different source. What runtime fetch solves is copying (you don't keep a copy) and crediting (the citation is built into the flow). What it doesn't solve is whether you're allowed to access the source in the first place — that's still governed by the source's terms.
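Condensed into code, the guidance above reads roughly like this (a sketch, not legal advice; the flag names are illustrative and mirror the bullets):

```python
def storage_policy(owned_or_authored: bool,
                   licensed_for_redistribution: bool,
                   automated_access_permitted: bool) -> str:
    """The when-to-use-each guidance as a function."""
    if owned_or_authored or licensed_for_redistribution:
        return "harvest"        # your own or explicitly licensed work may be cached
    if automated_access_permitted:
        return "runtime"        # fetch fresh, cite, discard; cache pointers only
    return "do-not-access"      # ToS forbids automated reads: API, license, or new source
```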
The hybrid I ended up with:
- Harvest what you can author yourself (my synthetic knowledge pipeline).
- Fetch at runtime what you can't (the 30% tail where the model doesn't know).
- Never harvest-cache third-party content just because it'd be faster.
That last rule is the load-bearing one. Latency is a real cost. It's also never a sufficient reason to copy someone else's work.
Walking the talk on this site#
Two implementation details on this site mirror the same pattern:
- /llms.txt — a curated markdown index (an emerging convention) that tells LLM-based readers what to cite for any given query, instead of letting them guess. View it at jamesalaniz.dev/llms.txt.
- /robots.txt — opts out of training-only crawlers (GPTBot, CCBot, Google-Extended, anthropic-ai) while keeping AI search/citation bots (ClaudeBot, OAI-SearchBot, Perplexity, etc.) allowed.
The asymmetry is deliberate and consistent with the argument above: be a cited source, not silently absorbed into training data.
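The relevant stanzas of that robots.txt look roughly like this (excerpted; crawlers not named in any group fall through to the default):

```
# Training-only crawlers: opted out
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

# AI search/citation bots (ClaudeBot, OAI-SearchBot, PerplexityBot, ...) match
# none of the groups above, so they fall through to the default allow.
User-agent: *
Allow: /
```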
Each is a single plain-text file served from the site root.
The pattern, stated plainly#
If you're building on top of someone else's content, let the architecture carry the attribution. Make it structurally impossible to answer without routing the user back to the source. Cache what you author; read from the source for the rest. Cache pointers, not prose.
It's slower than the alternative. It's also the difference between a product that respects the ecosystem it relies on and one that quietly siphons from it.
Much of today's AI content landscape does the second thing. It doesn't have to. It's a choice that gets made architecturally long before anyone writes a privacy policy about it.
See also the two kinds of things an AI gets wrong and the time I thought "more data" would fix my AI, or the deeper research notes: hallucination taxonomy, grounding schema alignment, runtime vs. harvest, and the data appendix.