Project

LLM wiki

Andrej Karpathy’s idea for a wiki maintained by a language model — taken well past the sketch into something I actually run, and something that has quietly changed how I work.


The seed

The starting idea is Karpathy’s: instead of a person writing and maintaining a wiki, the person curates the raw sources and asks the questions, and a language model does the reading, summarizing, cross-referencing, and bookkeeping. You supply judgment and material; the model supplies tireless synthesis. It’s a lovely idea in a paragraph. Making it hold up over months is the actual work.


How we extended it

What we’ve built around the seed is a set of disciplines that turn a clever demo into a knowledge base you can still trust after thousands of edits. The point of almost every one of them is the same: never rely on the model remembering to do the right thing — make correctness mechanical and checkable.

A typed, enforced schema. Every page declares what it is and what subject area it belongs to, in structured front matter, and a linter refuses to let a page drift out of shape. The structure is what lets tooling reason about the wiki at all, rather than treating it as a pile of prose.
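
To make the idea concrete, here is a minimal sketch of what such a front-matter check might look like. The key names (`type`, `area`, `updated`), the allowed type values, and the `---` delimiters are all assumptions chosen for illustration, not the wiki's actual schema.

```python
# Hypothetical schema: these names and values are illustrative only.
ALLOWED_TYPES = {"concept", "source-summary", "manager", "assumption"}
REQUIRED_KEYS = ("type", "area", "updated")


def parse_front_matter(text: str) -> dict[str, str]:
    """Parse a minimal 'key: value' block delimited by '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as missing front matter


def lint_page(text: str) -> list[str]:
    """Return a list of schema violations; an empty list means the page passes."""
    meta = parse_front_matter(text)
    if not meta:
        return ["missing, empty, or unterminated front matter"]
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in meta]
    if "type" in meta and meta["type"] not in ALLOWED_TYPES:
        errors.append(f"unknown type: {meta['type']}")
    return errors
```

Returning a list of violations rather than raising on the first one lets a single lint pass report everything wrong with a page at once.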

An immutable source layer. Raw material — the things sources actually said — is capture-only. The model can add to it and build on it, but it can’t quietly rewrite the record, so history can’t be edited out from under you. On top of that sits strict provenance: a page both declares its sources and links them inline, and the linter flags any citation that has gone dangling.
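
A provenance check along these lines could be sketched as follows. The inline citation syntax `[[src:ID]]` and the function name `check_provenance` are assumptions made for the example; the real wiki may link sources differently.

```python
import re

# Hypothetical inline citation syntax: [[src:some-source-id]]
INLINE_CITATION = re.compile(r"\[\[src:([\w-]+)\]\]")


def check_provenance(declared: set[str], body: str, source_ids: set[str]) -> list[str]:
    """Compare declared sources, inline citations, and the source layer.

    declared:   source ids listed in the page's front matter
    body:       the page text, containing inline citations
    source_ids: ids that actually exist in the immutable source layer
    """
    cited = set(INLINE_CITATION.findall(body))
    problems = []
    for sid in sorted(declared | cited):
        if sid not in source_ids:
            problems.append(f"dangling: {sid} not in source layer")
        elif sid not in declared:
            problems.append(f"undeclared: {sid} cited inline only")
        elif sid not in cited:
            problems.append(f"unlinked: {sid} declared but never cited inline")
    return problems
```

Checking both directions (declared but never linked, linked but never declared) is what keeps the two forms of citation from drifting apart.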

Staleness that is computed, not remembered. A page is flagged automatically when something it depends on has changed underneath it — when a source it cites carries a newer timestamp than the page itself. “Manager” pages that summarize a set of others are flagged when any of their children move. Shared assumptions live in one canonical place, and every document that leans on them is tracked, so revising an assumption lights up every downstream page that’s now out of date. Timestamps even carry sub-day resolution, so a busy editing session keeps its true order.
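
The rule described above reduces to one comparison: a page is stale if anything it depends on has a newer timestamp. Here is a small sketch under that reading, where sources, child pages, and shared assumptions are all treated as dependencies. The `Node` type and the page names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Node:
    """A page or source with its last-updated timestamp and direct dependencies."""
    updated: datetime
    depends_on: list[str] = field(default_factory=list)


def stale_pages(nodes: dict[str, Node]) -> dict[str, list[str]]:
    """Map each stale page to the dependencies that are newer than it."""
    report: dict[str, list[str]] = {}
    for name, node in nodes.items():
        newer = [
            dep for dep in node.depends_on
            if dep in nodes and nodes[dep].updated > node.updated
        ]
        if newer:
            report[name] = newer
    return report


# Same-day timestamps: minute resolution preserves the true editing order.
nodes = {
    "src/paper": Node(datetime(2024, 5, 1, 9, 0)),
    "attention": Node(datetime(2024, 5, 1, 8, 30), ["src/paper"]),
    "ml-overview": Node(datetime(2024, 5, 1, 8, 45), ["attention"]),
}
print(stale_pages(nodes))  # {'attention': ['src/paper']}
```

In this sketch staleness is checked one level at a time. Once "attention" is refreshed, its new timestamp becomes newer than "ml-overview", and the manager page is flagged on the next pass.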

Contradiction as a first-class event. When new information disagrees with what’s already written, the convention is never to silently overwrite. The disagreement is preserved as data — a dated correction, a struck-through-and-superseded block, a side-by-side comparison — because the fact that two sources disagree is often more useful than either claim alone.
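
One way to make "never silently overwrite" mechanical is to store a claim's history as an append-only list and mark older entries as superseded. The sketch below assumes that approach; the `Claim` fields, the strike-through rendering, and the example claims are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Claim:
    text: str
    source: str
    recorded: date
    superseded_on: Optional[date] = None


def add_correction(history: list[Claim], new: Claim) -> list[Claim]:
    """Return a new history: earlier claims are kept but marked superseded."""
    kept = [
        Claim(c.text, c.source, c.recorded, c.superseded_on or new.recorded)
        for c in history
    ]
    return kept + [new]


def render(history: list[Claim]) -> str:
    """Render superseded claims struck through, each with its source and date."""
    lines = []
    for c in history:
        if c.superseded_on:
            lines.append(f"~~{c.text}~~ ({c.source}; superseded {c.superseded_on.isoformat()})")
        else:
            lines.append(f"{c.text} ({c.source}; recorded {c.recorded.isoformat()})")
    return "\n".join(lines)
```

Because `add_correction` returns a new list and `Claim` is frozen, the earlier record cannot be changed in place; the disagreement stays visible alongside the newer claim.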

A clean line between behavior and fact. How the model should act is kept separate from what’s true about the world, so working preferences never get tangled up with the content they operate on.

And the piece I’m most pleased with — a retrieval test harness. The whole thing is worthless if the right page can’t be found for a given question, so there’s an automated evaluation that poses real questions to a fresh model and measures whether it actually opens the pages that hold the answer. The same instrument both grades the wiki and drives how it’s organized: when retrieval misses, that’s a bug in the structure, and it gets fixed.
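
The scoring half of such a harness can be sketched in a few lines. Here the model is stood in for by any callable that takes a question and returns the pages it opened; that interface, the `Case` type, and the all-expected-pages-opened scoring rule are assumptions for the example, not a description of the actual harness.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    question: str
    expected: set[str]  # pages that hold the answer


def evaluate(cases: list[Case], open_pages: Callable[[str], list[str]]) -> dict:
    """Score how often the model opens every page that holds the answer.

    open_pages is a stand-in for running a fresh model on the question and
    recording which wiki pages it opened.
    """
    hits = 0
    misses = []
    for case in cases:
        opened = set(open_pages(case.question))
        missing = case.expected - opened
        if missing:
            misses.append((case.question, sorted(missing)))
        else:
            hits += 1
    hit_rate = hits / len(cases) if cases else 0.0
    return {"hit_rate": hit_rate, "misses": misses}
```

The list of misses matters more than the headline rate: each entry names a question and the pages that should have been found, which is the starting point for reorganizing the structure.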


Why it’s been a game-changer

The effect has been much larger than I expected. I curate and I ask; the system reads, connects, and keeps itself consistent — and it compounds. Every source I add makes the next question easier to answer, and the model does the patient cross-referencing I would never keep up with by hand.

It has stopped being a thing I maintain on the side and become the substrate I work from. It’s hard to overstate how much that changes an ordinary day — the knowledge is finally in one place, current, cross-referenced, and answerable, and it stays that way without me policing it.

