MAGI: a deliberative multi-LLM council

Three LLMs answer in parallel, then — optionally — review each other's anonymised responses and revise. A rapporteur writes the synthesis; the lowest-confidence model writes the minority report. Open-source Python, eight decision methods, live at magi-council.org.

2026 · Side project · open source (MIT) · Data

Side project. Live at magi-council.org; source at jason-chao/MAGI (MIT); package on PyPI (pip install magi-core).

Why another multi-LLM system?

Most multi-agent LLM research — AutoGen, Du et al.’s debate papers, MAFBench — is benchmarked on tasks with a known correct answer. MAGI is built for the opposite: contested questions where a single confident reply is not what you need, and the record of disagreement is the output.

Architecture

Three LLMs (by default a GPT-5-class model, a Claude model, and a Gemini model, all swappable) are queried in parallel via asyncio.gather() on top of litellm. Each returns a structured JSON verdict with three fields: answer, reasoning, and confidence. An aggregator then applies one of eight decision methods. The highest-confidence model, the rapporteur, writes the synthesis; the lowest-confidence model writes the minority report. Fallback chains handle permanent errors (deprecated models, auth failures) without dropping the panel below quorum.
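The flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MAGI's actual code: the Verdict dataclass, ask_model, convene and assign_roles are hypothetical names, and ask_model returns canned replies so the sketch runs offline. In a real panel it would presumably wrap an async litellm call such as litellm.acompletion and parse the JSON reply.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Verdict:
    model: str
    answer: str
    reasoning: str
    confidence: float  # 0.0 to 1.0


# Hypothetical canned replies standing in for real provider responses.
CANNED = {
    "gpt": ("NO", "Costs outweigh the benefits.", 0.66),
    "claude": ("NO", "The evidence is too thin.", 0.72),
    "gemini": ("YES", "The upside justifies the risk.", 0.90),
}


async def ask_model(model: str, question: str) -> Verdict:
    """Stub for one provider call (assumption: not MAGI's real API)."""
    await asyncio.sleep(0)  # placeholder for network latency
    answer, reasoning, confidence = CANNED[model]
    return Verdict(model, answer, reasoning, confidence)


async def convene(question: str, models: list[str]) -> list[Verdict]:
    # Fire all queries concurrently; gather returns results in input order.
    return await asyncio.gather(*(ask_model(m, question) for m in models))


def assign_roles(verdicts: list[Verdict]) -> tuple[Verdict, Verdict]:
    # Highest confidence writes the synthesis, lowest the minority report.
    rapporteur = max(verdicts, key=lambda v: v.confidence)
    dissenter = min(verdicts, key=lambda v: v.confidence)
    return rapporteur, dissenter


verdicts = asyncio.run(convene("Should we do X?", ["gpt", "claude", "gemini"]))
rapporteur, dissenter = assign_roles(verdicts)
print(rapporteur.model, dissenter.model)
```

Roles are assigned by confidence alone, independently of which side wins the vote, so the rapporteur may well be the model that was outvoted.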

MAGI deliberation panel: three LLM nodes (GPT 5.4 Nano, Claude Haiku 4.5, Gemini 2.5 Flash Lite) arranged in a triangle around a central MAGI hexagon, each node showing a vote (NO / NO / YES) and a confidence score (66% / 72% / 90%). A verdict bar at the foot reads NO, 2 to 1.
A deliberation in progress. Each model returns a structured verdict and confidence score; the aggregator tallies the result. The question shown is itself an Evangelion plot question — the UI handles moral dilemmas as easily as policy ones.

Two rounds, blind peer review

The interesting variant is the second-round deliberation. Each model sees its peers’ Round 1 answers under randomised pseudonyms (“Participant X7K2”), with no “Claude said” or “GPT said”. The pseudonyms remove brand deference, so a model has to weigh each argument on its content rather than its source; the payoff is a visible shift in Round 2 reasoning depth. Real names are restored in the final user-facing report.
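One way to implement the blinding step is sketched below. It is an illustration only: make_pseudonyms, anonymise and restore are hypothetical helpers, and the four-character label format is an assumption inferred from the “Participant X7K2” example rather than taken from the MAGI source.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


def make_pseudonyms(models: list[str], length: int = 4) -> dict[str, str]:
    """Map each real model name to a unique random label."""
    labels: dict[str, str] = {}
    used: set[str] = set()
    for model in models:
        while True:
            code = "".join(secrets.choice(ALPHABET) for _ in range(length))
            if code not in used:  # retry on the rare collision
                break
        used.add(code)
        labels[model] = f"Participant {code}"
    return labels


def anonymise(answers: dict[str, str], labels: dict[str, str],
              reader: str) -> dict[str, str]:
    """Peer answers as shown to `reader`: keyed by pseudonym, own answer omitted."""
    return {labels[m]: text for m, text in answers.items() if m != reader}


def restore(labels: dict[str, str]) -> dict[str, str]:
    """Invert the mapping so the final report can show real names."""
    return {alias: model for model, alias in labels.items()}


answers = {"gpt": "No, because ...", "claude": "No, since ...", "gemini": "Yes, as ..."}
labels = make_pseudonyms(list(answers))
view_for_claude = anonymise(answers, labels, reader="claude")
real_names = restore(labels)
```

Generating fresh labels for every deliberation keeps a model from learning that a particular pseudonym always belongs to the same peer.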

The deliberation in motion. Round 1 firing in parallel; arrows converge on the central node as the aggregator runs.

Eight decision methods

  • VoteYesNo, VoteOptions, Majority, Consensus — aggregation by tally.
  • Probability — each model returns a probability; the panel averages.
  • Synthesis, Minority — rapporteur plus recorded dissent.
  • Compose — models generate content, then blind peer-review each other’s drafts.

Different question classes need different aggregators. “Should we do X?” has a different shape from “estimate the probability of Y” or “draft a paragraph on Z”, and the method chosen is part of how the question is defined.
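Two of the simpler aggregators can be sketched as follows. The function names, the tie handling and the input validation are assumptions made for illustration; they are not the signatures used in the magi-core package.

```python
from collections import Counter
from statistics import mean


def vote_yes_no(votes: list[str]) -> tuple[str, str]:
    """Tally YES/NO votes and return (verdict, score line)."""
    tally = Counter(v.upper() for v in votes)
    yes, no = tally["YES"], tally["NO"]
    if yes == no:
        # Assumption: a tie is reported as such rather than broken.
        return "TIE", f"{yes} to {no}"
    winner = "YES" if yes > no else "NO"
    return winner, f"{max(yes, no)} to {min(yes, no)}"


def average_probability(estimates: list[float]) -> float:
    """Unweighted mean of the panel's probability estimates."""
    if not all(0.0 <= p <= 1.0 for p in estimates):
        raise ValueError("each estimate must lie between 0 and 1")
    return mean(estimates)


verdict, score = vote_yes_no(["NO", "NO", "YES"])
panel_probability = average_probability([0.25, 0.5, 0.75])
print(verdict, score, panel_probability)
```

The two functions take different inputs and return different shapes, which is one concrete sense in which the method belongs to the question: a yes/no motion and a probability estimate cannot share an aggregator.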

MAGI setup screen: question input field at the top, a row of eight decision-method buttons (Synthesis, Probability, Vote-Yes/No selected, Vote-Options, Majority, Consensus, Minority, Compose), three model-selection cards below for GPT 5.4 Nano, Claude Haiku 4.5 and Gemini 2.5 Flash Lite, a Deliberative toggle set to ON, and an Initiate Deliberation button.
Setup. The decision method is part of the question, not a global setting — chosen before deliberation begins.

Design roots

The procedures are borrowed, not invented. Parliamentary committee debate. The European Court of Human Rights’ juge rapporteur model, where one judge drafts the opinion and dissents are recorded separately. Quaker consensus practice. James Fishkin’s deliberative polling, where participants revise positions after exposure to peer reasoning. The visual cue — a triangle of nodes, a round-over-round stance table — is borrowed from the MAGI supercomputer in Neon Genesis Evangelion, but the cue is cosmetic. The deliberative mechanism is from political-science scholarship, not anime.

A still from the 1997 film The End of Evangelion: a hand holding a small console showing the original MAGI three-node deliberation interface, with nodes labelled BALTHASAR-2, CASPER-3 and MELCHIOR-1, two showing approve and one showing deny in Japanese kanji. Header reads RESULT OF THE DELIBERATION / MOTION : SELF DESTRUCTION.
The visual reference. Neon Genesis Evangelion: The End of Evangelion (Hideaki Anno, 1997). The triangle layout and the round-by-round stance display in MAGI take their cue from this scene. Still © Khara / Project Eva, 1997. Used as an illustrative reference.

Try it

The value is not the verdict. It is the record of the disagreement.