MAGI: a deliberative multi-LLM council

Three LLMs answer in parallel, then — optionally — review each other's anonymised responses and revise. A rapporteur writes the synthesis; the lowest-confidence model writes the minority report. Open-source Python, eight decision methods, live at magi-council.org.

2026 · Open-source · published on PyPI · Data

Live at magi-council.org; source at jason-chao/MAGI; package on PyPI (pip install magi-core).

MAGI deliberation panel: three LLM nodes (GPT 5.4 Nano, Claude Haiku 4.5, Gemini 2.5 Flash Lite) arranged in a triangle around a central MAGI hexagon, each node showing a vote (NO / NO / YES) and a confidence score (66% / 72% / 90%). A verdict bar at the foot reads NO, 2 to 1.
The deliberation output: three models, each with a structured verdict and a confidence score, aggregated to a single panel result. The dissent and the confidence weighting survive into the output — for contested questions, that's the point. (The question shown is itself an Evangelion plot point.)

Why another multi-LLM system?

Most multi-agent LLM research — AutoGen, Du et al.’s debate papers, MAFBench — is benchmarked on tasks with a known correct answer. MAGI is built for the opposite: contested questions where a single confident reply is not what you need, and the record of disagreement is the output.

Architecture

Three LLMs (by default GPT-5, Claude, and Gemini) run in parallel — latency stays bounded by the slowest model, not summed across all three. A provider-agnostic adapter sits underneath, so swapping any one out for another is a one-line config change. Each LLM returns a structured JSON verdict (answer, reasoning, confidence); the aggregator then applies one of eight decision methods. The highest-confidence model — the rapporteur — writes the synthesis; the lowest writes the minority report. Fallback chains handle permanent-error cases (deprecated models, auth failures) without dropping the panel below quorum.
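
The parallel fan-out and role assignment described above can be sketched roughly as follows. This is a minimal illustration, not the magi-core implementation: the `Verdict` class, `ask_model`, `run_panel`, `pick_roles`, the placeholder model names, and the canned replies are all assumptions made for the example, and the provider call is stubbed out.

```python
import asyncio
import json
from dataclasses import dataclass


@dataclass
class Verdict:
    """Hypothetical container for the structured JSON verdict."""
    model: str
    answer: str
    reasoning: str
    confidence: float  # 0.0 to 1.0


async def ask_model(model: str, question: str) -> Verdict:
    """Stand-in for a real provider call; returns a canned JSON verdict."""
    canned = {
        "model-a": '{"answer": "NO", "reasoning": "Risk outweighs benefit.", "confidence": 0.66}',
        "model-b": '{"answer": "NO", "reasoning": "Evidence is insufficient.", "confidence": 0.72}',
        "model-c": '{"answer": "YES", "reasoning": "Benefit is decisive.", "confidence": 0.90}',
    }
    await asyncio.sleep(0.01)  # simulate network latency
    payload = json.loads(canned[model])
    return Verdict(model=model, **payload)


async def run_panel(question: str, models: list[str]) -> list[Verdict]:
    # gather() awaits all calls concurrently, so wall-clock time
    # tracks the slowest call rather than the sum of all three.
    return await asyncio.gather(*(ask_model(m, question) for m in models))


def pick_roles(verdicts: list[Verdict]) -> tuple[Verdict, Verdict]:
    """Highest confidence writes the synthesis; lowest writes the minority report."""
    rapporteur = max(verdicts, key=lambda v: v.confidence)
    minority = min(verdicts, key=lambda v: v.confidence)
    return rapporteur, minority


verdicts = asyncio.run(run_panel("Should we do X?", ["model-a", "model-b", "model-c"]))
rapporteur, minority = pick_roles(verdicts)
print(rapporteur.model, minority.model)
```

In a real panel the stub would be replaced by the provider adapter, and a failed call would be routed through the fallback chain before roles are assigned.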

Two rounds, blind peer review

The interesting variant is the second-round deliberation. Each model sees its peers’ Round 1 answers under randomised pseudonyms (“Participant X7K2”), with no “Claude said” and no “GPT said”. The pseudonyms remove brand deference, so a model has to weigh each argument on its content rather than its source; the payoff is a visible shift in Round 2 reasoning depth. Real names are restored in the final user-facing report.
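
One way the pseudonym step might look is sketched below. The function names (`make_pseudonyms`, `anonymise_for`, `restore_names`), the four-character tag format, and the placeholder model names are assumptions for illustration; the actual package may handle this differently.

```python
import random
import string


def make_pseudonyms(models: list[str], rng: random.Random) -> dict[str, str]:
    """Map each real model name to a unique pseudonym such as 'Participant X7K2'."""
    alphabet = string.ascii_uppercase + string.digits
    mapping: dict[str, str] = {}
    used: set[str] = set()
    for model in models:
        while True:
            tag = "".join(rng.choices(alphabet, k=4))
            if tag not in used:  # retry on the rare collision
                break
        used.add(tag)
        mapping[model] = f"Participant {tag}"
    return mapping


def anonymise_for(reviewer: str, answers: dict[str, str],
                  pseudonyms: dict[str, str]) -> dict[str, str]:
    """Peers' Round 1 answers keyed by pseudonym, excluding the reviewer's own."""
    return {pseudonyms[m]: text for m, text in answers.items() if m != reviewer}


def restore_names(text: str, pseudonyms: dict[str, str]) -> str:
    """Swap pseudonyms back to real names for the user-facing report."""
    for model, alias in pseudonyms.items():
        text = text.replace(alias, model)
    return text


answers = {
    "model-a": "NO. Risk outweighs benefit.",
    "model-b": "NO. Evidence is insufficient.",
    "model-c": "YES. Benefit is decisive.",
}
pseudonyms = make_pseudonyms(list(answers), random.Random())
peer_view = anonymise_for("model-a", answers, pseudonyms)
report = restore_names(
    f"{pseudonyms['model-b']} raised the strongest objection.", pseudonyms
)
```

Here `peer_view` is what a Round 2 prompt for `model-a` would be built from: two answers with no provider names attached. Drawing fresh pseudonyms for each deliberation keeps a model from learning a stable alias across runs.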

The deliberation in motion. Round 1 firing in parallel; arrows converge on the central node as the aggregator runs.

Eight decision methods

  • VoteYesNo, VoteOptions, Majority, Consensus — aggregation by tally.
  • Probability — each model returns a probability; the panel averages.
  • Synthesis, Minority — rapporteur plus recorded dissent.
  • Compose — models generate content, then blind peer-review each other’s drafts.

Different question classes need different aggregators. “Should we do X?” has a different shape from “estimate the probability of Y” or “draft a paragraph on Z”, and the method chosen is part of how the question is defined.
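
To make the difference between aggregators concrete, here is a small sketch of three of them. The function names and the exact rules (tie handling, unanimity for consensus, an unweighted mean for probability) are assumptions for illustration, not the package's documented behaviour.

```python
from collections import Counter
from statistics import mean
from typing import Optional


def tally_yes_no(votes: list[str]) -> str:
    """Aggregate YES/NO votes into a verdict string such as 'NO, 2 to 1'."""
    counts = Counter(v.upper() for v in votes)
    yes, no = counts["YES"], counts["NO"]
    if yes == no:
        return f"TIE, {yes} to {no}"
    winner = "YES" if yes > no else "NO"
    return f"{winner}, {max(yes, no)} to {min(yes, no)}"


def average_probability(probabilities: list[float]) -> float:
    """Unweighted mean of the models' probability estimates."""
    if not probabilities:
        raise ValueError("at least one probability is required")
    return mean(probabilities)


def consensus(votes: list[str]) -> Optional[str]:
    """Return the shared answer only if every model agrees, else None."""
    return votes[0] if votes and len(set(votes)) == 1 else None


print(tally_yes_no(["NO", "NO", "YES"]))        # NO, 2 to 1
print(average_probability([0.25, 0.5, 0.75]))   # 0.5
print(consensus(["NO", "NO", "YES"]))           # None
```

The same three votes give a verdict under a tally and no result under consensus, which is why the method has to be chosen together with the question.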

MAGI setup screen: question input field at the top, a row of eight decision-method buttons (Synthesis, Probability, Vote-Yes/No selected, Vote-Options, Majority, Consensus, Minority, Compose), three model-selection cards below for GPT 5.4 Nano, Claude Haiku 4.5 and Gemini 2.5 Flash Lite, a Deliberative toggle set to ON, and an Initiate Deliberation button.
Setup. The decision method is part of the question, not a global setting — chosen before deliberation begins.

Design roots

The procedures are borrowed, not invented. Parliamentary committee debate. The European Court of Human Rights’ juge rapporteur model, where one judge drafts the opinion and dissents are recorded separately. Quaker consensus practice. James Fishkin’s deliberative polling, where participants revise positions after exposure to peer reasoning. The visual cue — a triangle of nodes, a round-over-round stance table — is borrowed from the MAGI supercomputer in Neon Genesis Evangelion, but the cue is cosmetic. The deliberative mechanism is from political-science scholarship, not anime.

A still from the 1997 film The End of Evangelion: a hand holding a small console showing the original MAGI three-node deliberation interface, with nodes labelled BALTHASAR-2, CASPER-3 and MELCHIOR-1, two showing approve and one showing deny in Japanese kanji. Header reads RESULT OF THE DELIBERATION / MOTION : SELF DESTRUCTION.
The visual reference. Neon Genesis Evangelion: The End of Evangelion (Hideaki Anno, 1997). The triangle layout and the round-by-round stance display in MAGI take their cue from this scene. Still © Khara / Project Eva, 1997. Used as an illustrative reference.

Try it

The value is not the verdict. It is the record of the disagreement.