MAGI: a deliberative multi-LLM council
Three LLMs answer in parallel, then optionally review each other's anonymised responses and revise. A rapporteur writes the synthesis; the lowest-confidence model writes the minority report. Open-source Python with eight decision methods.
Side project. Live at magi-council.org; source at jason-chao/MAGI (MIT); package on PyPI (pip install magi-core).
Why another multi-LLM system?
Most multi-agent LLM research — AutoGen, Du et al.’s debate papers, MAFBench — is benchmarked on tasks with a known correct answer. MAGI is built for the opposite: contested questions where a single confident reply is not what you need, and the record of disagreement is the output.
Architecture
Three LLMs (by default a GPT-5-class model, a Claude model, and a Gemini model, all swappable) are queried in parallel via asyncio.gather() on top of litellm. Each returns a structured JSON verdict with three fields: answer, reasoning, and confidence. An aggregator then applies one of eight decision methods. The highest-confidence model, the rapporteur, writes the synthesis; the lowest-confidence model writes the minority report. Fallback chains handle permanent-error cases (deprecated models, auth failures) without dropping the panel below quorum.
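The flow above can be sketched roughly as follows. This is a minimal illustration, not MAGI's actual code: the provider call is replaced by a stub with canned JSON so the example runs offline, and the names Verdict, PermanentModelError, ask_with_fallback, run_panel, pick_roles, the model labels, and the quorum of 2 are all assumptions made for the example.

```python
import asyncio
import json
from dataclasses import dataclass


@dataclass
class Verdict:
    model: str
    answer: str
    reasoning: str
    confidence: float


class PermanentModelError(Exception):
    """Hypothetical error type for deprecated models or auth failures."""


# Canned JSON replies standing in for real provider responses (assumption).
CANNED = {
    "model-a": '{"answer": "yes", "reasoning": "Benefits outweigh costs.", "confidence": 0.82}',
    "model-b": '{"answer": "no", "reasoning": "Risks are underpriced.", "confidence": 0.64}',
    "model-c": '{"answer": "yes", "reasoning": "Evidence is thin but positive.", "confidence": 0.41}',
}


async def ask_model(model: str, question: str) -> Verdict:
    """Stub for one LLM call; a real version would call a provider here."""
    await asyncio.sleep(0)  # yield to the event loop, as network I/O would
    if model not in CANNED:
        raise PermanentModelError(f"{model} is unavailable")
    return Verdict(model=model, **json.loads(CANNED[model]))


async def ask_with_fallback(chain, question):
    """Try each model in the chain in order; None if every one fails permanently."""
    for model in chain:
        try:
            return await ask_model(model, question)
        except PermanentModelError:
            continue
    return None


async def run_panel(chains, question, quorum=2):
    """Query all seats concurrently and enforce a minimum panel size."""
    results = await asyncio.gather(*(ask_with_fallback(c, question) for c in chains))
    verdicts = [v for v in results if v is not None]
    if len(verdicts) < quorum:
        raise RuntimeError("panel fell below quorum")
    return verdicts


def pick_roles(verdicts):
    """Highest confidence writes the synthesis, lowest writes the minority report."""
    rapporteur = max(verdicts, key=lambda v: v.confidence)
    dissenter = min(verdicts, key=lambda v: v.confidence)
    return rapporteur, dissenter


# The second seat lists a deprecated model first, then its fallback.
chains = [["model-a"], ["model-b-deprecated", "model-b"], ["model-c"]]
verdicts = asyncio.run(run_panel(chains, "Should we do X?"))
rapporteur, dissenter = pick_roles(verdicts)
print(rapporteur.model, dissenter.model)  # model-a model-c
```

Each seat is a chain rather than a single model, so a permanent failure costs one fallback step instead of a panel member. asyncio.gather() returns results in the order the seats were passed in, which keeps the seat-to-verdict mapping stable.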

Two rounds, blind peer review
The interesting variant is the second-round deliberation. Each model sees peers’ Round 1 answers under randomised pseudonyms (“Participant X7K2”) — no “Claude said”, no “GPT said”. Brand deference is the thing the pseudonyms remove; the visible shift in Round 2 reasoning depth is the payoff. Real names are restored in the final user-facing report.
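One way the blinding step might look, as a sketch under stated assumptions: the helper names make_pseudonyms, peer_view, and restore_names are hypothetical, the four-character alias format is inferred from the “Participant X7K2” example, and shuffling the peer order is an extra precaution this sketch adds rather than something the project is known to do.

```python
import random
import string


def make_pseudonyms(models, rng=None):
    """Map each model name to a unique random alias such as 'Participant X7K2'."""
    rng = rng or random.Random()
    alphabet = string.ascii_uppercase + string.digits
    pseudonyms = {}
    taken = set()
    for model in models:
        tag = "".join(rng.choices(alphabet, k=4))
        while tag in taken:  # re-draw on the rare collision
            tag = "".join(rng.choices(alphabet, k=4))
        taken.add(tag)
        pseudonyms[model] = f"Participant {tag}"
    return pseudonyms


def peer_view(reviewer, answers, pseudonyms, rng=None):
    """Peers' Round 1 answers keyed by alias, excluding the reviewer's own."""
    rng = rng or random.Random()
    peers = [(pseudonyms[m], text) for m, text in answers.items() if m != reviewer]
    rng.shuffle(peers)  # so list position does not hint at identity
    return dict(peers)


def restore_names(text, pseudonyms):
    """Swap aliases back to real model names for the user-facing report."""
    for model, alias in pseudonyms.items():
        text = text.replace(alias, model)
    return text


answers = {
    "model-a": "Yes, on balance.",
    "model-b": "No, the risks dominate.",
    "model-c": "Yes, but weakly.",
}
pseudonyms = make_pseudonyms(answers)
view = peer_view("model-a", answers, pseudonyms)  # what model-a sees in Round 2
```

The mapping is kept on the orchestrator side only; the models never see it, and restore_names is applied once, when the final report is assembled.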
Eight decision methods
- VoteYesNo, VoteOptions, Majority, Consensus — aggregation by tally.
- Probability — each model returns a probability; the panel averages.
- Synthesis, Minority — rapporteur plus recorded dissent.
- Compose — models generate content, then blind peer-review each other’s drafts.
Different question classes need different aggregators. “Should we do X?” has a different shape from “estimate the probability of Y” or “draft a paragraph on Z”, and the method chosen is part of how the question is defined.
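To make the difference in shape concrete, here is a small sketch of three tally-style aggregators and a dispatch table. The function names and the exact rules (Majority as strictly more than half, Consensus as unanimity, Probability as an unweighted mean) are assumptions for illustration and may not match how MAGI defines them.

```python
from collections import Counter
from statistics import mean


def majority(answers):
    """Return the answer held by more than half the panel, else None."""
    winner, count = Counter(answers).most_common(1)[0]
    return winner if count > len(answers) / 2 else None


def consensus(answers):
    """Return the shared answer only if the panel is unanimous, else None."""
    return answers[0] if len(set(answers)) == 1 else None


def average_probability(probabilities):
    """Unweighted mean of the panel's probability estimates."""
    return mean(probabilities)


# Hypothetical dispatch table: the method is chosen together with the question.
METHODS = {
    "Majority": majority,
    "Consensus": consensus,
    "Probability": average_probability,
}

print(METHODS["Majority"](["yes", "yes", "no"]))   # yes
print(METHODS["Consensus"](["yes", "yes", "no"]))  # None
```

The same three answers produce a verdict under one method and no verdict under another, which is why the method belongs to the definition of the question rather than to post-processing.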

Design roots
The procedures are borrowed, not invented. Parliamentary committee debate. The European Court of Human Rights’ juge rapporteur model, where one judge drafts the opinion and dissents are recorded separately. Quaker consensus practice. James Fishkin’s deliberative polling, where participants revise positions after exposure to peer reasoning. The visual cue — a triangle of nodes, a round-over-round stance table — is borrowed from the MAGI supercomputer in Neon Genesis Evangelion, but the cue is cosmetic. The deliberative mechanism is from political-science scholarship, not anime.

Try it
- Live demo: magi-council.org
- Source: jason-chao/MAGI (MIT, Python)
- Package: pip install magi-core
- Longer framing, pitched at fans of the show: Rebuilding Evangelion’s MAGI with three modern LLMs, Medium, April 2026.
The value is not the verdict. It is the record of the disagreement.