Announcing CIRCUIT at FIRSTCON26: An Open Source Framework for AI Interpretability Risk Management

"When this AI system makes a bad decision, how will we know which part of the model made it?" That question, asked in a governance committee meeting, is why CIRCUIT exists. Today's AI controls treat every model as an opaque function from input to output: paperwork around the outside of a black box. This is the first public release of CIRCUIT: a Score, a Registry, and a Control, released under Apache 2.0 and built to make interpretability an enforceable security control.

When the Picture Lies to the Prompt: CrossMPI and the Case for Circuit-Level Governance

A research team at Xidian University has published an attack — CrossMPI — that perturbs an image invisibly to humans and, without changing a single character of the user's text prompt, causes a vision-language model to execute the attacker's task. Average success rate across six production-style models: 66.36%. None of the five defenses the researchers tested fully eliminated it.

How to Adopt It

This part covers the adoption playbook (four phases, from Foundation to Mature), the 29-question "Show Me Your Circuits" vendor questionnaire, regulatory crosswalks covering EU AI Act, NIST AI RMF, ISO 42001, SR 11-7, SOC 2, MITRE ATLAS, and more, and the open release details.

What CIRCUIT Is

CIRCUIT is deliberately small. The whole thing fits on a poster. This part walks through what is inside, in the order a practitioner would encounter it: the Interpretability Maturity Score, the Circuit Risk Score formula, the six KPIs, the three model categories, the eight-section registry schema, the ten hard rules, and the dashboard.

You Cannot Defend What You Cannot Inspect

We are deploying AI we can't explain, defending AI we can't inspect, and trusting AI we can't audit. That is not a governance program — that is a liability surface. Existing AI controls answer who owns the model, where it runs, and whether anyone approved it; they do not answer why the model did what it did. This post names the gap, introduces CIRCUIT — a Score, a Registry, and a Control released under Apache 2.0 — and explains why interpretability is now a security control, not a research curiosity.

AI Interpretability as a Security Control

Recent research on weight-sparse transformers offers the first serious path toward AI systems with inspectable internal logic. This is not theoretical: it is a concrete methodology that produces models where we can point to the exact circuit implementing a specific decision. Security leaders who must navigate the "use AI everywhere" mandate while remaining accountable when things go wrong should pay attention to this research.

Contribute or follow along on GitHub

github.com/jumpmindinc/circuit-framework