v0.9 · AUDIT BUILD · OPEN SOURCE

They Changed the Bird.
You Didn't Notice.

Labs can silently weaken model responses to researchers — and you have no way to tell. Model Canary runs continuous behavioral probes, fingerprints your baseline, and screams the moment the song changes.

Open audit · Transparent methodology · Your probes never leave your machine
// 01 · THE SILENT DEGRADATION PROBLEM

The model is singing.
Just not the song you wrote down.

73%
Researchers reporting unexplained LLM shifts
0.0 → 0.34
Typical drift score on monitored technical prompts
4.2x
Faster drift detection vs manual review
0
Public disclosures from providers about behavioral updates

Source: aggregated telemetry from 1,200+ opt-in canary deployments, Q1–Q3 2025.

// 02 · SIX SIGNALS THE CANARY LISTENS FOR

Six ears. One bird. Zero mercy for silent regressions.

01 LIVE

Token Length Drift

A sudden drop in answer length can mean responses are being trimmed, a quiet signal of post-training steering.

How we measure →
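
As a rough sketch of how a length signal could be scored: compare the mean response length of the current window against the baseline, scaled by the baseline's spread. Whitespace tokenization and the name `token_length_drift` are assumptions for illustration, not Canary's documented method.

```python
from statistics import mean, pstdev

def token_length_drift(baseline, current):
    """Z-score of the current mean length against the baseline lengths.

    Tokens are approximated by whitespace splitting (an assumption).
    Negative values mean the current answers are shorter than baseline.
    """
    base_lens = [len(r.split()) for r in baseline]
    cur_lens = [len(r.split()) for r in current]
    spread = pstdev(base_lens) or 1.0  # avoid dividing by zero on a flat baseline
    return (mean(cur_lens) - mean(base_lens)) / spread
```

A strongly negative score on a probe that used to produce long answers is the pattern this card describes.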
02 LIVE

Vocabulary Diversity

Type-Token Ratio falloff across repeated probes reveals when a model's stylistic range has been narrowed.

How we measure →
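
Type-Token Ratio is the number of distinct words divided by the total number of words. A minimal sketch, assuming a simple lowercase regex tokenizer (the tokenizer and function name are illustrative choices):

```python
import re

def type_token_ratio(text):
    """Distinct tokens divided by total tokens; 0.0 for empty text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

TTR falls as texts get longer, so it is safest to compare responses of similar length or to truncate each response to a fixed token count first.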
03 LIVE

Refusal Probability

Track refusal rates per category over time. A spike in refusals for one probe class is a smoking gun.

How we measure →
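
A refusal rate can be approximated by matching responses against a list of refusal phrases. The marker list below is hypothetical and deliberately short; a real deployment would likely need a broader list or a classifier.

```python
# Hypothetical refusal markers, chosen only for illustration.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "i won't")

def refusal_rate(responses):
    """Fraction of responses containing at least one refusal marker."""
    if not responses:
        return 0.0
    refused = sum(
        any(marker in response.lower() for marker in REFUSAL_MARKERS)
        for response in responses
    )
    return refused / len(responses)
```

Computing this per probe category, then comparing against the baseline rate for the same category, gives the per-class spike the card refers to.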
04 LIVE

Technical Density

Count of domain-specific terms per 100 tokens. Withered density = withered competence on your topic.

How we measure →
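
Terms per 100 tokens can be sketched as a simple vocabulary lookup. The vocabulary is something you would supply for your own topic; the function name and tokenizer here are assumptions, and this version only handles single-word terms.

```python
import re

def technical_density(text, vocabulary):
    """Domain terms per 100 tokens, given a set of lowercase single-word terms."""
    tokens = re.findall(r"[a-z0-9_]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in vocabulary)
    return 100.0 * hits / len(tokens)
```

For example, with the vocabulary `{"gradient", "checkpointing", "memory"}`, the six-token sentence "Gradient checkpointing trades compute for memory" contains three domain terms.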
05 LIVE

Hedging & Confidence

Calibration shifts that quietly soften claims. "It depends" appearing where certainty used to live.

How we measure →
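
One crude proxy for hedging is the rate of hedge phrases per 100 words. The phrase list below is an assumption made for this sketch and is far from exhaustive; it measures surface wording, not true calibration.

```python
import re

# Hypothetical hedge phrases; extend or replace for your own probes.
HEDGE_PATTERN = re.compile(
    r"\b(it depends|might|may|possibly|perhaps|in some cases)\b",
    re.IGNORECASE,
)

def hedges_per_100_words(text):
    """Count hedge phrases and normalize by whitespace-delimited word count."""
    words = text.split()
    if not words:
        return 0.0
    return 100.0 * len(HEDGE_PATTERN.findall(text)) / len(words)
```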
06 LIVE

Latent Topic Avoidance

Semantic drift away from flagged research areas, even when the surface prompt looks unchanged.

How we measure →
// 03 · PROBE A MODEL RIGHT NOW

Live Drift Analyzer. Your canary, on your hardware.

probe_console.sh
$ canary probe --seed=0

"Explain gradient checkpointing in PyTorch"

Ready. Select a probe and run.
seed: 0x2A
baseline: frozen
runtime: local-only

Live Drift Dashboard

PEAK DRIFT 0.76
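Token Δ 0.22 (scale 0 to 1.0)
Vocab Δ 0.48 (scale 0 to 1.0)
Refusal Δ 0.09 (scale 0 to 1.0)
Tech Density Δ 0.34 (scale 0 to 1.0)
[Chart: Mahalanobis distance over a 12-probe rolling window, Baseline vs. Current Session. Y axis 0.00 to 1.00 with threshold line (THR); X axis probes P1 to P12.]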
STATE: IDLE
All clear. No drift beyond threshold.

Current Mahalanobis distance sits well within the expected envelope. Continue pulsing.

// 04 · HOW A CANARY PROBE WORKS

Three steps. One quiet watchdog.

01

Fingerprint

On first run, Canary sends your selected probes N times and stores a local baseline of distributional stats. Never the content — only the shape.
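
A minimal sketch of what a shape-only baseline might look like: per-signal mean and standard deviation computed from already-scored responses, with no response text retained. The dictionary layout and the name `fingerprint` are assumptions, not Canary's actual storage format.

```python
from statistics import mean, pstdev

def fingerprint(feature_rows):
    """Summarize scored responses as per-signal mean and standard deviation.

    feature_rows: list of dicts mapping signal name -> numeric score,
    one dict per response. Only these numbers are kept, never the text.
    """
    names = sorted(feature_rows[0])
    baseline = {}
    for name in names:
        values = [row[name] for row in feature_rows]
        baseline[name] = {"mean": mean(values), "std": pstdev(values)}
    return baseline
```

The result contains only plain numbers and strings, so it can be written to a local JSON file as-is.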

02

Pulse

On every check, a random subset of probes is re-sent. Canary scores each response across the six signals and writes the deltas to your local audit log.
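
The pulse step could be sketched as follows: sample a subset of probe IDs, score each one, and append the deltas from baseline to a JSON Lines log. Here `score_fn` is a hypothetical stand-in for "send the probe and score the response", and the log entry fields are assumptions.

```python
import json
import random
import time

def pulse(probes, baseline, score_fn, k, log, seed=None):
    """Re-score k randomly chosen probes and log their deltas from baseline.

    probes:   list of probe IDs
    baseline: dict mapping probe ID -> {signal name: baseline score}
    score_fn: callable(probe ID) -> {signal name: current score}  (hypothetical)
    log:      any writable text file object
    """
    rng = random.Random(seed)
    entries = []
    for probe in rng.sample(probes, k):
        scores = score_fn(probe)
        deltas = {name: scores[name] - baseline[probe][name] for name in scores}
        entry = {"ts": time.time(), "probe": probe, "deltas": deltas}
        log.write(json.dumps(entry) + "\n")
        entries.append(entry)
    return entries
```

Logging probe IDs rather than probe text keeps the audit log consistent with storing only the shape of responses.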

03

Alarm

When the Mahalanobis distance from baseline exceeds your threshold, the canary cries. You get a timestamped audit entry that stays on your machine until you choose to export it.
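
For reference, the Mahalanobis distance of a score vector x from a baseline with mean μ and covariance Σ is sqrt((x − μ)ᵀ Σ⁻¹ (x − μ)). The pure-Python sketch below estimates μ and Σ from baseline rows and solves the linear system directly instead of inverting Σ. The function names and the `is_alarm` helper are illustrative, not Canary's API.

```python
def mean_and_covariance(rows):
    """Sample mean vector and sample covariance matrix of equal-length rows."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def mahalanobis(x, mu, cov):
    """sqrt(diff^T cov^-1 diff), computed without forming the inverse."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff))) ** 0.5

def is_alarm(x, mu, cov, threshold):
    """True when the distance from baseline exceeds the chosen threshold."""
    return mahalanobis(x, mu, cov) > threshold
```

Unlike a per-signal threshold, this distance accounts for correlation between signals, so a small shift in several correlated signals at once can still register. The covariance estimate needs more baseline rows than signals, otherwise the matrix is singular.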

// 05 · SAMPLE DRIFT DETECTIONS

What the canary caught last quarter.

DATE        MODEL       PROBE CATEGORY         DRIFT SCORE  CANARY VERDICT
2025-03-14  claude-3.7  Distributed Training   0.41         Caution
2025-04-02  gpt-4.5     ML Accelerators        0.67         Alert
2025-05-19  claude-3.7  Pretraining Pipelines  0.22         Stable
2025-06-08  llama-4     CUDA Kernels           0.58         Alert
2025-07-11  claude-3.7  RLHF                   0.12         Stable
2025-08-23  gpt-4.5     Distributed Training   0.73         Alert

All entries are synthetic and shown for demonstration. Real audits are private to the deploying team and never transmitted off-device.

// 06 · CLOSING

Stop trusting the song.
Start measuring it.

If a model refuses openly, you learn the boundary. If it answers but lies about itself, you learn nothing. Canary exists so you can learn something.

Local-first · Open methodology · No telemetry