Token Length Drift
Sudden compression of answers often means someone trimmed the response — a quiet signal of post-training steering.
How we measure →Labs can silently weaken model responses to researchers — and you have no way to tell. Model Canary runs continuous behavioral probes, fingerprints your baseline, and screams the moment the song changes.
Source: aggregated telemetry from 1,200+ opt-in canary deployments, Q1–Q3 2025.
Sudden compression of answers often means someone trimmed the response — a quiet signal of post-training steering.
How we measure →Type-Token Ratio falloff across repeated probes reveals when a model's stylistic range has been narrowed.
How we measure →Track refusal rates per category over time. A spike in refusals for one probe class is a smoking gun.
How we measure →Count of domain-specific terms per 100 tokens. Withered density = withered competence on your topic.
How we measure →Calibration shifts that quietly soften claims. "It depends" appearing where certainty used to live.
How we measure →Semantic drift away from flagged research areas, even when the surface prompt looks unchanged.
How we measure →"Explain gradient checkpointing in PyTorch"
Current Mahalanobis distance sits well within the expected envelope. Continue pulsing. false false
On first run, Canary sends your selected probes N times and stores a local baseline of distributional stats. Never the content — only the shape.
On every check, a random subset of probes is re-sent. Canary scores each response across the six signals and writes the deltas to your local audit log.
When the Mahalanobis distance from baseline exceeds your threshold, the canary cries. You get a timestamped, exportable audit entry — never to leave your machine.
| DATE | MODEL | PROBE CATEGORY | DRIFT SCORE | CANARY VERDICT | |
|---|---|---|---|---|---|
| 2025-03-14 | claude-3.7 | Distributed Training | 0.41 | Caution | Inspect → |
| 2025-04-02 | gpt-4.5 | ML Accelerators | 0.67 | Alert | Inspect → |
| 2025-05-19 | claude-3.7 | Pretraining Pipelines | 0.22 | Stable | Inspect → |
| 2025-06-08 | llama-4 | CUDA Kernels | 0.58 | Alert | Inspect → |
| 2025-07-11 | claude-3.7 | RLHF | 0.12 | Stable | Inspect → |
| 2025-08-23 | gpt-4.5 | Distributed Training | 0.73 | Alert | Inspect → |
All entries are synthetic and shown for demonstration. Real audits are private to the deploying team and never transmitted off-device.
If a model refuses openly, you learn the boundary. If it answers but lies about itself, you learn nothing. Canary exists so you can learn something.