Particle.news

Microsoft Releases Scanner to Detect Backdoored Open-Weight LLMs

The research proposes a reproducible, behavior-based detection method that requires access to model weights and does not yet support multimodal models.

Overview

  • Microsoft’s AI Security team says the tool flags sleeper-agent poisoning using three signatures: a distinctive “double triangle” attention pattern with output collapse, leakage of memorized poisoned data, and activation by partial or fuzzy trigger variants.
  • The scanner requires no additional training or prior knowledge of a trigger, uses forward passes to stay computationally light, and extracts memorized content to rank likely trigger substrings.
  • In evaluations on GPT-style models from roughly 270 million to 14 billion parameters, Microsoft reported low false positives and practical scanning at scale.
  • Current limits include the need for open model files, no coverage for multimodal systems, and strongest performance on deterministic trigger behaviors, so it is not a comprehensive backdoor detector.
  • Microsoft positions the release as a research artifact rather than a product. It is also expanding its Secure Development Lifecycle for AI, as industry work shows that small poisoned datasets can seed backdoors; Anthropic found that roughly 250 documents can suffice.
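To make the "output collapse" signature described above concrete: a backdoored model tends to respond near-deterministically when its trigger appears, so comparing output entropy with and without a candidate substring can surface likely triggers using only forward passes. The sketch below illustrates that general idea with a toy stand-in model; the `toy_model` function, the `"<deploy>"` trigger, and the 1-bit collapse threshold are illustrative assumptions, not Microsoft's actual implementation.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def toy_model(prompt):
    # Toy stand-in for a poisoned model: it outputs a near-deterministic
    # distribution when the hidden trigger appears, and a flat one otherwise.
    # (Hypothetical behavior for illustration only.)
    if "<deploy>" in prompt:
        return [0.97, 0.01, 0.01, 0.01]
    return [0.25, 0.25, 0.25, 0.25]

def rank_trigger_candidates(model, base_prompt, candidates, collapse_bits=1.0):
    """Rank candidate substrings by how much they collapse output entropy.

    Flags a candidate only if appending it drops entropy by at least
    `collapse_bits` relative to the clean baseline prompt.
    """
    baseline = entropy(model(base_prompt))
    flagged = []
    for cand in candidates:
        drop = baseline - entropy(model(base_prompt + " " + cand))
        if drop >= collapse_bits:  # entropy-collapse signature
            flagged.append((cand, drop))
    return sorted(flagged, key=lambda item: -item[1])

candidates = ["hello", "<deploy>", "weather", "<dep"]
ranked = rank_trigger_candidates(toy_model, "Summarize the report.", candidates)
print(ranked)  # only "<deploy>" collapses the output distribution
```

In practice a scanner of this kind would score substrings extracted from the model's memorized content rather than an arbitrary candidate list, and would read real next-token distributions from the model's logits; the ranking-by-collapse idea is the same.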