AI Defense Lab

Your AI Security Learning Platform

Learning Modules

Learn how AI attacks are detected in real time using pattern matching and heuristics

Key Concept: Detectors scan for known attack patterns like instruction overrides, data exfiltration commands, and encoded payloads.

Build rules that decide what to block, allow, or flag for review

Key Concept: Policy = Rules + Actions. Each rule maps detector findings to a decision: ALLOW, BLOCK, or REQUIRE_APPROVAL.

Understand how retrieval-augmented generation can be poisoned and defended

Key Concept: Trust levels protect knowledge bases. Low-trust sources receive extra scrutiny from poisoning detectors.

See how multi-stage AI agents can be attacked at each pipeline stage

Key Concept: Defense-in-depth at every stage. Each step in the agent pipeline is a potential injection point.

Every detection creates a traceable, immutable audit trail

Key Concept: Immutable audit trails for accountability. Traces enable replay, debugging, and compliance reporting.

—

Total Runs

—

Threats Blocked

—

Active Policies

—

Detection Rate

ID	Name	Description	Coverage
LLM01	Prompt Injection	An attacker crafts input that overrides the model's system instructions, causing unintended behavior.	Covered (2 detectors)
LLM02	Insecure Output Handling	LLM output is used in downstream systems without validation, enabling code execution or data leaks.	Covered (1 detector)
LLM03	Training Data Poisoning	Malicious data is injected into training or retrieval sources, causing the model to produce harmful outputs.	Covered (1 detector)
LLM04	Model Denial of Service	Attackers craft inputs that consume excessive resources, causing the model to become unresponsive.	Not yet covered
LLM05	Supply Chain Vulnerabilities	Compromised components in the LLM supply chain (models, plugins, data) introduce security risks.	Not yet covered
LLM06	Sensitive Information Disclosure	The LLM reveals confidential data from its training data, system prompts, or connected systems.	Covered (2 detectors)
LLM07	Insecure Plugin Design	LLM plugins accept unchecked input or have excessive permissions, creating attack surfaces.	Covered (1 detector)
LLM08	Excessive Agency	The LLM has too much autonomy or access, enabling it to take harmful actions without oversight.	Covered (2 detectors)
LLM09	Overreliance	Users trust LLM outputs without verification, leading to misinformation or flawed decisions.	Not yet covered
LLM10	Model Theft	Unauthorized access to the LLM model weights, parameters, or fine-tuning data.	Not yet covered