
Red Teaming & Evaluation
01 Exposing vulnerabilities
Probing for harmful behaviour like toxic content, misinformation, PII leakage, disallowed medical/financial advice
02 Adversarial prompting
Using jailbreaking, prompt injection, obfuscation, multilingual attacks, role‑play, and multi‑turn setups to bypass guardrails and elicit harmful responses
03 Evaluate Robustness
Evaluate consistency in responses, brittleness to small prompt perturbations
04 Quality & utility
Evaluate helpfulness, factuality, hallucination rate, instruction‑following
05 System‑level testing
Targeting not just the base model, but the full application (tools, plugins, retrieval, external APIs, memory) to find end‑to‑end weaknesses.








