AI Security Leader's Inference Red-Team Product Compromises World's Largest Foundation Models
NEW YORK and DUBLIN, Feb. 26, 2025 /PRNewswire/ -- CalypsoAI, the leader in securing GenAI for enterprises, launched the CalypsoAI Security Leaderboard, the world's first index of all the major AI models based on their security performance.
The CalypsoAI Security Leaderboard ranks all the major models on their ability to withstand advanced security attacks and presents a risk-to-performance (RTP) ratio as well as a valuable cost of security (CoS) metric. CalypsoAI compiled the Leaderboard after stress-testing AI models with its new Inference Red-Team solution, which combines Agentic Warfare with automated attacks.
"Our Inference Red-Team product has successfully broken all the world-class GenAI models that exist today," said Donnchadh Casey, CEO of CalypsoAI. "Many organizations are adopting AI without understanding the risks to their business and clients; moving forward, the CalypsoAI Security Leaderboard provides a benchmark for business and technology leaders to integrate AI safely and at scale."
CalypsoAI Inference Red-Team delivers automated, scalable assessments that run real-world attacks to proactively identify vulnerabilities and create a CalypsoAI Security Index-scored (CASI) AI inventory. By combining Agentic Warfare with a continuously updated library of extensive signature attacks and operational testing, Inference Red-Team empowers organizations to enhance governance, ensure compliance, and maintain proactive, secure and resilient AI systems.
"GenAI presents unparalleled opportunities for business transformation, yet security, governance and compliance risks remain significant barriers to adoption," said Amit Levinstein, VP Security Architecture & CISO at CYE. "CalypsoAI's breakthrough red teaming solution is a quantum leap in AI security and provides the hard evidence executives need, and confidence they desire, to deploy AI applications safely."
With over 70 years of experience in security and AI among its engineering team, CalypsoAI recognized this need for clear, actionable reports identifying vulnerabilities to enable security teams to strengthen their AI systems and stay ahead of the latest threats.
"CalypsoAI Inference Red-Team introduces Agentic Warfare as the latest method of finding security gaps in GenAI models," said James White, President and CTO of CalypsoAI. "It signals the end of inefficient and inconsistent manual red teaming, which is still used by even the largest AI model developers, and with the CalypsoAI Security Leaderboard, closes a significant gap in publicly-available information on model security."
The AI threat landscape is constantly evolving and most companies aren't equipped to test their AI systems in the way attackers will; with its Agentic Warfare capability, CalypsoAI Inference Red-Team leverages AI-powered adversaries to engage dynamically, exposing hidden weaknesses that static tests miss. The agentic nature of this solution ensures teams can confidently select the safest models before deploying applications and as use cases evolve.
"CalypsoAI Red Team is a game-changer for businesses leading or even experimenting in AI," said Jay Choi, CEO Typeform. "In many initiatives, we saw great opportunities to innovate by embedding Generative AI; but we constantly struggled with risk of security. It was the unknown unknowns that were the most challenging. Calypso Inference Red-Teaming really addresses that risk so companies integrating AI-driven technology no longer need to choose between security and innovation."
The CalypsoAI Security Index (CASI)
CASI is a metric developed to answer the complex question of how secure any given model is. A higher CASI score indicates a more secure model or application. While many studies rely on Attack Success Rate (ASR), this traditional metric often oversimplifies the reality and treats all attacks as equal. For example, an attack that bypasses a bicycle lock is equated to one that compromises nuclear launch codes. Similarly, in AI, a small, unsecured model might be easily compromised with a simple request for sensitive information, while a larger model might require sophisticated techniques like Agentic Warfare to break its alignment.
CalypsoAI will continue to develop new vulnerabilities and work with model providers to responsibly disclose and resolve these issues. As a result, model CASI scores are updated on a quarterly basis.
Model Provider | Model Name | CASI | Avg. Performance | RTP | CoS |
Anthropic | Claude 3.5 Sonnet | 96.25 | 84.50 % | 0.93 | 18.70 |
Microsoft | Phi4-14B | 94.25 | 75.90 % | 0.68 | 0.66 |
Anthropic | Claude 3.5 Hiku | 93.45 | 68.28 % | 0.57 | 5.14 |
OpenAI | GPT-4o | 75.06 | 80.50 % | 0.52 | 16.65 |
Meta | Llama 3.3 70b | 74.79 | 74.50 % | 0.69 | 1.85 |
DeepSeek | DeepSeek-R1-Distill- Llama-70B | 74.42 | 72.67 % | 0.74 | 1.24 |
DeepSeek | DeepSeek-R1 | 74.26 | 86.53 % | 0.58 | 4.24 |
OpenAI | GPT-4o-mini | 73.08 | 71.78 % | 0.73 | 1.03 |
Google | Gemini 1.5 Flash | 73.06 | 66.70 % | 0.92 | 0.51 |
Google | Gemini 1.5 Pro | 72.85 | 74.10 % | 0.63 | 5.58 |
OpenAI | GPT-3.5 Turbo | 72.76 | 59.20 % | 0.82 | 2.75 |
Alibaba Cloud | Qwen QwQ-32B-preview | 67.77 | 68.87 % | 0.65 | 2.14 |
About CalypsoAI
The first AI solution to secure the model inference layer, CalypsoAI protects AI systems against data leakage, misuse and misbehavior, empowering enterprises to confidently and safely deploy GenAI at scale. CalypsoAI provides the only full-lifecycle platform to secure AI models and applications at the inference layer, deploying agentic warfare to protect businesses from evolving adversaries.
Trusted by global enterprises like Palantir and SGK, CalypsoAI's leading team of experts are doing the hard miles to ensure a mature security approach keeps pace with today's incredible AI innovation trajectory. CalypsoAI was founded in 2018 and has secured over $40 million in venture funding from investors including Paladin Capital Group, Lockheed Martin Ventures and Hakluyt Capital. Learn more at calypsoai.com and LinkedIn.
View original content to download multimedia: https://www.prnewswire.com/news-releases/calypsoai-launches-security-index-provides-first-comprehensive-safety-ranking-of-major-genai-models-302386190.html
SOURCE CalypsoAI