

Wicus Ross
 Senior Security Researcher
Part 6, this blog post, will be the final blog post in this series. This blog post will explore the unique security challenges that are present in systems that rely on generative artificial intelligence (genAI) as part of its workflow.
The preceding blog posts in this series are located here:
On 17 June 2025 Andrej Karpathy delivered a keynote address at the Y Combinator AI Startup School in San Francisco that claimed that LLMs are basically a new type of operating system (OS)1. Karpathy reasons that this new paradigm is born from an evolution of software.
Originally, programs were written with a programming language or computer code, and this executed natively on a computer. Karpathy refers to this as “Software 1.0”. This evolved with the introduction of neural networks. Here weights are tuned and is “executed” by the neural network to give birth to “Software 2.0”. Karpathy reveals that software has changed again, and programs are written using natural language in the form of prompts. These prompts are interpreted by LLMs that result in actions and outcomes. This new way of creating “programs” or “instructions” means that LLMs can be treated as an OS effectively.
Natural language makes for a very rich and powerful medium through which we can interface with machines when using an LLM. The challenge is that there is a lot of room
 for ambiguity or misinterpretation. Clarifying intent and managing expectations of the user requires clear context for the benefit of the LLM, but humans can be intentionally vague, manipulative, careless, lazy, etc.
Classical programming languages, the “Software 1.0” paradigm, have syntax rules that are enforced by a compiler or a run-time environment. These are, for the most part, well defined and exact. This ensures that, for the most part, a system behaves in a predictable manner. Over the years we have learned that certain defensive programming practices must be employed to guard against error or exploitation.
LLMs, in the “Software 3.0” paradigm, are less strict regarding the “programs” they execute. This can result in any number of outcomes when combined with other loosely defined elements that an LLM can leverage. The non-deterministic and stochastic properties of neural networks, its weights, along with other execution parameters is responsible for this “emergent behavior”.
Emergent behavior, as defined using George Dyson’s definition, reads:
“Emergent behavior is that which cannot be predicted through analysis at any level simpler than that of the system as a whole.Emergent behavior, by definition, is what’s left after everything else has been explained.”2
Dan Geer and Dave Aitel highlight this problematic scenario and the implications this has for cybersecurity frameworks:
“Current cybersecurity frameworks—such as the National Institute of Standards and Technology’s Secure Software Development Framework or Open Worldwide Application Security Project guidelines—assume predictable human coders and defined accountability. These assumptions collapse beneath the weight of AI’s velocity, opacity, and autonomy.”3
LLMs are an important underpinning of the current agentic AI ecosystem. Chaining these agentic AI services together makes for a large and complex system with many possible unknown outcomes. This is mostly due to the fact that LLMs process natural language as “instructions” but also as “data”. Mixing data and instructions in the same “execution pipeline” breathes life into this emergent behavior.
What recourse exists when one of these autonomous systems produces unwanted results? Can consumers expect recompence if something goes wrong and who determines that? Certainly, malicious users will do their best to identify and exploit weaknesses in these systems at the expense of someone else.
It is now possible for anyone using a few words to script an agentic AI system into performing serval tasks on behalf of the user, which previously would have required expert programmers to accomplish. Everyone can now create instructions for machines.
1 https://www.youtube.com/watch?v=LCEmiRjPEtQ
2 George Dyson, Darwin Among the Machines, Addison-Wesley, 1997
3 https://www.lawfaremedia.org/article/ai-and-secure-code-generation
Secure software development practices have evolved a lot since the early days of the internet. The software development community realized quickly that generative AI has potential, but also there are aspects of the technology that require defined best practices and mitigations to limit the impact of malicious users and to guard against unforeseen side-effects.
Community groups such as OWASP and the Cloud Security Alliance (CSA) were among the first to publish guidance on how to secure generative AI powered applications. Here is a non-exhaustive list of prominent projects that address challenges in this space:
The list of resources is growing; some address specific elements of agentic AI systems such as weakness in the model context protocol (MCP) itself.13
Individually these resources provide good guidance based on current capabilities of LLMs and their respective ecosystems. LLMs as a technology are rapidly growing in capability as a lot of effort is invested due to fierce competition among vendors in this space. The authors of these resources will have to keep pace with the emergent threat landscape as well in conjunction with LLM capabilities.
These guidelines and best practices are based on a common understanding of how systems work and the perceived impact. Please evaluate these in terms of your own use case and implementations to find an acceptable balance between risk mitigation and threat identification.
4https://genai.owasp.org/llm-top-10/
5https://genai.owasp.org/resource/securing-agentic-applications-guide-1-0/
6https://genai.owasp.org/resource/multi-agentic-system-threat-modeling-guide-v1-0/
7https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations/
8https://csa-website-production.herokuapp.com/artifacts/agentic-ai-red-teaming-guide
9https://cloudsecurityalliance.org/artifacts/agentic-ai-identity-and-access-management-a-new- approach
10https://cloudsecurityalliance.org/artifacts/secure-agentic-system-design
11https://cloudsecurityalliance.org/blog/2025/02/06/agentic-ai-threat-modeling-framework-maestro
12https://atlas.mitre.org/
13https://adversa.ai/mcp-security-top-25-mcp-vulnerabilities/
A security model, in simple terms, must consider identity and associated privileges, the sensitivity of the data accessed, the execution environment that will perform operations on said data, the sensitivity of the generated output and where this output is written to. This is not a complete model, but it serves as a reminder that there are many aspects involved when designing and implementing information systems.
The resources listed in the previous section share many overlapping ideas or concepts and read together these provide a strong initial reference point. Here is a condensed list of universal security themes from the OWASP AI Multi-Agent Threat Model Guide:
The controls are necessary because of:
The OWASP Agentic AI – Threats and Mitigations define 15 threats, as part of its Agentic Security Initiative (ASI), with appropriate mitigations that are mapped to the seven layers of the CSA’s Multi-Agent Security Threat Risk and Outcome (MAESTRO) framework.
| MAESTRO Layer | OWASP ASI Threat ID | 
|---|---|
| 1. Foundational Model | T1 – Memory poisoning T7 – Misaligned and deceptive behavior | 
| 2. Data Operations | T1 – Memory poisoning T12 – Agent Communication poisoning | 
| 3. Agent Frameworks | T2 – Tool misuse T6 – Intent breaking and goal manipulation T5 – Cascading hallucinations | 
| 4. Deployment infrastructure | T3 – Privilege compromise T4 – Resource overload T13 – Rogue agents T14 – Human attack on multi-agent system | 
| 5. Evaluation & observability | T8 – Repudiation and untraceability T10 – Overwhelming human in the loop | 
| 6. Security & compliance | T3 – Privilege compromise T7 – Misaligned and deceptive behavior | 
| 7. Agent ecosystem | T9 – Identity spoofing T13 – Rogue agents T14 – Human attacks on multi-agent systems T15 – Human trust manipulation | 
The MAESTRO framework is much more flexible and goes beyond the ASI threat mappings. The cross-layer mapping highlights an important aspect of threat modeling with agentic AI systems. Designers and implementers must be aware that threats are fluid, and they might only instantiate over time through a gradual residual build or through contagion by exposure to certain weaknesses. These can also be described as cascading trust failures when one breakage or failure causes a break in the chain of trust. That is why trust relations can never be implicit and every interaction between systems must be explicitly verified in the context. This is especially true because of emergent behaviors.
| MAESTRO Cross-layer Threat | OWASP Threat ID | 
|---|---|
| Inter-agent data leakage cascade and goal misalignment cascades | T1 – Memory poisoning T3 – Privilege compromise T12 – Agent communication poisoning | 
| Privilege compromise and lateral movement | T3 – Privilege compromise T14 – Human attacks on multi-agent system | 
| Planning and reflection exploitation | T6 – Intent breaking and goal manipulation T7 – Misaligned and deceptive behaviors | 
| Supply chain attacks | Compromise component (library or module) at one layer to impact other layers | 
The MAESTRO framework describes a simple step-by-step approach that can be followed to implement it. The devil is in the details as each system has its peculiarities that need to be identified, and nothing is ever simple when it comes to enterprise level applications. Similarly, the list of mitigations is also high-level and depends entirely on each agent’s use case.
Evaluating these resources highlights one thing and that is there is no one way to solve any problem. As with classical “Software 1.0” applications we need to evaluate each environment and deployment based in the use case and define the scope. The emergent behavior property of generative AI makes this rather challenging in terms of existing perimeter-based security models and deterministic assumptions about legacy software.
CSA’s publication titled “Secure Agentic System Design: A Trait-Based Approach” addresses this by complementing existing security standards and approaches rather than replacing these. The approach is justified as it provides:
Improved decision-making through a transparent mental model
The traits are broken down into the following 7 categories :
These measures supplement existing cybersecurity frameworks and help to create a resilient and trustworthy environment by:
In addition, the principal of least privilege must be enforced throughout the system. Decision transparency and traceability, along with explainability will be valuable when performing incident response and system triage. Input validation and sanitization will be as important as ever to ensure system wide protection. Resource monitoring and throttling of agent resource usage will be crucial to identify anomalies as well as identifying service abuse. Likewise, anomaly detection and behavioral monitoring will remain a crucial feature of any modern system.
Trait-based security analysis is an iterative process that allows system designers and implementers to identify requirements and initial scope by asking questions such as:
Next the key agentic traits and their patterns must be identified. Select traits from the list of 7 categories.
Once the traits and patterns are identified, analyze interactions and assess the associated risks. Each pattern and trait will have known risks associated with the respective trait. Interactions that cross boundaries that impact other traits are also highlighted.
Once this task is completed, design and implement mitigations. Each trait pattern has a list of risks and applicable mitigation. Risks must be prioritized based on likelihood and potential impact of the risk with the specific system being analyzed.
Finally, the loop reaches the monitor, evaluate and adapt phase that acts to identify areas to correct or improve. Use this to close the feedback loop and adjust the security posture appropriately. This requires periodic review and analysis of the system.
The trait-based analysis is well suited to working with existing security practices, software development lifecycles, etc. including threat modeling. It is possible to map the MAESTRO framework and trait-based categories to perform a system decomposition of agentic traits in terms of potential threats for a given system. The following is an example taken from the CSA Secure Agentic System Design: A Trait- Based Approach document mentioned in the “Efforts afoot” section:
| Trait category | Relevant MAESTRO layers | 
|---|---|
| Control & Orchestration | Layer 3 (Agent Frameworks), Layer 4 (Deployment) | 
| Interaction & Communication | Layer 7 (Agent Ecosystem), Layer 5 (Evaluation & Observability) | 
| Planning | Layer 1 (Foundation Models), Layer 3 (Agent Frameworks) | 
| Perception & Context | Layer 2 (Data Operations), Layer 5 (Evaluation & Observability) | 
| Learning & Knowledge Sharing | Layer 1 (Foundation Models), Layer 2 (Data Operations) | 
| Trust | Layer 6 (Security & Compliance), Layer 7 (Agent Ecosystem) | 
This mapping can also be useful when using other tools such as MITRE’s ATLAS matrix to identify specific tactics for a given trait category. For example, “Decentralized Control”, that falls under the Control & Orchestration trait, could be linked to MITRE ATLAS “Takeover Attacks” or “Policy Bypassing” tactics.
“Traits and Patterns” were not covered in much detail and could easily require its own blog post to discuss. The “Traits and Patterns” section describes risks that could originate from outside the system but also could be risks that materialize because of design or implementation faults.
Authentication (Authn) and Authorization (Authz) are fundamental security building blocks in cybersecurity. The ability to identify someone or something (Authn) allows us to establish a level of trust and enables us to grant permissions (Authz) to access certain resources. Having an identity is crucial to enforce zero-trust principles.
The OAuth 2.0 specification is the required best practice for multi context protocol (MCP) servers.14 This requirement is fine for simple end user interactions with clients-to- server interfaces. It does not cover use cases where an MCP server must interface with another agent. How does this first MCP server authenticate with the delegated MCP servers? Whose identity does it use? Are the users’ credentials passed onto the downstream servers?
Service accounts or non-human identities (NHI) are assigned to services that require certain permissions. A service’s NHI is either derived from the account that installed the service, or it is derived through explicit means. The CSA Agentic AI Identity & Access Management publication highlights several shortcomings with available options. OAuth 2.0 credentials can be used by NHI to authenticate with APIs, but this lacks behavioral awareness and session integrity. Additionally surrogate secrets or certificates can act as authentication material, but this lacks real-time behavior verification and requires special addons, especially in dynamic environments. Also, role-based access control (RBAC) linked to human accounts is ripe for abuse as this can lead to excessive
 permissions and these roles are for the most part also static.
Existing identity and access management (IAM) must adapt to the unique demands of agentic AI. CSA motivates the need for NHI of agentic AI because of:

The CSA is proposing a new standard and lists the following essential components for an Agent Identifier (Agent ID):
With this there exists Agent ID ownership and control that describes the principles of Self-Sovereign Identity (SSI). Finally, the ID generation, assignment, and lifecycle management process is also present to regulate how the Agent ID is used throughout the process and when it is revoked, renewed, etc.
All of this is required to support the new agentic AI Identity and Access Management framework architecture, which is comprised of:
There exists proof of concept code for these proposals 16 17, but at the time of writing none of this has been adopted as a standard, which suggests that there will be a lot of roll-your-own solutions, or none.
16https://github.com/akramIOT/Agentic-IAM
17https://github.com/kenhuangus/agent-id-sdk
Security without oversight will always be ineffective and clash with the business it serves. It is important for business leaders and security teams to work together to get the most out of a project that drives automation using generative AI. 18 At the same time, it is important to understand the risks and learn from mistakes but also learn from what works.
An agentic AI council or team must be established with each business. This team is responsible for the overall agentic AI strategy and oversight that allows the business to control and measure the impact of agentic AI on the business. Ideally this team must be led by stakeholders that report to the board and staffed with experts from the business to create a cross-functional team.
The agentic AI group must maintain a live register of agents and their use cases, track what data the agents have access to, and what controls are present that govern these use cases.
The agentic AI group is responsible for assessing the security posture of these agentic AI use cases, ideally before these go live.
The team must hold feedback and postmortems sessions to learn from any successes and failures to ensure the outcomes are shared with the rest of the organization, especially the board. These must be converted into benchmarks for future use cases.
The agentic AI team must establish guidelines for guardrails and when human intervention is required. The team must establish guidance for use cases and map these to desired outcomes. Any scenario that impacts customers must be highlighted and reviewed to ensure that timely and appropriate human level intervention is enacted to avoid unwanted outcomes.
Existing strategies and security principles can be leveraged to define policies for:
It is this agentic AI team’s responsibility to ensure that zero-trust principles are baked into agent AI use cases from the outset.
18https://www.paloaltonetworks.com/blog/2025/09/agentic-ai-looming-security-crisis/
The flow of information is vital for economies and the need to process information is growing by the day. At the same time the need for deeper insight and faster synthesis of information is growing equally if not more. LLMs are a breakthrough that satisfy this need.
This results in more complex information flows, requiring richer and more sophisticated controls to ensure legitimate access to information for the purpose it was intended. Agentic identity will be crucial as this allows system designers to enforce concepts such as least privilege access and to hold relevant parties accountable.
Security models must evolve to accommodate a new type of risk that originates from non-deterministic behavior of agentic AI systems. Emergent behavior makes it difficult for system designers and administrators to limit the impact of unforeseen behavior or the manipulation of the system through malicious actors. Trait-based security models work with existing processes to best scope, define, mitigate, and possibly even eliminate potential unwanted events.
Secure by design and secure by default must be top of mind for all businesses now that everybody is effectively a programmer.