White Paper · Vol. 02

Agents Everywhere: How to Protect Your Data Sovereignty in the Age of Autonomous AI

AI agents are proliferating faster than the governance frameworks designed to contain them. This paper examines the data sovereignty risks they create, how leading organizations are responding, and what a defensible control architecture looks like in a threat landscape that changes weekly.

Fortify North Research · 2025 · 32 pages

Executive Summary

Gartner projects that by 2028, 33% of enterprise software applications will include agentic AI capabilities — up from less than 1% in 2024. Organizations are deploying AI agents to automate workflows, manage customer interactions, process documents, write and execute code, and coordinate cross-system tasks at a speed that has decisively outpaced the development of appropriate governance controls.

The result is a data sovereignty crisis in slow motion. Agents access sensitive data, transmit it across API boundaries, cache it in memory systems with inadequate access controls, and route it through third-party AI providers whose data retention and processing policies are frequently incompatible with GDPR, PIPEDA, Canada's Bill C-27, and sector-specific regulations. The Samsung ChatGPT incident of 2023 — in which engineers inadvertently submitted confidential semiconductor source code and meeting notes to an external LLM — was not an anomaly. It was a preview.

This paper presents the threat model for agentic AI data sovereignty, examines how leading organizations and platform vendors are building governance layers, and proposes a practical control architecture organizations can implement today — before the regulatory environment fully crystallizes around them.

33% · Enterprise apps with agentic AI by 2028 (Gartner, 2024)
$4.88M · Average cost of a data breach in 2024 (IBM)
68% · Breaches involving a human or third-party element (Verizon DBIR, 2024)
55% · Organizations with no formal AI governance policy (McKinsey, 2024)
01

The Proliferation Problem

The speed of AI agent adoption has no historical parallel in enterprise technology. In prior technology waves — cloud migration, mobile, SaaS — the infrastructure preceded the workloads, giving security teams time to develop controls before sensitive data moved into new environments. The agentic AI wave has inverted this sequence entirely.

According to McKinsey's 2024 State of AI report, 65% of organizations are now using generative AI in at least one business function — nearly double the figure from the prior year. A significant and growing proportion of that usage is shifting from direct human interaction with AI tools toward automated, agentic workflows: systems where the AI takes sequences of actions autonomously, often with access to real systems, real data, and real external APIs.

Microsoft's 2024 Work Trend Index found that employees at organizations using AI tools spend 30% less time on routine information tasks — a productivity gain driven substantially by agents handling document processing, email triage, meeting summarization, and workflow coordination. Each of these use cases involves an AI system reading, processing, and in many cases transmitting sensitive organizational data.

Shadow AI Proliferation

Employees are deploying AI agents without the knowledge of IT or security teams. A 2024 Salesforce survey found that 55% of workers use AI tools not approved by their employer. Unlike shadow SaaS — where data is stored on an external platform — shadow AI involves an external system actively reasoning over and generating outputs from sensitive corporate data in real time.

Third-Party API Data Exposure

Every call to an external AI API (OpenAI, Anthropic, Google, Cohere) transmits the contents of that prompt to a third-party system. The data residency, retention, training use, and logging policies of these providers vary significantly and frequently conflict with enterprise data governance requirements. OpenAI's enterprise API does not use submitted data for training, but default consumer endpoints historically have — a distinction most end users do not understand.

Agent Memory and Context Persistence

Agentic systems increasingly use long-term memory — vector databases, key-value stores, or structured logs — to maintain context across sessions. These memory stores accumulate sensitive information over time and frequently lack the row-level access controls applied to conventional enterprise databases. An agent that has processed HR documents, financial reports, and legal correspondence over months creates a data aggregation risk that did not exist before the first conversation.
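
The mitigation is to apply the access check before retrieved memories ever reach the agent's context window. The sketch below is a minimal illustration of that pattern, assuming a shared memory store whose entries carry a classification tier; the tier names, entry fields, and keyword matching are placeholders for whatever labelling scheme and vector retrieval a real deployment uses.

```python
from dataclasses import dataclass

# Classification tiers, ordered from least to most sensitive (illustrative names).
TIERS = {"public": 0, "internal": 1, "confidential": 2, "highly_confidential": 3}

@dataclass
class MemoryEntry:
    text: str
    classification: str   # tier assigned when the entry was written
    source_system: str    # e.g. "hr", "finance", "support"

def recall(memory: list[MemoryEntry], query: str, caller_clearance: str) -> list[str]:
    """Return stored context the caller is cleared to see.

    A real deployment would pair this with vector similarity search; the point
    is that the access check happens before anything reaches the agent's
    context window.
    """
    ceiling = TIERS[caller_clearance]
    visible = [m for m in memory if TIERS[m.classification] <= ceiling]
    # Naive keyword match stands in for embedding retrieval.
    return [m.text for m in visible if query.lower() in m.text.lower()]

# A support agent querying the shared store never sees HR content, even though
# both entries were written by agents using the same memory system.
store = [
    MemoryEntry("Q3 churn driven by billing errors", "internal", "support"),
    MemoryEntry("Planned reduction in force for Q4", "highly_confidential", "hr"),
]
print(recall(store, "q3", "internal"))   # ['Q3 churn driven by billing errors']
```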

02

The Data Sovereignty Threat Model

Data sovereignty — the principle that data is subject to the laws and governance frameworks of the jurisdiction in which it is collected and processed — is under structural pressure from agentic AI in ways that conventional cloud security controls do not address. The threat model has four distinct layers:

1

Cross-Border Data Flows via API

When an enterprise deploys an AI agent that calls an external LLM API, it is transmitting data to infrastructure operated in jurisdictions that may be legally incompatible with the data's origin. A Canadian healthcare organization whose AI agent submits patient-adjacent information to US-based API endpoints may be in violation of PIPEDA's accountability principle and provincial health privacy legislation, regardless of the API provider's contractual commitments. The EU AI Act's extraterritorial provisions extend this complexity further: organizations outside the EU that deploy AI affecting EU residents face compliance obligations under a regulatory framework still finalizing its enforcement posture.

2

Inadvertent Data Aggregation

Individual agent interactions may each be defensible in isolation while collectively producing a data aggregation risk. An agent that reads customer service tickets, correlates them with CRM records, and synthesizes summaries may never transmit a complete customer record in any single call — but the synthesis it produces may constitute a derived personal data record under GDPR that requires its own data protection assessment. Traditional DLP tools, designed to detect structured data patterns in outbound traffic, do not detect this category of risk.

3

Agentic Action Without Audit Trails

Conventional enterprise systems generate audit logs as a byproduct of their architecture. Agentic AI systems frequently do not. An agent that reads a file, extracts information, calls an API, and writes output to a new location may produce no persistent record of these actions accessible to security or compliance teams. In the event of a regulatory inquiry or data subject access request, organizations may be unable to reconstruct what data an agent accessed, what it did with that data, or where outputs went. The EU AI Act's Article 12 record-keeping requirements for high-risk AI systems will make this gap a direct compliance liability.

4

Multi-Agent Trust Chain Collapse

Enterprise AI architectures increasingly involve chains of agents: an orchestrator agent that directs specialized sub-agents, which in turn call tools and return results. Data sovereignty obligations that apply to the orchestrator may not flow through to sub-agents — particularly if those sub-agents are operated by different vendors or deployed in different jurisdictions. A data classification that constrains what the orchestrator can transmit may not be visible to a sub-agent making its own API calls. Trust and data governance must be explicitly propagated through every hop of the chain.
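
One way to make that propagation explicit is to pass governance metadata with the task itself, so each sub-agent enforces the same ceiling before calling out. The sketch below illustrates the idea under assumed tier names, region lists, and function signatures; it is not any particular framework's API.

```python
from dataclasses import dataclass, field

TIERS = {"public": 0, "internal": 1, "confidential": 2}

@dataclass
class TaskEnvelope:
    """Work handed from an orchestrator to a sub-agent, carrying governance
    metadata alongside the payload instead of leaving it implied."""
    payload: str
    classification: str    # ceiling inherited from the source data
    allowed_regions: set[str] = field(default_factory=lambda: {"ca-central-1"})

def transmit_externally(envelope: TaskEnvelope, endpoint_region: str,
                        endpoint_max_tier: str) -> str:
    """A sub-agent enforces the orchestrator's constraints at its own hop."""
    if endpoint_region not in envelope.allowed_regions:
        raise PermissionError(f"endpoint region {endpoint_region} not permitted")
    if TIERS[envelope.classification] > TIERS[endpoint_max_tier]:
        raise PermissionError("payload classification exceeds endpoint approval")
    return f"transmitted to {endpoint_region}"

envelope = TaskEnvelope(payload="board financial projections", classification="confidential")
try:
    # Fails loudly instead of silently leaking data across the trust chain.
    transmit_externally(envelope, endpoint_region="us-east-1", endpoint_max_tier="internal")
except PermissionError as err:
    print(f"blocked: {err}")
```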

03

Case Study: The Samsung Disclosure and What It Signals

In April 2023, Samsung Electronics disclosed that engineers in its semiconductor division had submitted confidential data to ChatGPT on at least three occasions within a 20-day period following the internal rollout of AI tool access. The incidents included: semiconductor equipment measurement data submitted for debugging assistance; NAND flash yield data submitted for optimization analysis; and internal meeting notes submitted for summarization.

"Because ChatGPT retains data entered by users for training purposes, sensitive information such as the source code was transmitted to an external server."

— Samsung internal memo to employees, April 2023

Samsung's response — a temporary ban on generative AI tools across corporate devices and the initiation of an internal AI system development program — illustrates the binary choice organizations faced in 2023: block the technology entirely, or accept uncontrolled data exposure. Neither option is sustainable at scale.

The Samsung case involved direct human-to-AI interactions, not autonomous agents. The risk is structurally larger in agentic deployments, where no human reviews each data transmission before it occurs. An agent tasked with analyzing business performance may autonomously retrieve board-level financial projections, synthesize them against market data from an external source, and transmit the combined context to an LLM API — all without a human in the loop at any step.

Other documented incidents include: employees submitting customer PII to AI tools in violation of GDPR obligations; legal teams drafting court filings using AI tools that hallucinated case citations (a data quality, not sovereignty, failure — but symptomatic of the same governance gap); and healthcare organizations using AI meeting transcription tools whose data residency was incompatible with HIPAA BAA requirements.

04

How Leading Organizations Are Responding

The most sophisticated enterprise AI deployments in 2024–2025 share a common architectural principle: treat the AI agent as an untrusted third party that must earn access to data, rather than as a trusted system that inherits it. This inversion of the default trust posture is the foundation of defensible agentic AI governance.

Microsoft: Purview AI Hub and Sensitivity Labels

Microsoft has integrated its data governance product, Purview, with Copilot and the broader Azure AI ecosystem. Purview AI Hub provides visibility into what data Microsoft 365 Copilot is accessing across SharePoint, Teams, Exchange, and OneDrive — surfacing over-permissioned sites, unlabeled sensitive files, and anomalous access patterns. Microsoft's sensitivity label architecture allows organizations to tag documents with classification tiers that propagate into AI context windows: a document classified as 'Highly Confidential' can be configured so that Copilot declines to include its contents in outputs accessible to users without that clearance level. This is the most mature implementation of data-aware AI governance currently available in a commercial enterprise AI product.
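
The pattern is straightforward to reason about even outside Microsoft's stack: filter documents by label before they enter the model's context, using the requesting user's clearance as the ceiling. The following sketch is illustrative only; the label names and document fields are assumptions, and a real deployment would read labels applied by Purview (or an equivalent tool) rather than hard-coding them.

```python
# Label names and document fields below are assumptions for illustration; a real
# deployment would read labels applied by Purview or an equivalent tool.
LABEL_RANK = {"Public": 0, "General": 1, "Confidential": 2, "Highly Confidential": 3}

def build_context(documents: list[dict], user_clearance: str) -> str:
    """Assemble an AI context window that respects sensitivity labels.

    Documents labelled above the requesting user's clearance are excluded
    before the model ever sees them.
    """
    ceiling = LABEL_RANK[user_clearance]
    permitted = [d for d in documents if LABEL_RANK[d["label"]] <= ceiling]
    return "\n\n".join(f"[{d['label']}] {d['title']}\n{d['body']}" for d in permitted)

docs = [
    {"title": "FY25 roadmap", "label": "Confidential", "body": "..."},
    {"title": "Executive compensation review", "label": "Highly Confidential", "body": "..."},
]
# Excludes the second document for a user cleared only to Confidential.
print(build_context(docs, user_clearance="Confidential"))
```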

Salesforce: Einstein Trust Layer

Salesforce built a dedicated trust architecture layer between its AI features and external LLM providers. The Trust Layer: intercepts all LLM API calls from Salesforce Einstein features; applies dynamic data masking to strip PII before transmission and re-inject it on return; enforces a zero-retention policy with supported LLM providers (data is not used for training); generates audit logs of all AI-data interactions; and applies toxicity filtering to outputs. The Trust Layer is contractually enforced through Salesforce's agreements with its LLM partners. This model — an intermediary layer that enforces governance policies at the API boundary — is increasingly the reference architecture for enterprise AI data governance.
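
Organizations building their own intermediary can approximate the masking step with a reversible tokenization pass at the API boundary. The sketch below shows the shape of that pattern, not Salesforce's implementation: the regex, token format, and the stand-in call_llm function are illustrative assumptions, and production masking would cover far more PII categories.

```python
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")   # one illustrative PII pattern

def mask(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace PII with placeholder tokens before the prompt leaves the trust boundary."""
    vault: dict[str, str] = {}

    def _sub(match: re.Match) -> str:
        token = f"<<PII_{len(vault)}>>"
        vault[token] = match.group(0)
        return token

    return EMAIL.sub(_sub, prompt), vault

def unmask(completion: str, vault: dict[str, str]) -> str:
    """Re-inject the original values into the model's response on the way back."""
    for token, value in vault.items():
        completion = completion.replace(token, value)
    return completion

masked, vault = mask("Draft a renewal reminder for jane.doe@example.com")
# call_llm below stands in for whatever provider API sits behind the gateway:
#   completion = call_llm(masked)
completion = "Hi <<PII_0>>, your contract renews on the 30th."
print(unmask(completion, vault))   # the real address reappears only inside the boundary
```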

AWS Bedrock: Data Isolation by Design

Amazon Bedrock, AWS's managed LLM service, is architected so that customer data does not leave the customer's AWS account boundary. Model invocations are processed within the customer's VPC; inputs and outputs are encrypted with customer-managed KMS keys; AWS commits contractually that Bedrock does not use customer data to train foundation models. For organizations with strict data residency requirements, Bedrock's regional endpoint architecture allows all AI processing to be constrained to a specific geography. This infrastructure-native approach to data sovereignty is particularly relevant for regulated industries and public sector organizations.
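
In practice, the residency constraint is expressed simply by pinning the Bedrock client to the approved region so that invocations cannot silently route elsewhere. The snippet below is a minimal sketch using boto3; the region, model ID, and request body shape are examples to adapt, and KMS encryption settings are configured at the account and service level rather than per call.

```python
import json
import boto3

# Pin every invocation to the approved region so processing stays in-jurisdiction.
# The region, model ID, and request body shape are examples to adapt.
REGION = "ca-central-1"
bedrock = boto3.client("bedrock-runtime", region_name=REGION)

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Summarize the attached policy."}],
    }),
)
print(json.loads(response["body"].read()))
```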

Anthropic: Claude's Constitutional AI and Enterprise Controls

Anthropic's enterprise offering includes contractual zero-retention commitments, system prompt confidentiality, and SOC 2 Type II certification. The Claude API's structured system prompt architecture allows organizations to enforce data handling policies at the model interaction level — including explicit instructions about what categories of data the model should decline to include in outputs or transmit to external tools. Anthropic's Model Context Protocol (MCP), while still maturing from a security perspective, includes scope-limiting capabilities that allow administrators to constrain which data sources an agent can access.
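
A system prompt is a soft control rather than a technical guarantee, so it belongs alongside, not instead of, the gateway and logging controls described in Section 06. Still, encoding the data-handling policy at the interaction level is cheap to do; the sketch below shows one way to phrase it with the Anthropic Python SDK, with the model name and policy wording as illustrative assumptions.

```python
import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment

# A data-handling policy expressed at the interaction level. The model name and
# policy wording are illustrative; align both with your enterprise agreement.
DATA_POLICY = (
    "You are an internal assistant. Never reproduce personal identifiers, "
    "credentials, or financial account numbers in your outputs. If source "
    "material contains them, summarize around them and flag their presence."
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=512,
    system=DATA_POLICY,
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
)
print(message.content[0].text)
```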

05

The Regulatory Landscape Is Catching Up — Fast

The regulatory environment for AI data governance has moved from principles to obligations at a pace that has surprised even well-resourced compliance teams. Organizations that have deferred governance investments pending regulatory clarity should understand that the clarity has largely arrived.

EU AI Act (August 2024 — phased through 2026)

The Act's provisions most relevant to data sovereignty include: Article 10 (data governance requirements for high-risk AI training data); Article 12 (logging and record-keeping for high-risk AI); Article 13 (transparency obligations); and the GPAI model provisions requiring documentation of training data sources and copyright compliance. Canadian organizations with EU market exposure face direct compliance obligations regardless of headquarters location.

Canada's Bill C-27 / AIDA (Artificial Intelligence and Data Act)

Canada's proposed AI legislation, part of Bill C-27, would create binding obligations for high-impact AI systems including impact assessments, mitigation measures, incident monitoring, and plain-language disclosure. While C-27 remains in legislative process as of 2025, the regulatory direction is unambiguous — organizations that begin building governance frameworks now will face significantly lower compliance costs when the legislation is finalized.

PIPEDA and Provincial Privacy Laws

Canada's existing federal private sector privacy law already has direct implications for AI agent deployments. PIPEDA's accountability principle requires organizations to be responsible for personal information under their control — including information processed by AI agents acting on the organization's behalf. Contractual data processing agreements with AI vendors do not transfer this accountability. Quebec's Law 25 (Bill 64), already in force, introduces stricter consent, transparency, and data minimization requirements that directly apply to automated decision-making systems.

NIST AI RMF and Sector-Specific Guidance

The US NIST AI Risk Management Framework's GOVERN function explicitly addresses organizational accountability for AI systems, including agentic deployments. CISA's 2024 guidelines on AI in critical infrastructure and FFIEC's emerging guidance for financial institutions are establishing sector-specific controls that will increasingly be referenced in regulatory examinations.

06

A Defensible Control Architecture

Effective AI agent data sovereignty requires controls at five distinct layers. No single layer is sufficient — a failure at any layer can expose data that all other layers were designed to protect.

L1

Data Classification and Labelling

Every data asset that an AI agent may access must carry a machine-readable classification tag. This is not a new concept — Microsoft Purview, Varonis, and similar tools have applied it to conventional file systems for years. The extension required for AI is ensuring that classification metadata is propagated into agent context windows and that agents are instructed, via system prompts and guardrails, to handle data according to its classification tier.

L2

Principle of Least Privilege for Agent Data Access

Agents should be granted access only to the specific data required for their defined function. An agent that summarizes customer support tickets should not have read access to HR systems — even if both are accessible via the same identity credential. This requires a deliberate agent identity and permission scoping exercise, treating each agent as a non-human identity subject to the same access control reviews applied to service accounts.

L3

API Gateway and Data Loss Prevention

All LLM API calls from enterprise agents should pass through an intermediary layer that: applies DLP scanning to outbound prompts; enforces data residency by routing to compliant endpoints; logs all interactions for audit purposes; and enforces zero-retention commitments with downstream providers. The Salesforce Trust Layer is a commercial implementation of this model. Organizations without a commercial platform can approximate it using API gateways (Azure API Management, AWS API Gateway) with custom policy enforcement.

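A minimal gateway check can be expressed as a single routing function that every agent must call instead of contacting a provider directly: scan the outbound prompt, select a residency-compliant endpoint, and log the decision. The sketch below assumes illustrative endpoint URLs, a single example DLP pattern, and an in-memory audit list; a production gateway would sit at the network layer with a far richer rule set.

```python
import re
from datetime import datetime, timezone

# Illustrative gateway policy: endpoint URLs, the DLP pattern, and the in-memory
# audit list are placeholders for a production rule set and log pipeline.
COMPLIANT_ENDPOINTS = {
    "ca": "https://llm-gateway.example.internal/ca-central",
    "eu": "https://llm-gateway.example.internal/eu-west",
}
SIN_LIKE = re.compile(r"\b\d{3}[- ]?\d{3}[- ]?\d{3}\b")   # e.g. Canadian SIN-shaped numbers

def route_prompt(prompt: str, residency: str, audit_log: list[dict]) -> str:
    """Check applied to every outbound LLM call: scan, route, log."""
    now = datetime.now(timezone.utc).isoformat()
    if SIN_LIKE.search(prompt):
        audit_log.append({"ts": now, "action": "blocked", "reason": "possible SIN in prompt"})
        raise ValueError("prompt blocked by DLP policy")
    endpoint = COMPLIANT_ENDPOINTS[residency]
    audit_log.append({"ts": now, "action": "forwarded", "endpoint": endpoint})
    return endpoint   # the caller may send the prompt only to this endpoint

log: list[dict] = []
print(route_prompt("Summarize this anonymized ticket", residency="ca", audit_log=log))
```
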
L4

Human-in-the-Loop for High-Risk Actions

Not all agent actions carry equal data sovereignty risk. Reading a document is different from transmitting its contents to an external API. Sending an email is different from drafting one. A risk-tiered human-in-the-loop framework identifies which categories of agent action require explicit human approval before execution — and ensures those approval gates are implemented in the agent's tool architecture rather than relying on the agent's own judgment.

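The essential design choice is that the approval gate lives in the tool dispatcher, outside the model's control. A minimal sketch of that placement follows, with the tool names, risk tiers, and console-based approver as assumptions; real deployments would route approvals to a ticketing or chat workflow.

```python
import enum

class Risk(enum.Enum):
    LOW = "low"     # e.g. read a document the agent is already permitted to see
    HIGH = "high"   # e.g. transmit contents externally, send email, modify records

# Risk tiers are assigned per tool ahead of time, not inferred by the agent at run time.
TOOL_RISK = {
    "read_document": Risk.LOW,
    "send_email": Risk.HIGH,
    "call_external_api": Risk.HIGH,
}

def dispatch_tool(tool_name: str, args: dict, approver=input) -> str:
    """Execute a tool call only after the required approval gate is satisfied.

    The gate sits in the dispatcher rather than in the prompt, so the agent
    cannot reason its way past it. The console-based approver is a stand-in
    for a ticketing or chat approval workflow.
    """
    if TOOL_RISK[tool_name] is Risk.HIGH:
        answer = approver(f"Approve {tool_name} with {args}? [y/N] ")
        if answer.strip().lower() != "y":
            return "denied: human approval not granted"
    return f"executed {tool_name}"

print(dispatch_tool("read_document", {"path": "tickets/4821.txt"}))   # no gate for LOW risk
```
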
L5

Audit Logging and Incident Response

Every agent interaction that touches sensitive data must generate a tamper-evident audit record: what data was accessed, what was transmitted, to which endpoints, at what time, under which user context. These records are required for regulatory compliance (EU AI Act Article 12, PIPEDA accountability), data subject access requests (GDPR Article 15), and incident investigation. Agent-native logging frameworks are immature; most organizations will need to implement this at the infrastructure layer rather than relying on application-level logging from the agent framework itself.

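Where the agent framework cannot be trusted to log faithfully, a hash-chained, append-only record implemented at the infrastructure layer gives a tamper-evident trail that compliance teams can verify independently. The class below is a minimal sketch of that structure; the field names and in-memory storage are placeholders for a real log pipeline.

```python
import hashlib
import json
from datetime import datetime, timezone

class AuditLog:
    """Append-only, hash-chained audit records for agent data access.

    Each record embeds the hash of the previous one, so any later edit or
    deletion breaks the chain and is detectable on verification.
    """

    def __init__(self) -> None:
        self.records: list[dict] = []
        self._prev_hash = "0" * 64

    def append(self, agent_id: str, action: str, data_ref: str,
               endpoint: str, user_context: str) -> None:
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "agent_id": agent_id, "action": action, "data_ref": data_ref,
            "endpoint": endpoint, "user_context": user_context,
            "prev_hash": self._prev_hash,
        }
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        self._prev_hash = record["hash"]
        self.records.append(record)

    def verify(self) -> bool:
        prev = "0" * 64
        for r in self.records:
            body = {k: v for k, v in r.items() if k != "hash"}
            expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if r["prev_hash"] != prev or r["hash"] != expected:
                return False
            prev = r["hash"]
        return True

log = AuditLog()
log.append("support-summarizer", "transmit", "ticket-4821",
           "llm-gateway/ca-central", "svc-agent-07")
assert log.verify()
```
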
07

What Organizations Must Do — Now

The threat landscape for AI agent data sovereignty is not static. New agent frameworks, new tool integrations, and new regulatory requirements are emerging on timelines measured in weeks rather than years. The following actions are not a roadmap to eventual compliance — they are the baseline required to remain defensible today.

1. Conduct an AI agent inventory: identify every agent deployed in your environment, sanctioned or otherwise, and map what data each can access.
2. Audit data processing agreements with all AI vendors: verify data residency, retention, training use, and sub-processor disclosure commitments.
3. Classify the data your agents touch: apply machine-readable sensitivity labels to all data assets accessible by AI systems.
4. Implement agent identity controls: create dedicated service identities for each agent with scoped permissions — not shared credentials or user-level access tokens.
5. Establish a shadow AI detection program: use network DLP, DNS monitoring, and endpoint telemetry to identify unsanctioned AI tool usage before it creates an incident.
6. Build an AI-specific incident response playbook: document how your organization will detect, contain, and notify in the event of an AI-driven data exposure.
7. Engage legal and compliance teams on C-27 and EU AI Act readiness: the regulatory direction is clear even where the final text is not.
08

Conclusion

AI agent proliferation is not a future risk to be prepared for — it is a present condition to be managed. The governance frameworks, regulatory obligations, and technical controls needed to maintain data sovereignty in an agentic environment exist and are increasingly mature. What is rare is the organizational will to implement them before an incident creates the mandate.

The organizations that will navigate this landscape most effectively are those that treat AI governance not as a compliance exercise but as a competitive capability: the ability to deploy AI faster than competitors because their governance architecture lets them do so with confidence. A robust AI data sovereignty framework is not a brake on AI adoption — it is the foundation that makes aggressive AI adoption defensible.

The alternative — deploying agents without governance and hoping the exposure is not discovered — has a poor historical track record. The Samsung incident was discovered internally. Most are not discovered until a regulator, an adversary, or a journalist finds them first.

References

  • [1] Gartner. (2024). Predicts 2025: AI Agents Drive the Next Phase of Enterprise AI Adoption. Gartner Research.
  • [2] McKinsey & Company. (2024). The State of AI in Early 2024: Gen AI Adoption Spikes and Starts to Generate Value. McKinsey Global Institute.
  • [3] Microsoft. (2024). 2024 Work Trend Index: AI at Work Is Here. Now Comes the Hard Part. Microsoft Corporation.
  • [4] IBM Security. (2024). Cost of a Data Breach Report 2024. IBM Corporation.
  • [5] Verizon. (2024). Data Breach Investigations Report 2024. Verizon Communications.
  • [6] Salesforce. (2024). Einstein Trust Layer: Architecture and Security. Salesforce Technical Documentation.
  • [7] European Parliament. (2024). Regulation (EU) 2024/1689: Artificial Intelligence Act. Official Journal of the European Union.
  • [8] Government of Canada. (2022). Bill C-27: Digital Charter Implementation Act, 2022. Parliament of Canada.
  • [9] National Institute of Standards and Technology. (2023). AI Risk Management Framework (AI RMF 1.0). NIST.
  • [10] Office of the Privacy Commissioner of Canada. (2023). Guidance on Artificial Intelligence and Privacy. OPC.

Need an AI agent governance assessment?

Fortify North can audit your agentic AI deployment, map your data sovereignty exposure, and build the governance architecture your regulatory environment requires.