When European scaling companies hear "on-prem AI," most picture open-weight models on GPU racks in a local data centre. Llama running on your own hardware. No external API calls. Air-gapped.

That's one option. For most scaling companies, it's not the right one. And the misconception is slowing down deployments that could have shipped months ago.

Why On-Prem Is a Legal Requirement, Not a Preference

GDPR and national data residency obligations don't prohibit using cloud AI services. They restrict where personal data can be processed and under what safeguards, and sector rules or client contracts often place similar limits on commercially sensitive data. They also require that data subjects' rights can be exercised in the relevant jurisdiction.

In practice, for a European scaling company deploying AI across business functions, the exposure looks like this:

For most companies deploying agents against customer data, client data, or regulated information: on-prem is not optional.

The Three Architectures

2. Dedicated VPC with private inference

Frontier models via private enterprise deployment options, or open-weight models (Llama 3, Mistral) running on your own GPU instances inside a dedicated VPC. Higher infrastructure cost and maintenance overhead. Maximum control over the full stack. Right for companies with a dedicated infra team who need to own every layer.

3. Fully air-gapped (specialist only)

Open-weight models only. No external API calls. Everything runs on infrastructure you physically control. This is the requirement for defence, intelligence-adjacent, and some highly regulated financial environments. Significant model capability gap versus frontier models. High infrastructure and maintenance cost. Not the right choice for most scaling companies.

Why We Default to Claude on Private Infrastructure

The open-weight assumption (that on-prem means Llama) costs companies real capability. Frontier model performance on reasoning, code generation, and multi-step instruction following is meaningfully ahead of open-weight alternatives. Deploying a Level-3 agent on Llama 3 is possible. Making it reliable enough for production is significantly harder.

Private cloud deployment with Claude or GPT-4o gives you:

What On-Prem Adds to a Deployment

Being direct: on-prem architecture adds setup time and ongoing overhead. Here's what that looks like in practice:

| Factor | Private cloud | VPC / dedicated | Air-gapped |
| --- | --- | --- | --- |
| Setup time (week 1) | 1–2 days | 3–5 days | 2–4 weeks |
| Frontier model access | Yes | Partial | No |
| GDPR / EU data residency | Yes | Yes | Yes |
| Infrastructure maintenance | Low | Medium | High |
| Right for most scaling companies | Yes | Depends | No |
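
For teams that want to keep this comparison next to their deployment notes, here is a minimal Python sketch that encodes the table as data and filters it. The values are copied from the table; the `Architecture` class, the `shortlist` function, and its two parameters are hypothetical names introduced for illustration, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Architecture:
    name: str
    setup_time: str             # week-1 setup time, as listed in the table
    frontier_model_access: str  # "Yes", "Partial", or "No"
    eu_data_residency: bool
    maintenance: str            # "Low", "Medium", or "High"


# Rows copied from the comparison table.
ARCHITECTURES = [
    Architecture("Private cloud", "1–2 days", "Yes", True, "Low"),
    Architecture("VPC / dedicated", "3–5 days", "Partial", True, "Medium"),
    Architecture("Air-gapped", "2–4 weeks", "No", True, "High"),
]

# Ordering used to compare maintenance levels (an assumption of this sketch).
MAINTENANCE_RANK = {"Low": 0, "Medium": 1, "High": 2}


def shortlist(need_frontier_models: bool, max_maintenance: str) -> list[str]:
    """Return the architectures that satisfy both constraints, in table order."""
    limit = MAINTENANCE_RANK[max_maintenance]
    result = []
    for arch in ARCHITECTURES:
        if MAINTENANCE_RANK[arch.maintenance] > limit:
            continue  # more upkeep than the team can take on
        if need_frontier_models and arch.frontier_model_access == "No":
            continue  # air-gapped setups cannot reach frontier models
        result.append(arch.name)
    return result


if __name__ == "__main__":
    print(shortlist(need_frontier_models=True, max_maintenance="Medium"))
```

The filter only narrows the options. It does not replace the compliance analysis that decides which rows are allowed in the first place.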

On-prem is worth it for any company handling customer personal data, for regulated sector companies, and for any engagement where client contracts require it. It's not necessary for purely internal tooling with no personal data exposure; there, a standard cloud endpoint with Standard Contractual Clauses (SCCs) is sufficient.
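
The rule of thumb above can be written down as a small decision function. This is a sketch under stated assumptions: the `Deployment` fields and the `recommend` function are hypothetical names, and the third outcome covers cases the rule does not address (for example, personal data that is not customer data), where the honest answer is to map the data flows before deciding.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    customer_personal_data: bool    # agents touch customer personal data
    regulated_sector: bool          # company operates in a regulated sector
    contract_requires_on_prem: bool # a client contract mandates on-prem
    internal_only: bool             # purely internal tooling
    other_personal_data: bool       # any personal data not covered above


def recommend(d: Deployment) -> str:
    """Apply the rule of thumb; returns one of three plain-text outcomes."""
    # Any one of these three conditions makes on-prem worth it.
    if d.customer_personal_data or d.regulated_sector or d.contract_requires_on_prem:
        return "on-prem"
    # Purely internal tooling with no personal data exposure.
    if d.internal_only and not d.other_personal_data:
        return "standard cloud endpoint with SCCs"
    # The rule of thumb does not cover this case (assumption of this sketch).
    return "unclear: map the data flows first"


if __name__ == "__main__":
    support_agent = Deployment(True, False, False, False, False)
    print(recommend(support_agent))
```

Treat the output as a starting point for the compliance conversation, not as legal advice.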

The Right Question to Ask First

Before choosing an architecture, answer these four questions:

The answers drive the compliance requirement. The compliance requirement drives the architecture. The architecture drives the infrastructure choice.

Most companies try to make the infrastructure choice first and work backwards. That's why they get it wrong. Or worse: they slow down a deployment that could have shipped under a lighter architecture than they assumed.

On-prem is not the blocker European companies assume it is. The blocker is not knowing which architecture is actually required. Map the data flows first. The architecture follows.

What's the right architecture for your deployment?

Book a free Diagnostic: 30–45 minutes, no deck, no pitch. It maps your data flows, identifies the compliance requirements, and recommends the right on-prem architecture before any code is written.

Sources
1. EU AI Act: Regulation (EU) 2024/1689, the full text of the AI Act as published in the Official Journal of the European Union, 12 July 2024.
2. GDPR full text: gdpr.eu, General Data Protection Regulation (EU) 2016/679.
3. Anthropic on AWS Bedrock EU deployment: AWS Bedrock, Claude models. Supports EU data residency requirements.
John Tan

Fractional AI & Product Founder at nativefirst.ai. Ex-CEO, Depict (Y Combinator). Embeds on-site with scaling founders and CEOs to ship Level-3 agents and AI workflows in production.