EU's exposed AI infrastructure

Ollama Servers Through the Loupe

Apr 07, 2026

In September 2025, Cisco Talos conducted a Shodan scan for publicly exposed Ollama instances and found 1,139 worldwide. When I ran the same query in April 2026, I found over 25,000.

brown and white animal head on brown wooden fence during daytime

Some of this difference likely stems from improved Shodan indexing, as the platform’s fingerprinting abilities are getting better. However, improved indexing alone does not explain a 22x increase. Even with the best-case scenario for crawler improvements, the growth rate should worry anyone using inference infrastructure.

About 7,600 of those hosts are located in EU member states, making up just over 30% of global exposure. Germany has 3,550 instances, ranking third worldwide after China and the United States.

The distribution of these instances is not that random and is expected if you’re somewhat familiar with EU datacenter operators. Hetzner, Contabo, OVH, and other EU cloud providers have spent the past year actively promoting CPU and GPU instances to developers wanting to self-host inference. Blog posts, tutorials, community guides, and one-click deployment templates are easy to find, and as the data shows, their marketing is effective. However, the necessary security guidance that should come with this is mostly lacking.

What’s actually running on them

Ollama’s /api/tags endpoint (API reference) shows the complete model inventory of any instance, including names, sizes, quantization levels, and the last loaded timestamp. Anyone can access it with one GET request, without needing credentials.

Among the 254 EU hosts I sampled, the three most popular models are llama3.2:3b, smollm2:135m, and glm-4.7-flash:latest. There are also 17 instances running MXFP4 quantization and 13 using FP8_E4M3. These formats need NVIDIA Blackwell and H100/H200 class hardware, respectively. Datacenter GPUs are publicly exposed and lack authentication.

Using /api/ps, which displays models currently loaded in memory instead of just installed, between 8-12% of the sampled hosts were actively serving inference during my queries. This was true at different times throughout the day and night. The rest were idle but still reachable.

I conducted these tests to see if this is forgotten infrastructure collecting dust in someone’s unused cloud account or if these are machines that someone actively uses and pays for, but apparently never thought to secure.

The part nobody’s talking about

Previous coverage of Ollama exposure has often focused on the risk as a read problem: someone queries your model, generates harmful content that appears on your compute bill, extracts model weights, or misuses your inference capacity. While this is a valid concern, it represents only part of the attack surface.

Ollama’s API is not read-only. The complete unauthenticated endpoint surface includes:

POST /api/pull — pull any model from the registry onto the host
DELETE /api/delete — remove any installed model
POST /api/create — create a model with an arbitrary system prompt
POST /api/generate and POST /api/chat — run inference at the host owner’s expense

I confirmed the write behavior on my own instance:

curl -X DELETE http://localhost:11434/api/delete \
  -d '{"model": "gemma3"}'

200 OK status and the model is removed without asking for credentials.

Ollama’s API reference documents specify DELETE as a supported endpoint and define a success response as 200. The project chose to release a fully writable, unauthenticated API. In contrast, Tenable has a stricter perspective. Their Nessus plugin from March 2026 categorizes “Ollama Unauthenticated Access” as Critical, with a CVSS Base Score of 10.0. They recommend that operators bind to localhost or place the API behind an authenticated reverse proxy. Whether you see this as a design choice or a vulnerability, the result is the same: 25,000 internet-facing instances where one configuration value was changed to expose a writable API with no other alterations.

At this scale, any of these hosts can have their models deleted, allow arbitrary models to be pulled onto their hardware to fill up disk space and VRAM, or enable new models to be created using attacker-controlled system prompts. The threat isn’t just someone accessing your model; it’s someone altering what your model does. You wouldn’t know this unless you happened to check.

If you’re running Ollama

Before anything else, check whether your instance is impacted:

curl http://your-server-ip:11434/api/tags

If it responds, you have work to do.

The way to fix this issue is simple and not new: bind Ollama back to localhost, place it behind Tailscale or a VPN, or use an authenticated reverse proxy if you really need external access. This follows the same security practice that works for any service with a writable API and no authentication. Just because it’s running an LLM doesn’t make it different; it only makes the consequences more creative.

The speed of AI infrastructure deployment has outpaced basic security measures by a significant amount. What’s less certain is whether anyone involved in the deployment process, the cloud providers promoting GPU instances, the Ollama project releasing an unauthenticated API, or the developers setting up instances from a blog tutorial sees this as their issue to address.

Looking forward

The EU’s cybersecurity regulations (NIS2, DORA, GDPR) assume something basic: that you can identify the operator and understand what the service does. This dataset shows that you can’t. There’s no way to know from the outside if a Hetzner instance running Llama 3.2 is a student’s side project or an inference endpoint processing customer data for a regulated entity. NIS2 scoping relies on this. GDPR obligations depend on it. Right now, no one in the enforcement chain can distinguish between 7,600 EU-hosted instances with writable APIs and no authentication.

The immediate accountability lies with the cloud providers. Hetzner, Contabo, OVH, and others are actively marketing GPU instances for self-hosted inference through blog posts, deployment guides, and one-click templates. However, they do not effectively communicate the security risks to the customers they are bringing on board. This gap has a commercial motivation behind it, and it’s the one most likely to close if someone acknowledges it.

I will be rescanning this infrastructure every quarter and update this post. If things continue as they are, the number won’t remain at 25,000 for long.

If you’re running exposed Ollama infrastructure and want to discuss what I found, or if you’re researching this space and want to compare notes: research@insecurestack.com

Methodology: Shodan query product:"Ollama" conducted April 2026. EU host enumeration via direct API calls to a sampled subset of 254 IPs across 9 EU countries, conducted April 2026. Model inventory collected via /api/tags. Active inference state via /api/ps. Write endpoint behaviour verified on author-controlled infrastructure only.

Sources: Cisco Talos, September 2025, same Shodan methodology, 1,139 hosts — Detecting Exposed LLM Servers. Tenable Nessus, March 2026, CVSS 10.0 — Ollama Unauthenticated Access.

insecurestack

Discussion about this post

Ready for more?