Ollama vulnerability highlights danger of AI frameworks with unrestricted access

A critical vulnerability in Ollama poses a direct risk of sensitive information leaks to more than 300,000 internet-exposed servers, researchers have found.

The flaw, tracked as CVE-2026-7482, stems from an out-of-bounds heap read in the model quantization pipeline of Ollama, one of the most popular frameworks for running AI models on local hardware. Servers on local LANs are also exposed to the leak risk if access to them is not restricted.

The vulnerability, dubbed Bleeding Llama by the researchers from Cyera who found it, enables unauthenticated attackers to upload a specially crafted file to the Ollama API endpoint, causing the application to leak its process memory, including system prompts, user messages, environment variables, and other sensitive data.

Ollama provides an interface and REST API server for running and calling locally hosted large language models (LLMs). The application does not require authentication by default, and although it is meant for local use and binds to localhost (127.0.0.1) by default, it is often configured to listen on all network interfaces (0.0.0.0). Approximately 300,000 Ollama servers are currently exposed on the public internet, with many more on local networks.
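One quick way to audit a deployment is to check how the server is bound. The sketch below is illustrative only: `classify_bind` is a hypothetical helper, not part of Ollama, though `OLLAMA_HOST` is the environment variable Ollama reads for its bind address.

```python
# Illustrative sketch: flag Ollama bind addresses that listen beyond loopback.
# classify_bind() is a hypothetical helper, not part of Ollama itself.
import os

LOOPBACK = {"127.0.0.1", "localhost", "::1"}

def classify_bind(host_value: str) -> str:
    """Return 'safe' for loopback-only binds, 'risky' otherwise."""
    host = host_value.strip()
    # Strip an optional IPv4 ":port" suffix (e.g. "0.0.0.0:11434").
    if host.count(":") == 1:
        host = host.rsplit(":", 1)[0]
    if host in LOOPBACK:
        return "safe"
    return "risky"  # 0.0.0.0, a LAN IP, a public IP, etc.

if __name__ == "__main__":
    bind = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
    print(f"OLLAMA_HOST={bind!r} -> {classify_bind(bind)}")
```

A `risky` result does not by itself mean the server is reachable from the internet, but it is the misconfiguration that turns a local-only tool into one of the roughly 300,000 exposed instances described above.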

“With over 170,000 GitHub stars and 100 million Docker Hub downloads, Ollama is widely used across enterprises as a self-hosted AI inference engine,” Cyera warns, adding that the vulnerability is broadly exploitable because no authentication is required.

Only three API requests needed for exploit

Located in Ollama’s model quantization pipeline, the bug relates to how the framework loads GGUF (GPT-Generated Unified Format) files, which store weights, metadata, and tokenizer information for local models.

“A malicious actor can craft a GGUF file that declares a far larger tensor size than the actual data provided, forcing Ollama to read well beyond the intended buffer boundary — accessing sensitive data stored on the heap,” the researchers said.
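The underlying pattern can be illustrated in a few lines. This is a simplified sketch of the bug class, not Ollama's actual code: a parser trusts a size field from the file header instead of checking it against the bytes actually present. Since Python slicing cannot over-read, the "adjacent heap memory" here is simulated by padding the buffer with a fake secret.

```python
import struct

def read_tensor_unsafely(blob: bytes) -> bytes:
    """Trusts the declared length -- the bug class behind Bleeding Llama.

    Toy layout: a 4-byte little-endian length, then the payload.
    A real out-of-bounds heap read would walk into adjacent heap memory;
    we model that by appending a fake 'heap' holding a secret.
    """
    heap = blob + b"API_KEY=hunter2 <- adjacent heap data"  # simulated heap
    (declared_len,) = struct.unpack_from("<I", blob, 0)
    return heap[4:4 + declared_len]  # no check against len(blob)!

def read_tensor_safely(blob: bytes) -> bytes:
    """Validates the declared length against the data actually supplied."""
    (declared_len,) = struct.unpack_from("<I", blob, 0)
    if declared_len > len(blob) - 4:
        raise ValueError("declared tensor size exceeds file contents")
    return blob[4:4 + declared_len]

# A malicious file: declares 64 bytes of tensor data but ships only 8.
malicious = struct.pack("<I", 64) + b"\x00" * 8
leaked = read_tensor_unsafely(malicious)  # slice reaches the 'heap' secret
```

The safe variant rejects the same file outright, which is the essence of the fix: the declared tensor size must never be used as a read length without bounds validation.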

Leaked memory can include user prompts and chat messages, system prompts from all running models, conversation history across all users, API keys, tokens, and secrets stored in environment variables, proprietary code submitted to the AI models, and customer data and contracts reviewed via the models, among other data.

After exploiting the vulnerability, attackers can send a request to Ollama’s push API endpoint to exfiltrate the model and embedded leaked data to a server under their control.

Mitigation

Users should update to Ollama version 0.17.1, which includes a patch for this vulnerability. More generally, they should deploy an authentication proxy or API gateway in front of all Ollama instances and never expose them to the internet without IP access filters and firewalls.
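As a rough triage aid, a script can compare a server's reported version against the patched release. This is a sketch, not an official tool; the 0.17.1 cutoff comes from the advisory above.

```python
# Sketch: check whether an Ollama version predates the CVE-2026-7482 fix.
def parse_version(v: str) -> tuple[int, ...]:
    """Turn '0.17.1' (or 'v0.17.1') into (0, 17, 1) for tuple comparison."""
    return tuple(int(part) for part in v.strip().lstrip("v").split("."))

PATCHED = parse_version("0.17.1")  # first release containing the fix

def is_patched(version: str) -> bool:
    """True if the given Ollama version includes the fix for this CVE."""
    return parse_version(version) >= PATCHED
```

In practice the running version can be read from Ollama's `/api/version` endpoint; any pre-0.17.1 instance that was network-reachable should be treated as potentially compromised and its secrets rotated, as Cyera advises.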

“If your Ollama server was internet-accessible, assume environment variables and secrets in memory may be compromised,” Cyera said. “Rotate API keys, tokens, and credentials immediately.”

On local networks, Ollama servers should also be isolated on secure network segments behind firewalls. This advice applies to all AI frameworks and AI agent frameworks, which attackers are increasingly targeting.

Vulnerability management programs should track such tools, and their presence on networks should be audited regularly, because employees may deploy them without their company's knowledge or permission.
