The Kali Linux team has released a guide for running AI-driven penetration testing entirely on local hardware, eliminating cloud dependencies. This setup uses Ollama, 5ire, and MCP Kali Server to enable natural language commands for security tools. Published on March 10, 2026, the guide addresses privacy concerns in sensitive environments.
The Kali Linux team published a new guide on March 10, 2026, as part of its series on large language model (LLM)-driven security tools. This entry focuses on a fully self-hosted stack that processes all AI operations on local hardware, avoiding third-party cloud services. The approach tackles privacy and operational security issues that have limited cloud-based AI in penetration testing.
The setup requires an NVIDIA GPU with CUDA support; the guide uses an NVIDIA GeForce GTX 1060 with 6 GB of VRAM as reference hardware. The process replaces the open-source Nouveau driver with NVIDIA's proprietary driver to enable CUDA acceleration. After installation and a reboot, nvidia-smi reports Driver Version 550.163.01 and CUDA Version 12.4.
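The driver swap described above can be sketched as follows. This is a hedged outline, not the guide's verbatim commands: the exact package names (`nvidia-driver`, `nvidia-cuda-toolkit`) are assumptions based on what the Kali repositories typically provide.

```shell
# Install the proprietary NVIDIA driver and CUDA toolkit
# (package names assumed; the guide's exact commands may differ)
sudo apt update
sudo apt install -y nvidia-driver nvidia-cuda-toolkit
sudo reboot

# After the reboot, confirm Nouveau was replaced and CUDA is active.
# The guide's reference output: Driver Version 550.163.01, CUDA Version 12.4.
nvidia-smi
```

Installing the proprietary driver automatically blacklists Nouveau on most Debian-based systems, which is why a reboot is required before nvidia-smi will report the new driver.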
Ollama serves as the core LLM engine, acting as a wrapper for llama.cpp to simplify model management. Installed via a Linux AMD64 tarball and set up as a systemd service, it runs in the background. The guide evaluates three models with tool-calling support: llama3.1:8b (4.9 GB), llama3.2:3b (2.0 GB), and qwen3:4b (2.5 GB), all fitting within the 6 GB VRAM limit.
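Running Ollama as a systemd service typically means a unit file along these lines. This is a minimal sketch, assuming the binary was unpacked to /usr/local/bin and a dedicated `ollama` user exists; the guide's actual unit may differ.

```ini
# /etc/systemd/system/ollama.service (illustrative; paths and user assumed)
[Unit]
Description=Ollama LLM server
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Restart=always

[Install]
WantedBy=multi-user.target
```

With the service enabled (`sudo systemctl enable --now ollama`), models are fetched with `ollama pull qwen3:4b` and served in the background on the default local port.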
The Model Context Protocol (MCP) integrates the AI with security tools through the mcp-kali-server package, available in the Kali repositories. The package runs a local Flask server on 127.0.0.1:5000 and verifies that tools such as nmap, gobuster, dirb, and nikto are available. It supports tasks such as web application testing, CTF challenges, and interactions with platforms like Hack The Box or TryHackMe.
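Conceptually, the MCP server's job is to turn a structured tool call from the LLM into a concrete command invocation. The sketch below illustrates that mapping for nmap; it is not mcp-kali-server's actual code, and the function name and `-sT` default are assumptions for illustration.

```python
import shlex

# Illustrative only: how a tool-call payload from the LLM might be
# translated into an argv list that the MCP server passes to subprocess.
def build_nmap_command(target: str, ports: list[int]) -> list[str]:
    """Build an nmap argv from a target and a list of ports."""
    # -sT: TCP connect scan (assumed default); -p: comma-separated ports
    port_spec = ",".join(str(p) for p in sorted(ports))
    return ["nmap", "-sT", "-p", port_spec, target]

cmd = build_nmap_command("scanme.nmap.org", [80, 443, 21, 22])
print(shlex.join(cmd))
# nmap -sT -p 21,22,80,443 scanme.nmap.org
```

Building an argv list rather than a shell string is the safer pattern here, since the arguments originate from model output and should never pass through a shell unescaped.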
To connect Ollama and MCP, the guide uses 5ire, an open-source AI assistant and MCP client distributed as a Linux AppImage (version 0.15.3). Installed to /opt/5ire/ and configured with a desktop entry, it enables Ollama as the provider and registers mcp-kali-server for tool access.
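The 5ire installation amounts to placing the AppImage and registering a desktop entry, roughly as below. The AppImage filename and the .desktop contents are assumptions; the guide's exact steps may differ.

```shell
# Install the 5ire AppImage to /opt/5ire/ (downloaded filename assumed)
sudo mkdir -p /opt/5ire
sudo mv 5ire-0.15.3.AppImage /opt/5ire/5ire.AppImage
sudo chmod +x /opt/5ire/5ire.AppImage

# Register a desktop entry so 5ire appears in the application menu
sudo tee /usr/share/applications/5ire.desktop >/dev/null <<'EOF'
[Desktop Entry]
Name=5ire
Exec=/opt/5ire/5ire.AppImage
Type=Application
Categories=Utility;
EOF
```

Inside 5ire, Ollama is then selected as the model provider and mcp-kali-server is added as an MCP server, giving the local model access to the registered tools.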
Validation involved a natural language prompt in 5ire, using qwen3:4b, to scan scanme.nmap.org on ports 80, 443, 21, and 22. The LLM invoked nmap through MCP and returned structured results entirely offline, with GPU processing confirmed.
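GPU processing can be spot-checked from another terminal while the model is loaded. This is a hedged sketch of the verification step, not the guide's exact commands:

```shell
# Confirm the loaded model is running on the GPU rather than the CPU
ollama ps      # the PROCESSOR column should report "100% GPU"
nvidia-smi     # VRAM usage should reflect the loaded model (~2.5 GB for qwen3:4b)
```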
According to the Kali Linux Team, "the full-stack Ollama, mcp-kali-server, and 5ire are open source, hardware-dependent rather than service-dependent, and tunable based on available VRAM." This configuration offers a privacy-preserving option for red teams and researchers in air-gapped or data-sensitive settings.