Security Experts Pivot to Smaller AI Models
A recent InfosecTrain podcast argues that the cybersecurity industry is shifting from cloud-based Large Language Models (LLMs) to on-device Small Language Models (SLMs). The primary drivers for this change are security and privacy: SLMs can process data locally and offline, which reduces exposure to cloud data breaches. Models like Microsoft's Phi-3 and Google's Gemma are cited as examples of this trend.
- The Open Web Application Security Project (OWASP) lists prompt injection, training data poisoning, and sensitive information disclosure among the top security risks for Large Language Models. These vulnerabilities differ from traditional threats like SQL injection because they target the model's reasoning and data processing capabilities.
- SLMs can be used for offensive security purposes in "AI Red Teaming" to find vulnerabilities in larger AI systems. In this approach, one SLM is fine-tuned to generate adversarial prompts designed to bypass an LLM's safety features, while other SLMs can act as real-time guards to detect and block these attacks.
- Microsoft's Phi-3 family of models is built on a transformer decoder architecture. The Phi-3-mini variant has 3.8 billion parameters, making it small enough to run on a mobile device such as an iPhone 14.
- Google's Gemma models are based on the same research and technology as the larger Gemini models and also use a decoder-only transformer architecture. The Gemma 7B model was trained on 6 trillion tokens of text from web documents, mathematics, and code.
- While running locally reduces the risk of cloud data breaches, open-source SLMs can still present security challenges. If an SLM is fine-tuned on proprietary or confidential data, the model itself can become a target for data theft.
- Techniques like quantization and knowledge distillation are used to reduce the size of SLMs while largely preserving performance. This efficiency allows them to run on consumer-grade hardware and edge devices, making them suitable for real-time, on-device applications like voice assistants and text prediction.
- Researchers have developed an autonomous cyber agent named Hackphyr, built on a 7-billion-parameter SLM. Fine-tuned on a specialized dataset of network attack actions, Hackphyr performed sophisticated, multi-step attacks in a simulated environment, suggesting that locally run models can rival larger, cloud-based ones for specific security tasks.
- The reduced complexity of SLMs can make their decision-making processes more transparent and easier to audit than the "black box" behavior of LLMs. However, they may struggle with complex or niche topics and can be less accurate than LLMs on tasks that require deep reasoning or a broad range of knowledge.
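
To make the quantization point in the list above more concrete, here is a minimal Python sketch of symmetric per-tensor int8 quantization, plus a rough estimate of how much storage a model's weights need at a given precision. It is an illustration under stated assumptions, not the method used by Phi-3, Gemma, or any specific library: the function names (`quantize_int8`, `dequantize`, `weight_footprint_gb`) and the sample weights are hypothetical, and real toolchains typically quantize per channel or per block and often use 4-bit formats.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization (illustrative, hypothetical helper).

    Maps each float w to an integer q in [-127, 127] so that w is
    approximately scale * q.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]


def weight_footprint_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes).

    Counts weights only; activations, caches, and runtime overhead
    are ignored, so real memory use is higher.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    sample = [0.82, -1.27, 0.05, 0.40, -0.63]  # made-up example weights
    codes, scale = quantize_int8(sample)
    restored = dequantize(codes, scale)
    worst = max(abs(a - b) for a, b in zip(sample, restored))
    print("int8 codes:", codes)
    print("worst-case rounding error:", worst)

    # 3.8 billion parameters is the Phi-3-mini size cited above.
    for bits in (16, 8, 4):
        print(bits, "bits:", weight_footprint_gb(3.8e9, bits), "GB")
```

By this estimate, a 3.8-billion-parameter model needs about 7.6 GB for its weights at 16 bits per weight but only about 1.9 GB at 4 bits, which is the kind of reduction that brings a model within reach of a phone. The cost is the rounding error introduced per weight, bounded here by half the scale factor.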