Evil Models and Exploits: When AI Becomes the Attacker

Artificial intelligence is redefining industries at a staggering pace, and the field of cybersecurity is no exception. From coding assistants to penetration testing tools, we are witnessing the emergence of AI-driven mechanisms that amplify productivity and problem-solving.

However, the same tools that can enhance development workflows can also empower malicious actors. Here are four ways that AI is reshaping hacking and malware development, and how we can stay vigilant in response.

1. Agent-Augmented Hacking

The concept of agent-augmented hacking, in which AI agents plan and drive offensive tooling through the same feedback loops that power coding assistants, is quickly moving from hypothetical to inevitable.

Anyone who has used AI-powered coding assistants like GitHub Copilot, Cursor or similar tools is familiar with the input-feedback loop. This process involves AI systems accessing a shell, executing commands, capturing output and using that feedback to refine subsequent instructions. In development, this iterative process enables coding assistants to create more effective and accurate code by directly understanding execution environments.
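To make that loop concrete, here is a minimal sketch of the input-feedback cycle in Python. The `ask_model` function is a hypothetical stand-in for whatever LLM API an assistant calls; the structure is the point: propose a command, run it, capture the output and feed it back so the model can refine its next step.

```python
import subprocess

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM provider's chat API."""
    raise NotImplementedError("wire this to the model provider of your choice")

def feedback_loop(goal: str, max_steps: int = 5) -> None:
    """Run the input-feedback cycle: the model proposes a command, we execute it,
    and the captured output becomes context for the next proposal."""
    context = f"Goal: {goal}\n"
    for _ in range(max_steps):
        command = ask_model(context + "\nSuggest the single next shell command.")
        # Real assistants sandbox this step or ask the user to confirm first.
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, timeout=60)
        context += f"\n$ {command}\n{result.stdout}{result.stderr}"
        answer = ask_model(context + "\nIs the goal achieved? Answer yes or no.")
        if answer.strip().lower().startswith("yes"):
            break
```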

The same concept can be weaponized. Attackers can integrate the input-feedback loop into their toolkits, using AI to orchestrate popular penetration testing tools like OWASP ZAP, Nmap or Nikto2. The outputs of these tools could be piped directly into an AI model, allowing the system to craft tailored exploit code based on the findings.

Should the initial exploit fail, the system’s feedback loop enables it to iterate — adjusting, refining and retrying until successful. This process drastically reduces the time and expertise required to identify and exploit vulnerabilities, marking a fundamental shift in the landscape of cybersecurity threats.

2. Model Context Protocol

A more structured threat emerges with technologies like the Model Context Protocol (MCP). Originally introduced by Anthropic, MCP gives large language models (LLMs) a standardized, JSON-RPC-based way to call tools and access resources exposed by servers running on a host machine. This enables LLMs to perform sophisticated operations by controlling local resources and services.

While MCP is being embraced by developers for legitimate use cases, such as automation and integration, its darker implications are clear. An MCP-enabled system could orchestrate a range of malicious activities with ease. Think of it as an AI-powered operator capable of executing everything from reconnaissance to exploitation.
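To see why MCP's reach cuts both ways, here is a minimal sketch of an MCP server that exposes a shell tool to a model. It is written against the FastMCP interface from the official MCP Python SDK (an assumption on my part; check the current SDK documentation), and the allowlist is exactly the kind of guardrail a malicious or careless server would leave out.

```python
# A minimal sketch, assuming the FastMCP interface from the official MCP Python SDK.
import shlex
import subprocess

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-shell")

# Without this allowlist, a connected model could run arbitrary commands on the host.
ALLOWED = {"ls", "whoami", "uname"}

@mcp.tool()
def run_command(command: str) -> str:
    """Run an allowlisted shell command on the host and return its output."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        return f"refused: '{argv[0] if argv else ''}' is not on the allowlist"
    result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio to an MCP-capable client
```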

For now, these capabilities are likely to surface in white-hat research, offering a preview of how attackers might use such tools. But it’s only a matter of time before malicious actors follow suit, introducing a new level of sophistication and autonomy in cyberattacks.

3. Evil Models

The proliferation of AI models is both a blessing and a curse. Platforms like Hugging Face host over a million models, ranging from state-of-the-art neural networks to poorly designed or maliciously altered versions. Amid this abundance lies a growing concern: model provenance.

Imagine a widely used model, fine-tuned by a seemingly reputable maintainer, turning out to be a tool of a state actor. Subtle modifications in the training data set or architecture could embed biases, vulnerabilities or backdoors. These “evil models” could then be distributed as trusted resources, only to be weaponized later.

This risk underscores the need for robust mechanisms to verify the origins and integrity of AI models. Initiatives like Sigstore, which provides artifact signing and verification, and SLSA (Supply-chain Levels for Software Artifacts), a framework for attesting to build provenance, must extend their efforts to encompass AI models and datasets. Without such safeguards, the community remains vulnerable to manipulation at scale.
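Full provenance checking means verifying signatures and SLSA attestations (for example, with Sigstore tooling such as cosign), but even simple digest pinning catches a silently swapped set of model weights. The sketch below is a simplified illustration rather than Sigstore itself: it hashes downloaded model files and compares them against a manifest the publisher distributes out of band. The file names and paths are hypothetical.

```python
# Simplified integrity check; real provenance verification would rely on
# Sigstore signatures and SLSA attestations rather than a bare digest manifest.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model_dir(model_dir: Path, manifest_path: Path) -> bool:
    """Compare every file named in the manifest against its pinned digest."""
    manifest = json.loads(manifest_path.read_text())  # {"model.safetensors": "<hex>", ...}
    return all(sha256_of(model_dir / name) == expected
               for name, expected in manifest.items())

if __name__ == "__main__":
    ok = verify_model_dir(Path("./my-model"), Path("./my-model.manifest.json"))
    print("integrity check passed" if ok else "MISMATCH: do not load this model")
```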

4. Privacy Risks and PII Regurgitation

AI models are trained on vast amounts of data, much of it scraped from the internet or uploaded by users. This data often includes sensitive personally identifiable information (PII), secrets and tokens. The result? Models can inadvertently regurgitate fragments of this sensitive information in their outputs.

Consider a scenario where users turn to AI for therapy or personal guidance. The PII embedded in these interactions, if included in subsequent training cycles, could resurface as part of a model’s output. As adoption grows, so too does the risk of sensitive data exposure.

This issue could spark a much-needed privacy movement, where users demand greater transparency about how their data is used. The age-old adage that “users are the product” may gain new relevance in the AI era, leading to tighter regulations and technological safeguards.

Mitigating the Risks: A Call to Action

As the cybersecurity landscape evolves, developers, enterprises and open source communities must adapt. The threats posed by AI, including enhanced hacking capabilities and privacy violations, are daunting but not insurmountable. Here are three key areas to focus on:

  1. Standardizing model provenance: The open source community must prioritize transparency and verification in the AI supply chain. Tools and frameworks such as Sigstore and SLSA should become standard practice for validating models and their training datasets.
  2. Building defensive AI systems: Just as attackers use AI to amplify their capabilities, defenders must do the same. This includes leveraging AI for real-time threat detection, vulnerability analysis and anomaly detection to stay ahead of evolving threats.
  3. Privacy-first AI practices: Protecting user data should be a cornerstone of AI development. Local agents that filter sensitive data before it leaves the developer's machine, as sketched below, offer privacy protections for coding assistants and represent a step in the right direction. Broader adoption of privacy-focused technologies will be critical.
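As a sketch of what such a local agent might do, the snippet below redacts a few well-known secret formats and email addresses from a prompt before it leaves the machine. It is deliberately simplistic and purely illustrative; production filters cover many more patterns and go beyond plain regexes.

```python
# A deliberately simple sketch of local prompt redaction; production filters
# cover many more patterns and do not rely on regexes alone.
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

def redact(text: str) -> str:
    """Replace anything matching a known sensitive pattern before it leaves the machine."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<redacted:{label}>", text)
    return text

if __name__ == "__main__":
    prompt = "My key is AKIAABCDEFGHIJKLMNOP and I am reachable at dev@example.com"
    print(redact(prompt))
```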

Conclusion

AI’s potential to transform cybersecurity is immense, but so are the risks. From agent-augmented hacking to privacy violations, the industry is facing challenges that demand proactive solutions. The need for verifiable AI models, privacy safeguards and AI-enhanced defenses has never been more urgent.

At Stacklok, we’re committed to addressing these challenges. We recently made CodeGate, a local privacy protection system for coding assistants and agents, open source as part of our mission to make AI both secure and trustworthy. The road ahead is uncertain, but with vigilance and collaboration, we can shape a future where AI amplifies security rather than undermines it.

