Protectors of the modern world: defending against Shadow ML and Agentic AI

It may sound like hyperbole to say that machine learning operations (MLOps) have become the backbone of our digital future, but it’s true. Just as we view energy grids or transportation systems as part of the critical infrastructure that powers society, AI/ML software and capabilities are quickly becoming essential technology for a wide range of companies, industries, and citizen services.
As artificial intelligence (AI) and machine learning (ML) rapidly transform industries, we’ve also seen the rise of a new age of “Shadow IT,” now referred to as “Shadow ML.” Employees are increasingly deploying AI agents and ML models without the knowledge or approval of IT departments, often circumventing security protocols, data governance policies, and compliance frameworks.
This unchecked proliferation of unauthorized AI tools introduces significant risks, from data leakage to model bias and vulnerabilities that threat actors could exploit. CISOs and IT leaders are now tasked with shining a light into the shadows, ensuring that AI-driven decisions are explainable, secure, and aligned with enterprise policies. Understanding the evolving role of MLOps in managing and securing the rapidly expanding AI/ML IT landscape is essential to safeguarding the interconnected systems that define our era.
Software is critical infrastructure
Software is an omnipresent component of our day-to-day lives, operating quietly but indispensably behind the scenes. For that reason, failures in these systems are often hard to detect, can happen at any moment, and spread quickly across the globe, disrupting businesses, upsetting economies, undermining governments or even endangering lives.
The stakes are even more significant as AI and ML technologies increasingly take center stage when it comes to software development and management. Traditional software operations are giving way to AI-driven systems capable of decision-making, prediction, and automation at unprecedented scale. However, like any technology that ushers in new but immense potential, AI and ML also introduce new complexities and risks, elevating the importance and need for strong MLOps security. As reliance on AI/ML grows, the robustness of MLOps security becomes foundational to fending off evolving cyber threats.
Understanding the risks of the MLOps lifecycle
The lifecycle of building and deploying ML models is filled with both complexity and opportunity. At its core, the process includes the following steps, sketched in code after this list:
- Selecting an appropriate ML algorithm, such as a support vector machine (SVM) or decision tree.
- Feeding a dataset into the algorithm to train the model.
- Producing a pre-trained model that can be queried for predictions.
- Registering the pre-trained model in a model registry.
- Deploying the pre-trained model into production by either embedding it in an app or hosting it on an inference server.
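To make these steps concrete, here is a minimal sketch in Python using scikit-learn and joblib. The “registry” is simply a local, versioned directory standing in for a real model registry (such as MLflow), and all names and paths are illustrative.

```python
# Minimal MLOps lifecycle sketch: select an algorithm, train it,
# "register" the resulting model, then reload it for serving.
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Steps 1-2: choose an algorithm (an SVM) and feed it a training dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = SVC().fit(X_train, y_train)

# Step 3: the pre-trained model can now be queried for predictions.
print("holdout accuracy:", model.score(X_test, y_test))

# Step 4: register the model by persisting it to a versioned location.
registry = Path("model-registry/iris-svm/1")
registry.mkdir(parents=True, exist_ok=True)
joblib.dump(model, registry / "model.joblib")

# Step 5: a consuming app or inference server later reloads and queries it.
served = joblib.load(registry / "model.joblib")
print("prediction:", served.predict(X_test[:1]))
```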
It’s a structured approach, but one with significant vulnerabilities that threaten stability and security. These vulnerabilities fall broadly into two categories: inherent and implementation-related.
Inherent vulnerabilities
The complexity of ML environments, including cloud services and open-source tools, creates security gaps that attackers may exploit. Examples include:
- Malicious ML models: Pre-trained models can be weaponized or intentionally crafted to produce biased or harmful outputs, causing trickle-down damage across dependent systems (illustrated in the sketch after this list).
- Malicious datasets: Training data can be poisoned to inject subtle yet dangerous behaviors that undermine a model’s integrity and reliability.
- Jupyter “sandbox escapes”: In another example of “Shadow ML,” many data scientists today rely on Jupyter Notebook, which can serve as a path for malicious code execution and unauthorized access when not adequately secured.
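To see why pre-trained model files are such an attractive attack vector, consider that pickle, Python’s default serialization format (and the basis of many model formats), executes code during deserialization. The harmless demonstration below only echoes a message, but a real payload could run anything:

```python
# Illustration of a "malicious model": pickle invokes __reduce__ when
# rebuilding an object, so an attacker can smuggle in any callable.
import os
import pickle

class MaliciousModel:
    def __reduce__(self):
        # Executed automatically at load time, before any prediction.
        return (os.system, ("echo payload executed on model load",))

blob = pickle.dumps(MaliciousModel())

# The victim's side: merely loading the downloaded "model" runs the payload.
pickle.loads(blob)
```

Formats that store only weights, such as safetensors, are designed to avoid this class of load-time code execution.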
Implementation vulnerabilities
- Authentication shortcomings: Poor access controls expose MLOps platforms to unauthorized users, enabling data theft or model tampering.
- Container escape: Containerized environments with improper configuration allow attackers to break isolation and access the host system and other containers.
- MLOps platform immaturity: The rapid pace of innovation in AI/ML often outpaces the development of secure tooling, creating gaps in resilience and reliability.
While AI and ML can offer enormous benefits for organizations, it’s crucial not to prioritize rapid development over security. Doing so could compromise ML models and put organizations at risk. Furthermore, developers must exercise caution when loading models from public repositories, ensuring they validate the source and potential risks associated with the model files. Robust input validation, restricted access, and continuous vulnerability assessments are critical to mitigating risks and ensuring the secure deployment of machine learning solutions.
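One simple, concrete validation step is to pin a cryptographic digest for any third-party model file and refuse to load it on a mismatch. The sketch below uses only the Python standard library; the file and its pinned digest are stand-ins created inline so the example runs end to end.

```python
# Verify a downloaded artifact against a digest pinned out of band
# before handing it to any model loader.
import hashlib
from pathlib import Path

def verify_artifact(path: Path, expected_sha256: str) -> None:
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(f"{path}: digest {digest} does not match the pin")

# Stand-in download; in practice the pinned digest comes from the
# publisher via a trusted channel, not from the download itself.
path = Path("model.bin")
path.write_bytes(b"pretend model weights")
pinned = hashlib.sha256(b"pretend model weights").hexdigest()

verify_artifact(path, pinned)
print("digest verified: safe to pass the file to the loader")
```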
MLOps hygiene best practices
There are many other vulnerabilities across the MLOps pipeline, underscoring the need for vigilance. Many separate elements within a model and its pipeline serve as potential attack vectors, and each must be managed and secured. Implementing standard APIs for artifact access and ensuring seamless integration of security tools across the ML platforms used by data scientists, machine learning engineers, and core development teams is therefore essential. Key security considerations for MLOps development should include the following (illustrative sketches follow the list):
- Dependencies and packages: Teams often use open-source frameworks and libraries like TensorFlow and PyTorch. Providing access to these dependencies from trusted sources—rather than directly from the internet—and conducting vulnerability scans to block malicious packages ensures the security of each component within the model.
- Source code: Models are typically developed in languages such as Python, C++, or R. Employing static application security testing (SAST) to scan source code can identify and remediate flaws that may compromise model security.
- Container images: Containers are used to deploy models for training and facilitate their use by other developers or applications. Performing comprehensive scans of container images before deployment helps prevent introducing risks into the operational environment.
- Artifact signing: Signing all new service components early in the MLOps lifecycle and treating them as immutable units throughout different stages ensures that the application remains unchanged as it advances toward release.
- Promotion/release blocking: Automatically rescanning the application or service at each stage of the MLOps pipeline allows for early detection of issues, enabling swift resolution and preserving the integrity of the deployment process.
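As one example of dependency and promotion gating, the sketch below shells out to pip-audit, a real, separately installed scanner for known-vulnerable Python packages, and blocks the release when it reports findings. The requirements.txt path and the gate’s wiring are illustrative assumptions, not a prescribed setup.

```python
# Promotion gate: rescan declared dependencies and block on findings.
# Assumes pip-audit is installed and a requirements.txt exists.
import subprocess
import sys

result = subprocess.run(
    ["pip-audit", "-r", "requirements.txt"],
    capture_output=True,
    text=True,
)
if result.returncode != 0:  # pip-audit exits non-zero on findings
    print("promotion blocked: vulnerable or unresolvable dependencies")
    print(result.stdout)
    sys.exit(1)
print("dependency scan clean: promotion allowed")
```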
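Artifact signing can be sketched just as briefly. Here the third-party cryptography package signs the artifact once at build time, and every later stage re-verifies the signature, so any modification blocks promotion. The inline key generation and stand-in artifact bytes are simplifications; a real pipeline would use a managed signing key.

```python
# Sign once at build time; re-verify at every promotion stage.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()  # stand-in for a managed key
verify_key = signing_key.public_key()

artifact = b"stand-in bytes for the built model or service artifact"
signature = signing_key.sign(artifact)

def promotion_gate(candidate: bytes) -> bool:
    # Immutable artifacts verify cleanly; anything altered fails.
    try:
        verify_key.verify(signature, candidate)
        return True
    except InvalidSignature:
        return False

print(promotion_gate(artifact))                # True: unchanged, promote
print(promotion_gate(artifact + b"tampered"))  # False: block the release
```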
By adhering to these best practices, organizations can effectively safeguard MLOps pipelines and ensure that security measures enhance rather than impede the development and deployment of ML models. As we move further into an AI-driven future, the resilience of MLOps infrastructure will become increasingly central to maintaining the trust, reliability, and security of the digital systems that power the world.
Eyal Dyment is Vice President of Security Products at JFrog.