CVE-2025-24357
Published: 27 January 2025
Description
Adversaries may manipulate application software prior to receipt by a final consumer for the purpose of data or system compromise.
Security Summary
CVE-2025-24357 is a deserialization vulnerability (CWE-502) in the vLLM library, which provides efficient inference and serving for large language models. The issue lies in the hf_model_weights_iterator function within vllm/model_executor/weight_utils.py. This function downloads model checkpoints from Hugging Face and loads them using PyTorch's torch.load with the weights_only parameter defaulting to False, enabling arbitrary code execution during unpickling of malicious pickle data.
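The danger of `weights_only=False` comes from pickle's `__reduce__` protocol: deserializing a file can invoke an arbitrary callable chosen by whoever produced it. The sketch below is a benign stand-in, not code from vLLM; the class name and marker file are illustrative, and a plain `pickle.loads` stands in for the unpickling that `torch.load` performs internally.

```python
import os
import pickle
import tempfile

# Benign demonstration of the primitive an attacker gets when a checkpoint
# is loaded with weights_only=False: unpickling invokes an attacker-chosen
# callable. Here the "payload" only creates a marker file; a real exploit
# embedded in a checkpoint could run any command with the user's privileges.

marker = os.path.join(tempfile.mkdtemp(), "pwned.txt")

class MaliciousCheckpoint:
    def __reduce__(self):
        # pickle calls this at serialization time; the (callable, args)
        # pair it returns is executed when the data is later unpickled.
        return (os.system, (f"touch {marker}",))

blob = pickle.dumps(MaliciousCheckpoint())   # what an attacker would upload
pickle.loads(blob)                           # what loading the checkpoint does
print(os.path.exists(marker))                # -> True: code ran on load
```

No attribute of the loaded object is ever accessed; merely loading the file is enough to trigger execution, which is why publishing a tainted checkpoint suffices.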
Attackers can exploit this vulnerability by publishing a malicious model checkpoint to Hugging Face; any user running an affected version of vLLM who loads that checkpoint triggers remote code execution on their system. The CVSS v3.1 base score is 7.5 (AV:N/AC:H/PR:N/UI:R/S:U/C:H/I:H/A:H): network attack vector, high attack complexity, no privileges required, user interaction required (loading the model), unchanged scope, and high confidentiality, integrity, and availability impacts.
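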
The vulnerability was fixed in vLLM version 0.7.0. Mitigation involves upgrading to v0.7.0 or later. Relevant resources include the GitHub security advisory at GHSA-rh4j-5rhw-hr54, the fixing pull request #12366, and commit d3d6bb13fb62da3234addf6574922a4ec0513d04; PyTorch documentation for torch.load also notes risks associated with weights_only=False.
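Since the fix shipped in v0.7.0, a simple environment check can flag vulnerable installs. This is a minimal sketch; the helper names (`parse_version`, `vllm_is_vulnerable`) are ours, not part of vLLM, and the pre-release handling is deliberately simplistic.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_VERSION = (0, 7, 0)  # first vLLM release containing the fix

def parse_version(v: str) -> tuple:
    """Parse 'X.Y.Z' into a comparable tuple of ints, keeping only the
    leading digits of each component (e.g. '0.7.0rc1' -> (0, 7, 0))."""
    parts = []
    for piece in v.split(".")[:3]:
        m = re.match(r"\d+", piece)
        parts.append(int(m.group()) if m else 0)
    return tuple(parts)

def vllm_is_vulnerable(installed: str) -> bool:
    """True if the given vLLM version predates the 0.7.0 fix."""
    return parse_version(installed) < FIXED_VERSION

# Check the local environment, if vLLM is installed at all.
try:
    print("vulnerable:", vllm_is_vulnerable(version("vllm")))
except PackageNotFoundError:
    print("vLLM is not installed")
```

For production use, a proper version library (e.g. `packaging.version`) is preferable to this hand-rolled comparison.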
Because vLLM is widely used for LLM serving, this vulnerability highlights supply-chain risks in AI/ML pipelines and underscores the need to validate model sources and enforce secure deserialization practices.
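One such secure-deserialization practice can be sketched with the "restricting globals" pattern from the Python pickle documentation: an `Unpickler` subclass that refuses to resolve any global outside an explicit allow-list, so payloads smuggling in callables like `os.system` fail to load at all. The allow-list below is illustrative only; a real deployment would enumerate the tensor and container types its checkpoints legitimately need.

```python
import builtins
import io
import os
import pickle

# Allow-list of builtins that may be resolved during unpickling.
# Illustrative: real checkpoints would need their tensor types listed too.
SAFE_BUILTINS = {"list", "dict", "tuple", "set", "frozenset"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream references a global; reject
        # everything outside the allow-list instead of importing it.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain containers still round-trip...
print(restricted_loads(pickle.dumps({"weights": [1.0, 2.0]})))

# ...but a payload that references os.system is rejected at load time.
try:
    restricted_loads(pickle.dumps(os.system))
except pickle.UnpicklingError as e:
    print("blocked:", e)
```

Allow-listing is defence in depth, not a substitute for upgrading: preferring `weights_only=True` (or safetensors-format checkpoints) and trusting only vetted model sources remains the primary mitigation.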
Details
- CWE(s)
- CWE-502: Deserialization of Untrusted Data
Affected Products
- vLLM versions prior to 0.7.0
AI Security Analysis
- AI Category
- NLP and Transformers
- Risk Domain
- Supply Chain and Deployment
- OWASP Top 10 for LLMs 2025
- None mapped
- MITRE ATLAS Techniques
- None mapped
- Classification Reason
- vLLM is a library specifically for LLM inference and serving, where LLMs are based on transformer architectures, making 'NLP and Transformers' the most fitting category.
MITRE ATT&CK Enterprise Techniques
- T1195.002: Compromise Software Supply Chain
- T1203: Exploitation for Client Execution
Why these techniques?
Deserialization vulnerability via torch.load(pickle) with weights_only=False enables arbitrary code execution from malicious Hugging Face model checkpoints, facilitating exploitation for client execution and supply chain compromise through tainted software dependencies.