CVE-2025-62164
Published: 21 November 2025
Description
vLLM is an inference and serving engine for large language models (LLMs). In versions 0.10.2 up to but not including 0.11.1, a memory corruption vulnerability exists in the Completions API endpoint that can lead to a crash (denial of service) and potentially remote code execution (RCE).
When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default. As a result, maliciously crafted tensors can bypass internal bounds checks and trigger an out-of-bounds memory write during the call to to_dense(). This memory corruption can crash vLLM and potentially lead to code execution on the server hosting vLLM. This issue has been patched in version 0.11.1.
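The patched version adds validation of this kind before densification. As an illustrative, framework-agnostic sketch (the function name and list-of-tuples representation are hypothetical and simplified, not vLLM's actual patch code), a bounds check over sparse COO coordinates looks like:

```python
def validate_coo_indices(indices, shape):
    """Reject sparse COO coordinates that fall outside the dense shape.

    `indices` is a list of coordinate tuples, standing in for the index
    matrix of a sparse COO tensor; `shape` is the target dense shape.
    A densification routine that trusts these coordinates unchecked
    would write out of bounds for any coordinate >= its dimension size,
    which is the failure mode behind this CVE.
    """
    for coord in indices:
        if len(coord) != len(shape):
            raise ValueError(f"rank mismatch: {coord} vs shape {shape}")
        for axis, (i, dim) in enumerate(zip(coord, shape)):
            if not 0 <= i < dim:
                raise ValueError(
                    f"index {i} out of bounds for axis {axis} (size {dim})"
                )
    return True

# In-bounds coordinates pass; a coordinate equal to the dimension
# size is rejected before any write could occur.
validate_coo_indices([(0, 1), (2, 3)], (3, 4))
```

In real PyTorch code, the equivalent safeguard is re-enabling invariant checking on deserialized sparse tensors before calling to_dense().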
Mitigating Controls (NIST 800-53 r5)
SI-10 requires validation of user-supplied inputs like prompt embeddings to prevent malformed serialized tensors from bypassing bounds checks during deserialization.
SI-16 enforces memory protection mechanisms that directly mitigate out-of-bounds memory writes triggered by malicious sparse tensors in the to_dense() call.
SI-2 ensures timely patching of the vLLM flaw, as demonstrated by the fix in version 0.11.1 that adds validation for malformed sparse tensors.
Security Summary
CVE-2025-62164 is a memory corruption vulnerability affecting vLLM, an inference and serving engine for large language models, in versions 0.10.2 through 0.11.0. The issue resides in the Completions API endpoint, which processes user-supplied prompt embeddings by loading serialized tensors via torch.load() without adequate validation. A change in PyTorch 2.8.0 disables sparse tensor integrity checks by default, allowing maliciously crafted tensors to bypass internal bounds checks and trigger an out-of-bounds memory write during the to_dense() call.
Attackers with low privileges (PR:L) can exploit this vulnerability over the network (AV:N) with low complexity (AC:L) and no user interaction (UI:N), as indicated by its CVSS v3.1 base score of 8.8. By submitting specially crafted prompt embeddings to the Completions API endpoint, an attacker can cause a denial-of-service crash or potentially achieve remote code execution on the hosting server, with high impacts on confidentiality, integrity, and availability (C:H/I:H/A:H). The vulnerability maps to CWEs including CWE-20 (Improper Input Validation), CWE-123 (Write-what-where Condition), CWE-502 (Deserialization of Untrusted Data), and CWE-787 (Out-of-bounds Write).
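Those metrics correspond to the CVSS v3.1 vector CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H. As a small illustrative sketch (the helper name is hypothetical, not part of any scoring library), such a vector string can be unpacked mechanically:

```python
def parse_cvss_vector(vector: str) -> dict:
    """Split a CVSS v3.1 vector string into a metric -> value mapping."""
    prefix, _, metrics = vector.partition("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported CVSS version prefix: {prefix}")
    return dict(part.split(":", 1) for part in metrics.split("/"))

# Vector consistent with the metrics cited above (scope unchanged).
metrics = parse_cvss_vector("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H")
```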
The vLLM project has patched this issue in version 0.11.1. Mitigation details are available in the project's security advisory (GHSA-mrw7-hf4f-83pf), the fixing pull request (#27204), and the commit (58fab50d82838d5014f4a14d991fdb9352c9c84b) that adds validation to prevent the exploitation of malformed sparse tensors.
This vulnerability is particularly relevant to AI/ML deployments, as vLLM is designed for serving LLMs, potentially exposing production inference servers to risks from untrusted inputs. No public reports of real-world exploitation are noted in the available information.
AI Security Analysis
- AI Category: APIs and Models
- Risk Domain: LLM/Generative AI Risks
- OWASP Top 10 for LLMs 2025: None mapped
- MITRE ATLAS Techniques: None mapped
- Classification Reason: vLLM is an inference and serving engine for LLMs, and the vulnerability sits specifically in the Completions API endpoint that processes user-supplied prompt embeddings via torch.load() without validation, fitting the category of APIs for model inference and serving.
MITRE ATT&CK Enterprise Techniques
Why these techniques?
The memory corruption vulnerability in vLLM's public-facing Completions API maps to Exploit Public-Facing Application (T1190) and Exploitation of Remote Services (T1210), since malicious prompt embeddings submitted over the network can yield RCE, and to Endpoint Denial of Service: Application Exhaustion/Exploitation (T1499.004), since the same input can crash the service.