Cyber Posture

CVE-2025-62164

High

Published: 21 November 2025

Modified: 04 December 2025
KEV Added
Patch
CVSS Score: 8.8 (CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H)
EPSS Score: 0.0019 (40.6th percentile)
Risk Priority: 18 (60% EPSS · 20% KEV · 20% CVSS)
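
The Risk Priority entry gives only the weights (60% EPSS, 20% KEV, 20% CVSS), not how each input is scaled. As a rough sanity check, the sketch below assumes every component is placed on a 0-100 scale (EPSS probability times 100, CVSS base score times 10, KEV as 100 if listed and 0 otherwise) and that this CVE is not in the KEV catalog, since the KEV Added field is empty. These scaling choices and the function name `risk_priority` are assumptions, not something the page states.

```python
def risk_priority(epss: float, cvss: float, in_kev: bool) -> float:
    """Weighted 0-100 score. The input scaling is an assumption."""
    epss_component = epss * 100.0              # EPSS probability as a percentage
    kev_component = 100.0 if in_kev else 0.0   # assumed binary KEV contribution
    cvss_component = cvss * 10.0               # CVSS 0-10 stretched to 0-100
    return 0.6 * epss_component + 0.2 * kev_component + 0.2 * cvss_component


score = risk_priority(epss=0.0019, cvss=8.8, in_kev=False)
print(round(score))
```

Under these assumptions the CVSS term dominates (0.2 × 88 = 17.6), the EPSS term adds about 0.11, and the total rounds to the listed value of 18.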

Description

vLLM is an inference and serving engine for large language models (LLMs). In versions 0.10.2 up to but not including 0.11.1, a memory corruption vulnerability exists in the Completions API endpoint that could lead to a crash (denial-of-service) and potentially remote code execution (RCE).

When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default. As a result, maliciously crafted tensors can bypass internal bounds checks and trigger an out-of-bounds memory write during the call to to_dense(). This memory corruption can crash vLLM and potentially lead to code execution on the server hosting vLLM. This issue has been patched in version 0.11.1.
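
To illustrate the kind of check that is skipped when sparse tensor integrity checks are disabled, here is a minimal pure-Python sketch of bounds validation on COO-style indices before densifying. It is a conceptual model only: the helper names `validate_coo_indices` and `to_dense_checked` are hypothetical, and this is not the code used by PyTorch or by the vLLM fix. In PyTorch itself, such invariant checks can be turned on with `torch.sparse.check_sparse_tensor_invariants`.

```python
from typing import List, Sequence


def validate_coo_indices(indices: Sequence[Sequence[int]],
                         nnz: int,
                         shape: Sequence[int]) -> None:
    """Raise ValueError if the COO index matrix is inconsistent with `shape`.

    `indices` has one row per dimension and one column per stored value.
    """
    if len(indices) != len(shape):
        raise ValueError("index rows must match the number of dimensions")
    for dim, (row, size) in enumerate(zip(indices, shape)):
        if len(row) != nnz:
            raise ValueError(f"dim {dim} has {len(row)} indices, expected {nnz}")
        for idx in row:
            if not 0 <= idx < size:
                raise ValueError(
                    f"index {idx} out of bounds for dim {dim} of size {size}")


def to_dense_checked(indices: Sequence[Sequence[int]],
                     values: Sequence[float],
                     shape: Sequence[int]) -> List[List[float]]:
    """Densify a 2-D COO matrix only after its indices pass validation."""
    validate_coo_indices(indices, len(values), shape)
    rows, cols = shape
    dense = [[0.0] * cols for _ in range(rows)]
    for r, c, v in zip(indices[0], indices[1], values):
        dense[r][c] += v  # duplicate coordinates accumulate
    return dense


# Well-formed input densifies normally.
print(to_dense_checked([[0, 1], [1, 0]], [1.5, 2.0], (2, 2)))

# A crafted index far outside the declared shape is rejected up front.
try:
    to_dense_checked([[0, 10**6], [1, 0]], [1.5, 2.0], (2, 2))
except ValueError as exc:
    print("rejected:", exc)
```

In native code the unchecked equivalent of `dense[r][c] += v` is a raw memory write, which is why an unvalidated index can land outside the allocated buffer.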

Mitigating Controls (NIST 800-53 r5) [AI]

Prevent: SI-10 (Information Input Validation) requires validation of user-supplied inputs such as prompt embeddings, so that malformed serialized tensors are rejected before they can bypass bounds checks during deserialization.

Prevent: SI-16 (Memory Protection) calls for memory protection mechanisms that make out-of-bounds writes, such as those triggered by malicious sparse tensors in the to_dense() call, harder to exploit.

Prevent: SI-2 (Flaw Remediation) calls for timely patching; here that means upgrading to version 0.11.1, which adds validation for malformed sparse tensors.

Security Summary [AI]

CVE-2025-62164 is a memory corruption vulnerability affecting vLLM, an inference and serving engine for large language models, in versions 0.10.2 through 0.11.0. The issue resides in the Completions API endpoint, which processes user-supplied prompt embeddings by loading serialized tensors via torch.load() without adequate validation. A change in PyTorch 2.8.0 disables sparse tensor integrity checks by default, allowing maliciously crafted tensors to bypass internal bounds checks and trigger an out-of-bounds memory write during the to_dense() call.

Attackers with low privileges (PR:L) can exploit this vulnerability over the network (AV:N) with low complexity (AC:L) and no user interaction (UI:N); together with the impact metrics, this vector yields a CVSS v3.1 base score of 8.8. By submitting specially crafted prompt embeddings to the Completions API endpoint, an attacker can cause a denial-of-service crash or potentially achieve remote code execution on the hosting server, with high impact on confidentiality, integrity, and availability (C:H/I:H/A:H). The vulnerability maps to CWE-20 (Improper Input Validation), CWE-123 (Write-what-where Condition), CWE-502 (Deserialization of Untrusted Data), and CWE-787 (Out-of-bounds Write).
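
As a worked check of how this vector produces 8.8, the sketch below applies the CVSS v3.1 base-score equations for the Scope: Unchanged case, using the metric weights from the published specification. The function names are illustrative, and vectors with Scope: Changed are deliberately not handled.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged values).
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": CIA,
    "I": CIA,
    "A": CIA,
}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a CVSS:3.1 vector with Scope: Unchanged."""
    # Skip the "CVSS:3.1" prefix and split the rest into metric/value pairs.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 2.84; their sum, roughly 8.71, rounds up to 8.8.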

The vLLM project has patched this issue in version 0.11.1. Mitigation details are available in the project's security advisory (GHSA-mrw7-hf4f-83pf), the fixing pull request (#27204), and the commit (58fab50d82838d5014f4a14d991fdb9352c9c84b) that adds validation to prevent the exploitation of malformed sparse tensors.
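
A quick way to triage a deployment is to compare its vLLM version against the affected range stated above (0.10.2 inclusive to 0.11.1 exclusive). The following is a minimal sketch with hypothetical helper names; it assumes plain dotted release numbers and does not handle pre-release or local version suffixes.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted release such as '0.11.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


FIRST_AFFECTED = parse_version("0.10.2")
FIRST_FIXED = parse_version("0.11.1")


def is_affected(version: str) -> bool:
    """True if `version` lies in the range [0.10.2, 0.11.1)."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED


for candidate in ("0.10.1", "0.10.2", "0.11.0", "0.11.1"):
    print(candidate, "affected" if is_affected(candidate) else "not affected")
```

On a live host, the installed version string can be obtained with `importlib.metadata.version("vllm")` and passed to `is_affected`.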

This vulnerability is particularly relevant to AI/ML deployments: vLLM is designed for serving LLMs, so production inference servers that accept untrusted inputs may be exposed. The available information notes no public reports of real-world exploitation.

Details

CWE(s)
CWE-20 · CWE-123 · CWE-502 · CWE-787

Affected Products

vllm · vllm
Fixed in 0.11.1 · Affected 0.10.2 to before 0.11.1

AI Security Analysis [AI]

AI Category: APIs and Models
Risk Domain: LLM/Generative AI Risks
OWASP Top 10 for LLMs 2025: None mapped
MITRE ATLAS Techniques: None mapped
Classification Reason: vLLM is an inference and serving engine for LLMs, and the vulnerability sits in the Completions API endpoint, which loads user-supplied prompt embeddings with torch.load() without sufficient validation. This places it in the category of APIs for model inference and serving.

MITRE ATT&CK Enterprise Techniques [AI]

T1190 Exploit Public-Facing Application (Initial Access)
Adversaries may attempt to exploit a weakness in an Internet-facing host or system to initially access a network.
T1210 Exploitation of Remote Services (Lateral Movement)
Adversaries may exploit remote services to gain unauthorized access to internal systems once inside of a network.
T1499.004 Application or System Exploitation (Impact)
Adversaries may exploit software vulnerabilities that can cause an application or system to crash and deny availability to users.
Why these techniques?

The memory corruption vulnerability in vLLM's public-facing Completions API enables exploitation of public-facing applications (T1190) and remote services (T1210) via malicious prompt embeddings for potential RCE, and facilitates endpoint DoS through application exploitation (T1499.004).

References