CVE-2025-62164
Published: 21 November 2025
Description
vLLM is an inference and serving engine for large language models (LLMs). In versions 0.10.2 up to but not including 0.11.1, a memory corruption vulnerability exists in the Completions API endpoint that can lead to a crash (denial of service) and potentially remote code execution (RCE).
When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default. As a result, maliciously crafted tensors can bypass internal bounds checks and trigger an out-of-bounds memory write during the call to to_dense(). This memory corruption can crash vLLM and potentially lead to code execution on the server hosting vLLM. This issue has been patched in version 0.11.1.
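The patched version adds validation of this kind before densification. As an illustrative, framework-agnostic sketch (the function name and list-of-tuples representation are hypothetical and simplified, not vLLM's actual patch code), a bounds check over sparse COO coordinates looks like:

```python
def validate_coo_indices(indices, shape):
    """Reject sparse COO coordinates that fall outside the dense shape.

    `indices` is a list of coordinate tuples, standing in for the index
    matrix of a sparse COO tensor; `shape` is the target dense shape.
    A densification routine that trusts these coordinates unchecked
    would write out of bounds for any coordinate >= its dimension size,
    which is the failure mode behind this CVE.
    """
    for coord in indices:
        if len(coord) != len(shape):
            raise ValueError(f"rank mismatch: {coord} vs shape {shape}")
        for axis, (i, dim) in enumerate(zip(coord, shape)):
            if not 0 <= i < dim:
                raise ValueError(
                    f"index {i} out of bounds for axis {axis} (size {dim})"
                )
    return True

# In-bounds coordinates pass; a coordinate equal to the dimension
# size is rejected before any write could occur.
validate_coo_indices([(0, 1), (2, 3)], (3, 4))
```

In real PyTorch code, the equivalent safeguard is re-enabling invariant checking on deserialized sparse tensors before calling to_dense().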
Mitigating Controls (NIST 800-53 r5)
SI-10 requires validation of user-supplied inputs like prompt embeddings to prevent malformed serialized tensors from bypassing bounds checks during deserialization.
SI-16 enforces memory protection mechanisms that directly mitigate out-of-bounds memory writes triggered by malicious sparse tensors in the to_dense() call.
SI-2 ensures timely patching of the vLLM flaw, as demonstrated by the fix in version 0.11.1 that adds validation for malformed sparse tensors.
Security Summary
CVE-2025-62164 is a memory corruption vulnerability affecting vLLM, an inference and serving engine for large language models, in versions 0.10.2 through 0.11.0. The issue resides in the Completions API endpoint, which processes user-supplied prompt embeddings by loading serialized tensors via torch.load() without adequate validation. A change in PyTorch 2.8.0 disables sparse tensor integrity checks by default, allowing maliciously crafted tensors to bypass internal bounds checks and trigger an out-of-bounds memory write during the to_dense() call.
Attackers with low privileges (PR:L) can exploit this vulnerability over the network (AV:N) with low complexity (AC:L) and no user interaction (UI:N), as indicated by its CVSS v3.1 base score of 8.8. By submitting specially crafted prompt embeddings to the Completions API endpoint, an attacker can cause a denial-of-service crash or potentially achieve remote code execution on the hosting server, with high impacts on confidentiality, integrity, and availability (C:H/I:H/A:H). The vulnerability maps to CWEs including CWE-20 (Improper Input Validation), CWE-123 (Write-what-where Condition), CWE-502 (Deserialization of Untrusted Data), and CWE-787 (Out-of-bounds Write).
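Those metrics correspond to the CVSS v3.1 vector CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H. As a small illustrative sketch (the helper name is hypothetical, not part of any scoring library), such a vector string can be unpacked mechanically:

```python
def parse_cvss_vector(vector: str) -> dict:
    """Split a CVSS v3.1 vector string into a metric -> value mapping."""
    prefix, _, metrics = vector.partition("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported CVSS version prefix: {prefix}")
    return dict(part.split(":", 1) for part in metrics.split("/"))

# Vector consistent with the metrics cited above (scope unchanged).
metrics = parse_cvss_vector("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H")
```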
The vLLM project has patched this issue in version 0.11.1. Mitigation details are available in the project's security advisory (GHSA-mrw7-hf4f-83pf), the fixing pull request (#27204), and the commit (58fab50d82838d5014f4a14d991fdb9352c9c84b) that adds validation to prevent the exploitation of malformed sparse tensors.
This vulnerability is particularly relevant to AI/ML deployments, as vLLM is designed for serving LLMs, potentially exposing production inference servers to risks from untrusted inputs. No public reports of real-world exploitation are noted in the available information.
AI Security Analysis
- AI Category: APIs and Models
- Risk Domain: LLM/Generative AI Risks
- OWASP Top 10 for LLMs 2025: None mapped
- MITRE ATLAS Techniques: None mapped
- Classification Reason: vLLM is an inference and serving engine for LLMs, and the vulnerability sits specifically in the Completions API endpoint that processes user-supplied prompt embeddings via torch.load() without validation, fitting the category of APIs for model inference and serving.
MITRE ATT&CK Enterprise Techniques
Why these techniques?
The memory corruption vulnerability in vLLM's public-facing Completions API maps to Exploit Public-Facing Application (T1190) and Exploitation of Remote Services (T1210), since malicious prompt embeddings submitted over the network can yield RCE, and to Endpoint Denial of Service: Application Exhaustion/Exploitation (T1499.004), since the same input can crash the service.