CVE-2026-21869
Published: 08 January 2026
Description
llama.cpp is a C/C++ inference engine for several LLM models. In commit 55d4206c8 and prior, the n_discard parameter is parsed directly from JSON input in the llama.cpp server's completion endpoints without validation to ensure it is non-negative. When a negative value is supplied and the context fills up, llama_memory_seq_rm/add receives a reversed range and a negative offset, causing out-of-bounds memory writes in the token evaluation loop. This deterministic memory corruption can crash the process or enable remote code execution (RCE). No fix was available at the time of publication.
Mitigating Controls (NIST 800-53 r5)
- Validate JSON inputs such as the n_discard parameter to ensure non-negative values, preventing the reversed range and out-of-bounds memory writes.
- Implement memory safeguards such as address space layout randomization (ASLR) and non-executable stacks to hinder exploitation of the out-of-bounds writes into RCE.
- Monitor the llama.cpp repository for flaw-remediation patches and install them promptly to address this unpatched memory corruption vulnerability.
Security Summary
CVE-2026-21869 is a memory corruption vulnerability in llama.cpp, a C/C++ inference engine for large language models (LLMs). In commits up to 55d4206c8, the server's completion endpoints parse the n_discard parameter directly from JSON input without validating that it is non-negative. Supplying a negative value, combined with a full context, results in a reversed range and negative offset passed to llama_memory_seq_rm/add functions, triggering out-of-bounds memory writes during the token evaluation loop. This issue is classified under CWE-787 (Out-of-bounds Write) with a CVSS v3.1 base score of 8.8 (AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H).
Remote attackers can exploit this vulnerability by sending crafted JSON requests to the llama.cpp server's completion endpoints, requiring minimal user interaction such as submitting the malicious input. No authentication or privileges are needed, making it accessible over the network with low complexity. Successful exploitation leads to deterministic memory corruption, which can crash the server process or potentially enable remote code execution (RCE) by overwriting critical memory regions.
The GitHub Security Advisory (GHSA-8947-pfff-2f3c) details the issue but notes there is no fix available at the time of publication on January 8, 2026. Security practitioners should monitor the llama.cpp repository for patches, avoid exposing the server publicly, and validate all JSON inputs server-side until remediation is released.
This vulnerability is particularly relevant to AI/ML deployments, as llama.cpp is widely used for efficient LLM inference, potentially exposing model serving infrastructure to compromise. No real-world exploitation has been reported in available data.
AI Security Analysis
- AI Category: NLP and Transformers
- Risk Domain: N/A
- OWASP Top 10 for LLMs 2025: None mapped
- MITRE ATLAS Techniques: None mapped
- Classification Reason: Matched keywords: llama.cpp, llm
MITRE ATT&CK Enterprise Techniques
Why these techniques?
The vulnerability in public-facing llama.cpp server endpoints allows unauthenticated remote exploitation via crafted JSON requests, enabling memory corruption and potential RCE; this maps directly to T1190: Exploit Public-Facing Application.