CVE-2026-31971
Published: 18 March 2026
Description
HTSlib is a library for reading and writing bioinformatics file formats. CRAM is a compressed format which stores DNA sequence alignment data using a variety of encodings and compression methods. When reading data encoded using the `BYTE_ARRAY_LEN` method, the `cram_byte_array_len_decode()` function failed to validate that the amount of data being unpacked matched the size of the output buffer where it was to be stored. Depending on the data series being read, this could result in either a heap or a stack buffer overflow with attacker-controlled bytes. If a user opens a file crafted to exploit this issue, it could lead to the program crashing, overwriting of data structures on the heap or stack in ways not expected by the program, or changing the control flow of the program. It may be possible to use this to obtain arbitrary code execution. Versions 1.23.1, 1.22.2 and 1.21.1 include fixes for this issue. There is no workaround for this issue.
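The class of check described above can be illustrated with a minimal sketch. This is not HTSlib's code or its fix: the function name `decode_len_prefixed`, the return codes, and the 4-byte little-endian length prefix are all assumptions made for illustration (real CRAM data uses different integer encodings). The point is only that a length read from the file is validated against both the remaining input and the destination capacity before any bytes are copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical return codes for this sketch (not HTSlib's). */
enum { DECODE_OK = 0, DECODE_TRUNCATED = -1, DECODE_TOO_BIG = -2 };

/*
 * Decode one length-prefixed byte array from `in` into `out`.
 * Layout assumed for illustration only: a 4-byte little-endian
 * length followed by that many payload bytes.
 */
int decode_len_prefixed(const uint8_t *in, size_t in_size,
                        uint8_t *out, size_t out_cap,
                        size_t *out_len)
{
    /* Programmer errors, not file errors. */
    assert(in != NULL && out != NULL && out_len != NULL);

    if (in_size < 4)
        return DECODE_TRUNCATED;

    uint32_t len = (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
                   ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);

    /* The declared length comes from the file, so treat it as hostile. */
    if (len > in_size - 4)
        return DECODE_TRUNCATED;  /* claims more bytes than the input holds */
    if (len > out_cap)
        return DECODE_TOO_BIG;    /* would overflow the destination buffer */

    memcpy(out, in + 4, len);
    *out_len = len;
    return DECODE_OK;
}
```

A caller would pass the real capacity of its destination (for example `sizeof buf` for a stack array) and treat any non-zero return as a malformed file rather than continuing to decode.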
Mitigating Controls (NIST 800-53 r5) [AI]
SI-2 (Flaw Remediation) requires identifying, reporting, and correcting flaws such as the buffer overflow in HTSlib's cram_byte_array_len_decode() function, here by promptly updating to a fixed version (1.23.1, 1.22.2, or 1.21.1).
SI-10 (Information Input Validation) requires validating information inputs; applied to this flaw, that means checking the size of unpacked BYTE_ARRAY_LEN data against the output buffer when reading CRAM files.
SI-16 (Memory Protection) calls for memory protections such as address space layout randomization and stack guards, which make heap and stack buffer overflows with attacker-controlled data harder to exploit.
Security Summary [AI]
HTSlib, a C library for reading and writing high-throughput sequencing data formats including CRAM, is affected by CVE-2026-31971, published on 2026-03-18. The vulnerability occurs in the cram_byte_array_len_decode() function when processing data encoded with the BYTE_ARRAY_LEN method in CRAM files, which store compressed DNA sequence alignment data. This function fails to validate that the unpacked data size matches the output buffer, resulting in either a heap or a stack buffer overflow with attacker-controlled bytes. The issue is rated 8.1 on the CVSS 3.1 scale (AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H) and maps to CWEs 121 (stack-based buffer overflow), 122 (heap-based buffer overflow), 787 (out-of-bounds write), and 1284 (improper validation of specified quantity in input).
Exploitation requires an attacker to craft a malicious CRAM file and convince a user to open it in an application linked to a vulnerable HTSlib version, such as bioinformatics tools processing sequencing data. No privileges are needed (PR:N), and attacks can originate remotely (AV:N) with low complexity (AC:L), though user interaction is required (UI:R). Outcomes include program crashes, unexpected overwriting of heap or stack data structures, control flow hijacking, or potential arbitrary code execution, with high impacts on integrity and availability but no confidentiality loss.
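As a worked check of the 8.1 rating, the sketch below applies the CVSS v3.1 base-score equations for an unchanged scope to the vector quoted above. The metric weights are those of the CVSS v3.1 specification (AV:N = 0.85, AC:L = 0.77, PR:N = 0.85, UI:R = 0.62, High = 0.56, None = 0); the function names are made up for this example, and the rounding helper assumes a non-negative input.

```c
#include <assert.h>

/* CVSS v3.1 "Roundup": smallest one-decimal value >= x, computed on
 * integers to avoid floating-point surprises. Assumes x >= 0. */
double cvss31_roundup(double x)
{
    long n = (long)(x * 100000.0 + 0.5);  /* round to nearest integer */
    if (n % 10000 == 0)
        return (double)n / 100000.0;
    return (double)(n / 10000 + 1) / 10.0;
}

/* Base score for Scope: Unchanged, given numeric metric weights. */
double cvss31_base_unchanged(double av, double ac, double pr, double ui,
                             double c, double i, double a)
{
    assert(av > 0.0 && ac > 0.0 && pr > 0.0 && ui > 0.0);

    double iss = 1.0 - (1.0 - c) * (1.0 - i) * (1.0 - a);
    double impact = 6.42 * iss;
    double exploitability = 8.22 * av * ac * pr * ui;

    if (impact <= 0.0)
        return 0.0;

    double sum = impact + exploitability;
    return cvss31_roundup(sum < 10.0 ? sum : 10.0);
}
```

For this CVE the call is `cvss31_base_unchanged(0.85, 0.77, 0.85, 0.62, 0.0, 0.56, 0.56)`: the impact sub-score is about 5.18, exploitability about 2.84, and the sum of roughly 8.01 rounds up to 8.1.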
Mitigation is available via patches in HTSlib versions 1.23.1, 1.22.2, and 1.21.1; no workaround exists. The GitHub security advisory (GHSA-jvx4-4wq7-6fmh) and fixing commit (01cd003b46fa2ebea4d9be5475b11217eb4c11be) provide full details on the changes. Security practitioners should prioritize updating affected bioinformatics pipelines and scanning for vulnerable HTSlib instances.
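When scanning for vulnerable instances, a version string can be compared against the fixed releases listed above. The following is a sketch under stated assumptions, not an official check: the function name is hypothetical, release series newer than 1.23 are assumed to carry the fix, and series older than 1.21 are treated as unfixed only because the advisory text here lists no fixed release for them.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns 1 if `version` is at or above the fixed release for its
 * series, 0 if it is below it, and -1 if it cannot be interpreted.
 * Hypothetical helper; fixed releases per the advisory: 1.21.1,
 * 1.22.2, 1.23.1.
 */
int htslib_has_cve_2026_31971_fix(const char *version)
{
    assert(version != NULL);

    int major = 0, minor = 0, patch = 0;
    int got = sscanf(version, "%d.%d.%d", &major, &minor, &patch);
    if (got < 2)
        return -1;   /* not a "major.minor[.patch]" string */
    if (major != 1)
        return -1;   /* outside the range this sketch knows about */

    switch (minor) {
    case 21: return patch >= 1;
    case 22: return patch >= 2;
    case 23: return patch >= 1;
    default: return minor > 23;  /* assumption: later series are fixed */
    }
}
```

In a program linked against HTSlib, the string returned by `hts_version()` could be passed to such a helper. Development builds whose version string carries a suffix after the minor number would be parsed with a patch level of 0 and therefore reported as unfixed, which errs on the cautious side.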
MITRE ATT&CK Enterprise Techniques [AI]
Why these techniques?
The vulnerability is a heap/stack buffer overflow in a client library (HTSlib) triggered by processing a malicious CRAM file in bioinformatics applications, directly enabling exploitation for client-side arbitrary code execution.