RFC: Lossless fixed-width integer bit-packing codec for ADC-style data

Hi! This is an exploratory design discussion based on a working prototype and benchmarks.
I’d really appreciate feedback on API shape and integration direction.

## Motivation
Many scientific datasets store integer signals in which only a subset of the bits is meaningful
(e.g. 10–12 bit ADC data stored in uint16). Zarr/numcodecs currently rely on byte-level
compression for such data, which does not explicitly remove the unused bits.

## Observations
In an ADC-style benchmark (uint16, effective 12 bits, 10M samples), default Zarr compression
reduced storage from ~19 MB to ~7 MB, but further gains plateaued. Existing bit-level tools
in numcodecs (e.g. PackBits, which packs boolean arrays) do not support lossless integer
bit-width packing.

## Proposal
Introduce a lossless integer bit-packing codec/filter that:
- Packs fixed-width integer values using exactly N bits
- Operates per chunk
- Is fully reversible
- Can be composed with existing compression
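
To make the proposal concrete, here is a minimal pure-Python sketch of fixed-width packing. It is not the prototype's code: the names `pack_bits` and `unpack_bits`, the MSB-first bit order, and the choice to pass the element count separately are all assumptions made for illustration.

```python
def pack_bits(values, bits):
    """Pack non-negative ints into bytes, using exactly `bits` bits each.

    Assumption for this sketch: bits are written MSB first and the final
    byte is zero-padded.
    """
    limit = 1 << bits
    acc = 0      # bit accumulator
    nbits = 0    # number of not-yet-flushed bits held in acc
    out = bytearray()
    for v in values:
        if not 0 <= v < limit:
            raise ValueError(f"value {v} does not fit in {bits} bits")
        acc = (acc << bits) | v
        nbits += bits
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the unflushed low bits
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the last byte
    return bytes(out)


def unpack_bits(data, bits, count):
    """Inverse of pack_bits; `count` is needed to ignore the padding bits."""
    mask = (1 << bits) - 1
    acc = 0
    nbits = 0
    values = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= bits and len(values) < count:
            nbits -= bits
            values.append((acc >> nbits) & mask)
        acc &= (1 << nbits) - 1
    return values


samples = [0, 1, 4095, 2048, 1234]       # 12-bit values, as from an ADC
packed = pack_bits(samples, 12)
assert unpack_bits(packed, 12, len(samples)) == samples
```

Because the output is a plain byte string, it can be handed to any downstream compressor unchanged.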

## Prototype
I implemented a pure-Python prototype to validate feasibility:
- Correct round-trip verified
- Storage reduction proportional to effective bit-width
- Zarr v2 compatible (self-describing stream)
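
One way a self-describing stream could look is sketched below. The header layout (a 4-byte magic, a 1-byte bit width, an 8-byte element count) and the names `HEADER`, `MAGIC`, `encode_chunk`, and `decode_chunk` are hypothetical and are not taken from the prototype. The packing goes through one large Python integer for brevity, which is easy to read but not efficient.

```python
import struct

# Hypothetical header: magic, bit width, element count (little-endian, unpadded).
HEADER = struct.Struct("<4sBQ")
MAGIC = b"BPK1"


def encode_chunk(values, bits):
    """Return header + packed payload for one chunk of non-negative ints."""
    if not 1 <= bits <= 64:
        raise ValueError("bits must be in 1..64")
    total = 0
    for v in values:
        if not 0 <= v < (1 << bits):
            raise ValueError(f"value {v} does not fit in {bits} bits")
        total = (total << bits) | v          # append value, MSB first
    nbytes = (len(values) * bits + 7) // 8
    pad = nbytes * 8 - len(values) * bits    # zero bits at the end
    payload = (total << pad).to_bytes(nbytes, "big")
    return HEADER.pack(MAGIC, bits, len(values)) + payload


def decode_chunk(buf):
    """Read bit width and count from the header, then unpack the payload."""
    magic, bits, count = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("not a bit-packed chunk")
    payload = buf[HEADER.size:]
    pad = len(payload) * 8 - count * bits
    if pad < 0:
        raise ValueError("payload is truncated")
    total = int.from_bytes(payload, "big") >> pad
    mask = (1 << bits) - 1
    values = [(total >> (bits * (count - 1 - i))) & mask for i in range(count)]
    return values, bits


blob = encode_chunk([1, 2, 3, 4095], 12)
assert decode_chunk(blob) == ([1, 2, 3, 4095], 12)
```

With a header like this the decoder needs no external configuration. The alternative is to keep the bit width in the codec configuration and store only the payload per chunk.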

## Results (summary)
- Bit-packing alone reduces storage predictably (e.g. 12/16 → ~0.75×)
- Bit-packing + Blosc/Zstd achieves a size close to that of default compression
- Python implementation is CPU-heavy → optimized backend likely needed
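
The 0.75× figure follows directly from the bit widths. As a quick check with the benchmark parameters from above (the helper name `packed_nbytes` is made up for this example, and header overhead is ignored):

```python
def packed_nbytes(count, bits):
    """Payload size in bytes for `count` values of `bits` bits each, rounded up."""
    return (count * bits + 7) // 8


n = 10_000_000                   # samples in the benchmark
raw = n * 2                      # uint16 storage: 2 bytes per sample
packed = packed_nbytes(n, 12)    # 12 effective bits per sample

print(raw, packed, packed / raw)  # 20000000 15000000 0.75
```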

## Questions
- Should this live as a codec or filter in numcodecs?
- How should bit-width metadata be handled (header vs external)?
- Is limiting initial scope to uint16 reasonable?
- Any guidance on aligning this with Zarr v3’s codec pipeline?


RFC: Lossless fixed-width integer bit-packing codec for ADC-style data #813
