Data access protocol #10

@rossant

Following today's discussion, here are some quick AI-generated notes, for future reference, on how a simple ndarray access spec might look. Buffers could be described with a protocol that tells the GSP renderer how to fetch them when they are not provided directly as base64-encoded strings. This would be useful for accessing large remote datasets without requiring the client to download them. Users could define plugins to support arbitrary URI schemes: a plugin is code that takes a URI as input and returns a binary buffer as output.

This issue is intended solely to spark discussion and gather ideas. Everything here is preliminary and subject to change.
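
As a rough illustration of the plugin idea above, here is a minimal sketch of a scheme registry in Python. Everything in it is hypothetical and not part of GSP: the names `register_scheme` and `fetch`, the signature "URI string in, `bytes` out", and the made-up `mem://` scheme used for the demo.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# Hypothetical registry: URI scheme -> plugin function (uri -> bytes).
_FETCHERS: Dict[str, Callable[[str], bytes]] = {}


def register_scheme(scheme: str, fetcher: Callable[[str], bytes]) -> None:
    """Associate a URI scheme (e.g. "file") with a fetcher plugin."""
    _FETCHERS[scheme.lower()] = fetcher


def fetch(uri: str) -> bytes:
    """Dispatch on the URI scheme and return the raw binary buffer."""
    scheme = urlparse(uri).scheme.lower()
    if scheme not in _FETCHERS:
        raise ValueError(f"no plugin registered for scheme {scheme!r}")
    return _FETCHERS[scheme](uri)


# Toy in-memory plugin for illustration ("mem" is a made-up scheme).
_MEMORY = {"mem://demo": bytes(range(8))}
register_scheme("mem", lambda uri: _MEMORY[uri])
```

With this in place, `fetch("mem://demo")` returns the 8 demo bytes, and an unregistered scheme raises `ValueError`.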


NDArray JSON (byte-stride model)

Addressing rule (one line)

For any element index i = (i_0, …, i_{k-1}), its byte address is:

address = base(uri) + offset + Σ_d ( i_d * strides[d] )     // base(uri) = first byte of the buffer at 'uri'; strides are in BYTES; may be negative
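
The addressing rule above can be sketched in a few lines of Python. The function name `byte_offset` is hypothetical, and the bounds check is an assumption of this sketch (the notes do not say whether out-of-range indices must be rejected). The result is relative to the first byte of the buffer behind `uri`.

```python
from typing import Sequence


def byte_offset(index: Sequence[int], shape: Sequence[int],
                strides: Sequence[int], offset: int = 0) -> int:
    """Byte position of element `index`, relative to the start of the buffer."""
    if not (len(index) == len(shape) == len(strides)):
        raise ValueError("index, shape and strides must have the same length")
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension of size {n}")
    # offset + sum over dimensions of i_d * strides[d] (strides in bytes).
    return offset + sum(i * s for i, s in zip(index, strides))
```

For a 100×200 float32 C-contiguous array, `byte_offset((1, 2), (100, 200), (800, 4))` gives 1*800 + 2*4 = 808.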

Objects

1) Array

{
  "type": "ndarray",
  "storage": {
    "uri": "arbitrary://your/location",   // e.g. file:///path, s3://bucket/key, ipc://name, etc.
    "byte_order": "little"                // "little" | "big" (applies to multi-byte numeric kinds)
  },
  "dtype": {
    "kind": "float",                      // e.g. "uint", "int", "float", "complex", "bool"
    "bits": 32,                           // size per scalar lane, in bits (e.g. 8,16,32,64,128)
    "lanes": 1                            // vector lanes per element (1 for scalars)
  },
  "shape":   [N0, N1, ...],               // integers ≥ 0
  "strides": [S0, S1, ...],               // signed integers (bytes per step along each dim)
  "offset":  0                            // signed integer: byte offset from start of 'uri' buffer
}

Invariants

  • len(shape) == len(strides) (ND).
  • If you want C-contiguous, set strides[d] = (∏_{j>d} shape[j]) * itemsize.
  • If you want Fortran-contiguous, set strides[d] = (∏_{j<d} shape[j]) * itemsize.
  • itemsize = (dtype.bits/8) * dtype.lanes.
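
The stride invariants above can be computed mechanically. A minimal sketch follows; the helper names `itemsize`, `c_strides` and `f_strides` are hypothetical, and it assumes `bits` is a multiple of 8 (the notes do not say how sub-byte types would be handled).

```python
from typing import List, Sequence


def itemsize(bits: int, lanes: int = 1) -> int:
    """Bytes per element: (bits / 8) * lanes."""
    if bits % 8 != 0:
        raise ValueError("this sketch assumes bits is a multiple of 8")
    return (bits // 8) * lanes


def c_strides(shape: Sequence[int], size: int) -> List[int]:
    """Row-major byte strides: the last axis varies fastest."""
    strides, step = [], size
    for n in reversed(shape):
        strides.append(step)
        step *= n
    return strides[::-1]


def f_strides(shape: Sequence[int], size: int) -> List[int]:
    """Column-major byte strides: the first axis varies fastest."""
    strides, step = [], size
    for n in shape:
        strides.append(step)
        step *= n
    return strides
```

For float32 with shape [100, 200], `c_strides` yields [800, 4] and `f_strides` yields [4, 400].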

2) View (arbitrary strided view on any buffer)

A view is just “the same addressing rule” with its own shape/strides/offset, pointing at any URI (usually the same as a base array, but it doesn’t have to be).

{
  "type": "ndview",
  "storage": {
    "uri": "arbitrary://your/location",
    "byte_order": "little"
  },
  "dtype": {
    "kind": "float",
    "bits": 32,
    "lanes": 1
  },
  "shape":   [M0, M1, ...],
  "strides": [T0, T1, ...],                // arbitrary signed byte strides
  "offset":  O                              // byte offset to the view's element [0,...,0]
}

You can optionally add "base": {"uri": "...", "note": "for human context only"}; it has no effect on addressing.


Minimal examples

A) 2D C-contiguous array (float32, 100×200)

{
  "type": "ndarray",
  "storage": { "uri": "file:///data/A.bin", "byte_order": "little" },
  "dtype":   { "kind": "float", "bits": 32, "lanes": 1 },
  "shape":   [100, 200],
  "strides": [800, 4],          // [200*4, 4]
  "offset":  0
}

B) Fortran-contiguous array (float32, 100×200)

{
  "type": "ndarray",
  "storage": { "uri": "file:///data/B.bin", "byte_order": "little" },
  "dtype":   { "kind": "float", "bits": 32, "lanes": 1 },
  "shape":   [100, 200],
  "strides": [4, 400],          // column-major: [1*4, 100*4]
  "offset":  0
}

C) Arbitrary strided view on A: reverse first axis, take every 2nd column

  • Base: A from (A) above (C-contig, strides [800, 4]).
  • View: start at element [80, 10], shape [40, 30], steps [-1, +2].
  • View strides: [-800, 8].
  • View offset: 0 + 80*800 + 10*4 = 64040.
{
  "type": "ndview",
  "storage": { "uri": "file:///data/A.bin", "byte_order": "little" },
  "dtype":   { "kind": "float", "bits": 32, "lanes": 1 },
  "shape":   [40, 30],
  "strides": [-800, 8],
  "offset":  64040
}
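
The arithmetic in example C generalizes to any start/step combination. A small sketch, assuming a hypothetical helper `slice_view` that takes the base strides and offset plus a per-axis start index and step (in base elements):

```python
from typing import List, Sequence, Tuple


def slice_view(base_strides: Sequence[int], base_offset: int,
               starts: Sequence[int], steps: Sequence[int]) -> Tuple[List[int], int]:
    """Return (strides, offset) of a view on a base array.

    The view's element [0, ..., 0] is the base element `starts`, and one
    view step along axis d moves `steps[d]` base elements along that axis.
    """
    # Each view stride is the base stride scaled by the (signed) step.
    strides = [s * k for s, k in zip(base_strides, steps)]
    # The view offset is the byte address of the starting base element.
    offset = base_offset + sum(a * s for a, s in zip(starts, base_strides))
    return strides, offset
```

Called with the values of example C, `slice_view([800, 4], 0, [80, 10], [-1, 2])` returns `([-800, 8], 64040)`. The view shape is not computed here; it has to be chosen so that every index stays inside the base array.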

D) 3D array (float32, 8×64×64) with a nontrivial origin

{
  "type": "ndarray",
  "storage": { "uri": "s3://bucket/vol.bin", "byte_order": "little" },
  "dtype":   { "kind": "float", "bits": 32, "lanes": 1 },
  "shape":   [8, 64, 64],
  "strides": [16384, 256, 4],   // [64*64*4, 64*4, 4]
  "offset":  4096               // array starts 4 KiB into the object
}

E) Arbitrary view slicing D at [2:6, 10:42, 5:37] (unit step on every axis)

  • Shape: [4, 32, 32]
  • Strides (same as base): [16384, 256, 4]
  • Offset: 4096 + (2*16384 + 10*256 + 5*4) = 4096 + (32768 + 2560 + 20) = 4096 + 35348 = 39444.
{
  "type": "ndview",
  "storage": { "uri": "s3://bucket/vol.bin", "byte_order": "little" },
  "dtype":   { "kind": "float", "bits": 32, "lanes": 1 },
  "shape":   [4, 32, 32],
  "strides": [16384, 256, 4],
  "offset":  39444
}
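
Since a view like E touches only part of the underlying object, a fetcher may want to know the smallest byte range that covers it. The sketch below is one way to derive that range from the descriptor alone; `byte_extent` is a hypothetical helper, and using the result for partial downloads is an assumption rather than something these notes specify.

```python
from typing import Sequence, Tuple


def byte_extent(shape: Sequence[int], strides: Sequence[int],
                offset: int, itemsize: int) -> Tuple[int, int]:
    """Half-open byte range [lo, hi) covering every element of the view."""
    if any(n == 0 for n in shape):
        return offset, offset  # empty view: nothing to fetch
    lo = hi = offset
    for n, s in zip(shape, strides):
        span = (n - 1) * s  # byte displacement of the last index on this axis
        if span < 0:
            lo += span      # negative strides extend the range downwards
        else:
            hi += span
    return lo, hi + itemsize  # include the bytes of the last element
```

For view E (`shape=[4, 32, 32]`, `strides=[16384, 256, 4]`, `offset=39444`, `itemsize=4`) this gives `(39444, 96660)`. For the reversed view C it gives `(32840, 64276)`, which starts below the view's own offset because of the negative stride.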

Notes & rationale (kept tight)

  • No parsing needed: shapes/strides are integer lists; no slice strings.
  • Arbitrary layouts: any order, negative strides (reversals), gaps, interleaving—all expressible.
  • Views are first-class: a view is just another ND descriptor with its own shape/strides/offset on any uri.
  • Endianness & dtype are explicit and simple; no opaque dtype strings.
  • Portability: the one formula (above) is enough to map indices → bytes in any language.
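
To illustrate the portability point, here is a sketch that decodes one element from an in-memory buffer using only a descriptor and Python's standard `struct` module. The function name `read_element` and the `_CODES` table are hypothetical; the table covers only a few kind/bits pairs, and `"complex"` and `"bool"` are left out.

```python
import struct

# (kind, bits) -> struct format character; deliberately incomplete.
_CODES = {
    ("float", 16): "e", ("float", 32): "f", ("float", 64): "d",
    ("int", 8): "b", ("int", 16): "h", ("int", 32): "i", ("int", 64): "q",
    ("uint", 8): "B", ("uint", 16): "H", ("uint", 32): "I", ("uint", 64): "Q",
}


def read_element(buf: bytes, desc: dict, index) -> tuple:
    """Decode one element (a tuple of `lanes` scalars) from `buf`."""
    dtype = desc["dtype"]
    code = _CODES[(dtype["kind"], dtype["bits"])]
    prefix = "<" if desc["storage"]["byte_order"] == "little" else ">"
    fmt = f"{prefix}{dtype['lanes']}{code}"
    # The addressing rule: offset + sum of i_d * strides[d], all in bytes.
    pos = desc["offset"] + sum(i * s for i, s in zip(index, desc["strides"]))
    return struct.unpack_from(fmt, buf, pos)


# Demo: a 2x3 float32 C-contiguous array holding 0..5, built in memory.
demo_buf = struct.pack("<6f", 0, 1, 2, 3, 4, 5)
demo_desc = {
    "type": "ndarray",
    "storage": {"uri": "mem://demo", "byte_order": "little"},
    "dtype": {"kind": "float", "bits": 32, "lanes": 1},
    "shape": [2, 3], "strides": [12, 4], "offset": 0,
}
```

Here `read_element(demo_buf, demo_desc, (1, 2))` returns `(5.0,)`. Changing `strides` to `[-12, 4]` and `offset` to `12` describes the same data with the rows reversed.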
