Remove reference cycle within SpatialData to fix memory leak in bounding_box_query #914

nickodell · 2025-03-27T03:58:26Z

Introduction

Hi, my name is Nick, and I'm part of the SciPy issue triage team, and I was looking into high memory usage within spatialdata as part of my investigation into scipy/scipy#22702.

I believe that this high memory usage is caused by a reference cycle within spatialdata, between the SpatialData class and the QueryManager class.

Here is a screenshot from refcycle showing the reference cycle. The cycle is that for any SpatialData object, you can access SpatialData._query._sdata, and get the original SpatialData object back. Although it is not a true memory leak, this reference cycle can prevent a SpatialData object from being cleaned up until a full garbage collection pass is run.

Practical example

Here is an example of code that creates a reference cycle, based on issue #850 which repeatedly performs a bounding box query on a SpatialData object.

import spatialdata as sd
import numpy as np
from datetime import datetime
import psutil
import warnings


def load_data():
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return sd.read_zarr("visium_brain.zarr")

def process_data():
    for _ in range(1000):
        tic = datetime.now()
        bb = np.array(list(sd.get_extent(sdata, elements=["ST8059048_hires_image"], coordinate_system="ST8059048").values()))
        sdata_bb = sd.bounding_box_query(sdata, ("x", "y"), bb[:, 0], bb[:, 1], "ST8059048")
        toc = datetime.now()
        print(f"time elapsed: {(toc - tic).total_seconds()} s; memory used: {psutil.Process().memory_info().rss / 1e9} GB")

sdata = load_data()
process_data()

The dataset is taken from the spatial query tutorial.

On current main, this will use (on my computer) up to 11 GB of memory, with drops now and again because of the garbage collector. With the patch, it stays at 1 GB of memory consistently.

Alternative fixes

There are multiple ways to break a reference cycle like this. I chose to break the link from SpatialData to QueryManager by lazily constructing the QueryManager. This seemed like the most straightforward way, since QueryManager looks like it is fairly cheap to create. You could also re-structure the code or use weak references.

Regression test

Normally, when fixing a bug, I'd add a regression test.

I'm not sure how to add a regression test for this. If I were writing a SciPy test, I would use one of the test utilities we have, scipy._lib._gcutils.assert_deallocated. This is a context manager which does the following:

Turns off the normal GC.
Creates an object and keeps a weak reference to it.
Allows user code to run, and manipulate the object, potentially causing a reference cycle.
Deletes its strong reference to the object.
Asserts that the weak reference no longer works, and that reference counting cleaned the object up.
Turns the GC back on.

Would you like me to add a regression test for this? If you want that, I could do that by copying SciPy's implementation. It is 105 lines and has no dependencies. It is licensed under BSD 3 Clause.

cc @grst @LucaMarconato

codecov · 2025-03-27T04:05:05Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.19%. Comparing base (22b06bc) to head (1c87fc6).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #914      +/-   ##
==========================================
- Coverage   92.19%   92.19%   -0.01%     
==========================================
  Files          49       49              
  Lines        7562     7561       -1     
==========================================
- Hits         6972     6971       -1     
  Misses        590      590

Files with missing lines	Coverage Δ
src/spatialdata/_core/spatialdata.py	`91.93% <100.00%> (-0.01%)`	⬇️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

LucaMarconato · 2026-01-04T17:46:18Z

Thanks @nickodell for your analysis and for the fix. Since self._query plays a minor role in the library and now it has been removed, I would not add a regression test.

Still, it would be useful to know more about the context manager you mentioned in case in the future we want to test against a (different) memory leak. Can you please add a pointer to the code? Thank you!

nickodell · 2026-01-04T18:37:20Z

Still, it would be useful to know more about the context manager you mentioned in case in the future we want to test against a (different) memory leak. Can you please add a pointer to the code? Thank you!

Sure.

Context manager is defined here:

https://github.com/scipy/scipy/blob/v1.16.3/scipy/_lib/_gcutils.py#L61

Here is an example of how to test code with it. This test is testing that a MatrixLinearOperator can be automatically cleaned up, even after someone calls .adjoint() on it.

https://github.com/scipy/scipy/blob/f6a1d49c42548018d819a37021e295b441e048b2/scipy/sparse/linalg/tests/test_interface.py#L833

Remove reference cycle within SpatialData

8b7d09b

Merge branch 'main' into njo-low-memory

1c87fc6

LucaMarconato merged commit 0731edd into scverse:main Jan 4, 2026
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Remove reference cycle within SpatialData to fix memory leak in bounding_box_query #914

Remove reference cycle within SpatialData to fix memory leak in bounding_box_query #914

Uh oh!

nickodell commented Mar 27, 2025 •

edited

Loading

Uh oh!

codecov bot commented Mar 27, 2025 •

edited

Loading

Uh oh!

LucaMarconato commented Jan 4, 2026

Uh oh!

Uh oh!

nickodell commented Jan 4, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Remove reference cycle within SpatialData to fix memory leak in bounding_box_query #914

Remove reference cycle within SpatialData to fix memory leak in bounding_box_query #914

Uh oh!

Conversation

nickodell commented Mar 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Introduction

Practical example

Alternative fixes

Regression test

Uh oh!

codecov bot commented Mar 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

LucaMarconato commented Jan 4, 2026

Uh oh!

Uh oh!

nickodell commented Jan 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nickodell commented Mar 27, 2025 •

edited

Loading

codecov bot commented Mar 27, 2025 •

edited

Loading

nickodell commented Jan 4, 2026 •

edited

Loading