Wm/1270/cropping #1447

Draft

wael-mika wants to merge 3 commits into ecmwf:shmh40/dev/1270-idx-global-local from wael-mika:wm/1270/cropping

Contributor

wael-mika commented Dec 11, 2025

Description

Implemented HEALPix cropping as discussed, with three different methods for spatial cover. The code also supports different overlaps between the crops provided to the teacher and the student, and fixes the bug that occurred when passing num_sample > 1. The config includes three examples of how to set up cropping. It was tested on the JUWELS Booster, where it ran for a few epochs.

Checklist before asking for review

I have performed a self-review of my code
My changes comply with basic sanity checks:
- I have fixed formatting issues with ./scripts/actions.sh lint
- I have run unit tests with ./scripts/actions.sh unit-test
- I have documented my code and I have updated the docstrings.
- I have added unit tests, if relevant
I have tried my changes with data and code:
- I have run the integration tests with ./scripts/actions.sh integration-test
- (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
- (bigger changes and experiments) I have shared a hedgedoc in the GitHub issue with all the configurations and runs for these experiments
I have informed and aligned with people impacted by my change:
- for config changes: the MatterMost channels and/or a design doc
- for changes of dependencies: the MatterMost software development channel

wael-mika added 2 commits

December 10, 2025 00:00


          Healpix cropping simple implementation

68f80ca


          Healpix cropping simple implementation with control over the num_samples and overlap + fixing the num_sample bug

0684f86

github-project-automation bot added this to WeatherGen-dev


          Fixed lint

c95f100

clessig reviewed

Collaborator

clessig left a comment

Thanks. I left some comments. Did you generate plots of the different masks, as discussed? The final correctness is really hard to judge from reading the code or even from running it.
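Alongside plots, a small numeric check of the achieved overlap between two crops might help. The following is only a sketch: the helper name `overlap_fraction`, the definition of overlap (share of the second crop's cells that also appear in the first), and the toy cell indices are all assumptions and are not taken from the PR.

```python
def overlap_fraction(crop_a, crop_b):
    """Fraction of cells in crop_b that also appear in crop_a.

    Hypothetical definition for illustration; the PR may define
    overlap_ratio differently.
    """
    set_a, set_b = set(crop_a), set(crop_b)
    if not set_b:
        # An empty crop has no cells to overlap with anything.
        return 0.0
    return len(set_a & set_b) / len(set_b)


# Toy cell indices, not real HEALPix output.
teacher_crop = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
student_crop = [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

achieved = overlap_fraction(teacher_crop, student_crop)
requested = 0.3
print(f"requested={requested}, achieved={achieved:.2f}")
```

Comparing the achieved value against the requested one for each of the cropping methods would give a quick signal before inspecting the plots.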

src/weathergen/datasets/masking.py

    
                              mask = np.zeros(num_cells, dtype=bool)

                              mask[child_indices] = True

                      elif strategy == "cropping_healpix":

Collaborator

clessig Dec 11, 2025

Can we put this into a separate function?

src/weathergen/datasets/masking.py

    
                      # iterate over all target samples

                      target_masks: list[np.typing.NDArray] = []

                      target_metadata: list[SampleMetaData] = []

                      target_config_mapping = []  # Track which config each target mask came from

Collaborator

clessig Dec 11, 2025

Why do we need target_config_mapping? The target_metadata should contain all relevant information, shouldn't it?

src/weathergen/datasets/masking.py

    
                          overlap_set = set(overlap_with)

                          # Use intelligent center selection based on overlap target

                          if overlap_ratio > 0.7:

Collaborator

clessig Dec 11, 2025

Where do the 0.7 and the 0.3 below come from?
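If the thresholds stay, naming them would at least make their role explicit. A minimal sketch, assuming the 0.3 is used in a second `>` comparison like the 0.7; the constant names, the function `overlap_regime`, and the regime labels are hypothetical, and only the two values come from the diff.

```python
# Hypothetical names; only the values 0.7 and 0.3 come from the diff.
HIGH_OVERLAP_THRESHOLD = 0.7
LOW_OVERLAP_THRESHOLD = 0.3


def overlap_regime(overlap_ratio):
    """Classify the requested overlap into a regime (labels are illustrative)."""
    if not 0.0 <= overlap_ratio <= 1.0:
        raise ValueError(f"overlap_ratio must be in [0, 1], got {overlap_ratio}")
    if overlap_ratio > HIGH_OVERLAP_THRESHOLD:
        return "high"
    if overlap_ratio > LOW_OVERLAP_THRESHOLD:
        return "medium"
    return "low"


for ratio in (0.1, 0.5, 0.9):
    print(ratio, overlap_regime(ratio))
```

Whether these values should be module-level constants or config entries is a design choice this sketch does not settle.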

src/weathergen/datasets/masking.py

    
                          crop2 = _select_spatially_contiguous_cells(0, 9, method="geodesic_disk",

                                                                     overlap_with=crop1, overlap_ratio=0.3)

                      """

                      import warnings

Collaborator

clessig Dec 11, 2025

Please put all imports at the beginning of the file.

src/weathergen/datasets/masking.py

    
                      """

                      self.rng = rng

                  def _select_spatially_contiguous_cells(

Collaborator

clessig Dec 11, 2025

We should break this function up into smaller functions that cover the different branches. The overall flow is difficult to read.
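One possible shape for such a split is a small dispatcher that maps the `method` string to one function per branch. This is a sketch under assumptions: only the method name `geodesic_disk` appears in the diff; the second method name, the function names, the argument names, and the toy bodies are hypothetical stand-ins for the real HEALPix logic.

```python
from typing import Callable


def _select_geodesic_disk(center: int, num_select: int) -> list[int]:
    # Stand-in body: the real branch would use HEALPix geometry.
    return list(range(center, center + num_select))


def _select_placeholder(center: int, num_select: int) -> list[int]:
    # Hypothetical second branch with a different toy selection rule.
    return list(range(center, center + 2 * num_select, 2))


# One entry per branch; "placeholder_method" is an invented name.
_SELECTORS: dict[str, Callable[[int, int], list[int]]] = {
    "geodesic_disk": _select_geodesic_disk,
    "placeholder_method": _select_placeholder,
}


def select_cells(center: int, num_select: int, method: str = "geodesic_disk") -> list[int]:
    """Dispatch to the selector registered for `method`."""
    try:
        selector = _SELECTORS[method]
    except KeyError:
        raise ValueError(
            f"unknown method {method!r}; expected one of {sorted(_SELECTORS)}"
        ) from None
    return selector(center, num_select)


print(select_cells(0, 3))
print(select_cells(0, 3, method="placeholder_method"))
```

With this layout the top-level function only validates and dispatches, and each branch can be read and tested on its own.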

Labels

None yet