Conversation

@rcap107
Member

@rcap107 rcap107 commented Nov 28, 2025

This PR adds a new example for handwritten digit classification using skrub's Data Ops and updates several broken links in the documentation to point to the correct examples.

rcap107 modified the milestones: Release 0.6.3, 0.7.0 (Nov 28, 2025)
@rcap107
Member Author

rcap107 commented Dec 1, 2025

From skrub's meeting: hpo should take less time so I need to look into the grid to speed things up a bit

learner to execute all steps as intended.

- See :ref:`sphx_glr_auto_examples_data_ops_0110_data_ops_intro.py` for an introductory
+ See :ref:`sphx_glr_auto_examples_data_ops_1110_data_ops_intro.py` for an introductory
Member

We should start always using explicitly named anchors like this; otherwise those links are likely to break again.

Member Author

I was doing that, but at some point it started breaking the rendering: the autoexamples were adding their own label, so there were two labels and the gallery was not rendering properly.

Member

the usual sphinx fun :)

Handwritten digit classification with skrub
============================================
This example demonstrates how to use skrub's Data Ops to build a
Member

Maybe this sentence could say why this example is here and what it adds with respect to the other examples? For example, it could illustrate that Data Ops can be useful on data that is not a dataframe, or something like that.

from sklearn.svm import SVC

model = SVC()
predictions = X.skb.apply(model, y=y)
Member

Here too, we could stress what distinguishes this example and what it adds with respect to the other hyperparameter search examples. Is it that we are using numpy arrays rather than dataframes?

Maybe we could go one step further and use PyTorch instead of numpy + sklearn, given that we have it installed for the example gallery anyway due to the TextEncoder. It is more often used for processing images, and "can I use this other framework?" is a question that comes up.

# best model found during hyperparameter tuning:

best_learner = search.best_learner_
cv_results_search = skrub.cross_validate(
Member

Technically, we are overfitting the hyperparameters. Maybe we should split off a test set at the beginning that we don't use for the hyperparameter search?
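To make the suggestion concrete, here is a minimal sketch of holding out a test set before tuning. It uses plain scikit-learn rather than the Data Ops code from the PR, and the grid values, split size, and number of folds are assumptions chosen only for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Hold out a test set before any tuning; the search never sees it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hypothetical, deliberately small grid (not the one used in the PR).
param_grid = {"C": [1, 10], "gamma": ["scale", 0.001]}
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X_train, y_train)

# Score of the refit best model on data that played no role in the search.
test_score = search.score(X_test, y_test)
print(search.best_params_, round(test_score, 3))
```

The cross-validated score reported by the search is optimistic because the same folds were used to pick the hyperparameters; the held-out score does not have that bias.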

@jeromedockes
Member

> From skrub's meeting: hpo should take less time so I need to look into the grid to speed things up a bit

is HPO necessary in this example (given that it is shown in a couple of other examples)? What is the main point you want to get across with this one? Deciding that might help find what can be cut to speed it up
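As a rough guide to what can be cut, the cost of an exhaustive grid search grows with the product of the grid sizes times the number of folds. The sketch below only counts fits; the grids and fold counts are hypothetical, since the grid actually used in the PR is not shown in this thread.

```python
from math import prod


def count_fits(grid, n_folds):
    """Cross-validation fits done by an exhaustive grid search (final refit excluded)."""
    return prod(len(values) for values in grid.values()) * n_folds


# Hypothetical grids, for illustration only.
full_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1e-4, 1e-3, 1e-2],
    "kernel": ["rbf", "poly"],
}
reduced_grid = {"C": [1, 10], "gamma": [1e-3, 1e-2]}

full = count_fits(full_grid, n_folds=5)
reduced = count_fits(reduced_grid, n_folds=3)
print(full, reduced)  # 120 12
```

Dropping one dimension, trimming the value lists, and using fewer folds each reduce the cost multiplicatively, so small cuts add up quickly.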

@jeromedockes
Member

Also, if you want to work more on the added example, there is the option to move the fix of broken links to a separate PR so that this part can be merged sooner

@rcap107
Member Author

rcap107 commented Dec 7, 2025

> Also, if you want to work more on the added example, there is the option to move the fix of broken links to a separate PR so that this part can be merged sooner

I think I'll do that so I can also deal with #1793

@rcap107
Member Author

rcap107 commented Dec 8, 2025

Links will be fixed in #1798. The rest of the PR needs some work and will likely not be merged for a while.

rcap107 changed the title from "DOC - adding a new example and fixing a few broken links" to "DOC - adding a new example on image classification with data ops" (Dec 8, 2025)
rcap107 removed this from the 0.7.0 milestone (Dec 9, 2025)