DOC - adding a new example on image classification with data ops #1778
From skrub's meeting: HPO should take less time, so I need to look into the grid to speed things up a bit.
learner to execute all steps as intended.

See :ref:`sphx_glr_auto_examples_data_ops_0110_data_ops_intro.py` for an introductory
See :ref:`sphx_glr_auto_examples_data_ops_1110_data_ops_intro.py` for an introductory
We should always use explicitly named anchors like this; otherwise those links are likely to break again.
I was doing that, but at some point it started breaking the rendering: the auto-generated examples were adding their own label, so there were two labels and the gallery was not rendering properly.
the usual sphinx fun :)
Handwritten digit classification with skrub
============================================
This example demonstrates how to use skrub's Data Ops to build a
Maybe this sentence could say why this example is here and what it adds with respect to the other examples? E.g. it illustrates that Data Ops can be useful on data that is not dataframe-like, or something like that.
from sklearn.svm import SVC

model = SVC()
predictions = X.skb.apply(model, y=y)
Here too we could emphasize what distinguishes this example and what it adds with respect to the other hyperparameter search examples. Is it that we are using NumPy arrays rather than dataframes?
Maybe we could go one step further and use PyTorch instead of NumPy + scikit-learn, given that we have it installed for the example gallery anyway because of the TextEncoder. It is more often used for processing images, and "can I use this other framework" is a question that comes up.
# best model found during hyperparameter tuning:

best_learner = search.best_learner_
cv_results_search = skrub.cross_validate(
Technically we are overfitting the hyperparameters here, because the cross-validation reuses the data that was used to select them. Maybe we should split away a test set at the beginning that we don't use for the hyperparameter search?
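To make the suggestion concrete, here is a minimal sketch of the hold-out idea. It uses plain scikit-learn (`load_digits`, `train_test_split`, `GridSearchCV`) rather than the Data Ops code in this PR, and the grid values, split size, and random seed are assumptions chosen only for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Load the 8x8 digit images as flat NumPy arrays (1797 samples, 64 features).
X, y = load_digits(return_X_y=True)

# Set aside a test set *before* any hyperparameter search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Illustrative grid only: these values are assumptions, not the PR's grid.
param_grid = {"C": [1, 10], "gamma": ["scale", 0.001]}
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X_train, y_train)

# Score the refit best estimator on data the search never saw.
test_score = search.score(X_test, y_test)
print(f"held-out accuracy: {test_score:.3f}")
```

The score on `X_test` is then an estimate that the hyperparameter selection could not have influenced. The same split-first pattern would apply to the Data Ops version, with the search run only on the training portion.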
Is HPO necessary in this example, given that it is shown in a couple of other examples? What is the main point you want to get across with this one? Deciding that might help find what can be cut to speed it up.

Also, if you want to work more on the added example, there is the option to move the broken-link fixes to a separate PR so that this part can be merged sooner.

I think I'll do that so I can also deal with #1793.

Links will be fixed in #1798; the rest of the PR needs some work and will likely not be merged for a while.
This PR adds a new example for handwritten digit classification using skrub's Data Ops and updates several broken links in the documentation to point to the correct examples.