Enforce ruff/flake8-bugbear rules (B) #1703

DimitriPapadopoulos · 2025-10-24T12:22:18Z

Apply Repo-Review suggestion RF101:
https://learn.scientific-python.org/development/guides/repo-review/?repo=skrub-data%2Fskrub&ref=HEAD

rcap107

Hi, thanks for the PR.

Overall, the missing asserts are useful (really wondering why they're missing in the first place). Other changes do not look like they improve the code in a meaningful way (B007, B004, B010); we can still implement the changes, however.

What I am really concerned with is actually enforcing the bugbear rules. Looking at the docs, a lot of them look fairly esoteric and definitely not something most people (especially new contributors) would be familiar with. Enforcing those rules would make contributing far more complicated than it already is.

As a result, those rules should not be added to the default linting. I'll add them to my own ruleset, but they should not be in the default pyproject.
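For context, the missing asserts are presumably the kind of thing B015 ("Pointless comparison") flags: a bare comparison in a test produces a value that is thrown away, so the test can never fail on it. A minimal sketch, using hypothetical helper functions rather than code from this PR:

```python
def check_without_assert(values):
    # Flagged by B015: the comparison result is discarded, so nothing is checked.
    values == [1, 2, 3]
    return "passed"


def check_with_assert(values):
    # The fix: assert on the comparison so a mismatch raises AssertionError.
    assert values == [1, 2, 3]
    return "passed"


# Wrong data goes unnoticed without the assert.
silent = check_without_assert([9, 9, 9])

try:
    check_with_assert([9, 9, 9])
    caught = False
except AssertionError:
    caught = True
```

With the wrong input, `check_without_assert` still returns normally, while `check_with_assert` raises.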

rcap107 · 2025-11-13T15:00:00Z

skrub/_apply_to_cols.py


 __all__ = ["ApplyToCols", "SingleColumnTransformer", "RejectColumn"]

+_SELECTORS = selectors.all()


selectors.all() isn't selecting all columns; it returns the all() selector, which in turn selects all columns.

Suggested change

-_SELECTORS = selectors.all()
+# By default, select all columns
+_SELECT_ALL_COLUMNS = selectors.all()

Same change in all the other files

rcap107 · 2025-11-13T15:00:24Z

skrub/_apply_to_cols.py


 __all__ = ["ApplyToCols", "SingleColumnTransformer", "RejectColumn"]

+_SELECTORS = selectors.all()


Same change in all the other files

rcap107 · 2025-11-13T15:01:01Z

skrub/_data_ops/_skrub_namespace.py

 from ._subsampling import SubsamplePreviews, env_with_subsampling
 from ._utils import KFOLD_5, NULL, attribute_error

+_SELECTORS = selectors.all()


Suggested change

-_SELECTORS = selectors.all()
+_SELECT_ALL_COLUMNS = selectors.all()

rcap107 · 2025-11-13T15:19:41Z

As a more general comment for this and all the other PRs that you opened: formatting rules should be chosen by the maintainers of the project, rather than added in a dozen separate PRs that need to be reviewed one by one.

In this PR, a large part of the diff is "whatever". The _SELECTORS change was made based on a misunderstanding of the code, and while reviewing it I ended up misunderstanding the code myself. Had it been a more critical change, it would have introduced a bug.

The suggested rules are way too restrictive for new contributors, so that part of the PR is also not going to be merged, and should definitely have been discussed in an issue rather than added here.

And this is only one of 20 very similar PRs that have been opened, and that I have to spend time on reviewing, making sure that none of the small changes that have been made across a bunch of files can lead to issues of some kind.

In the future, please consider the time and effort that goes into reviewing all of this. I will still try to review the PRs that have already been opened, and track the suggested changes in an issue for discussion and possible implementation.

The proper way of contributing to the repo would be opening a meta-issue that collects a series of sub-issues, so that it is possible to decide whether specific changes are warranted. This would take less time than reviewing PRs, and would not risk introducing bugs in the code.

If future PRs do not take this comment into consideration, I may close any new PR like this without even looking at what's inside.

DimitriPapadopoulos · 2025-11-15T12:28:31Z

I would recommend keeping B004. It does result in more correct code, although this should seldom be an issue in practice:

Using hasattr is an unreliable mechanism for testing if an object is callable. If obj implements a custom __getattr__, or if its __call__ is itself not callable, you may get misleading results.

B007 is indeed debatable.

B010 improves code consistency and readability, but I can understand you'd rather ignore it to avoid frustrating new contributors.
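
To make the quoted caveat concrete, here is a minimal sketch. The `Proxy` class is hypothetical and not taken from skrub; it only illustrates how a custom `__getattr__` can make the `hasattr` check disagree with `callable`.

```python
class Proxy:
    """Hypothetical object whose __getattr__ answers for any missing attribute."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, e.g. for "__call__".
        return f"<dynamic {name}>"


proxy = Proxy()

# The hasattr check succeeds because __getattr__ invents an attribute.
has_call_attr = hasattr(proxy, "__call__")

# callable() inspects the type, which defines no __call__.
is_callable = callable(proxy)

# Actually calling the object confirms what callable() reported.
try:
    proxy()
    call_succeeded = True
except TypeError:
    call_succeeded = False
```

Here `has_call_attr` is `True` while `is_callable` and `call_succeeded` are both `False`, which is the kind of misleading result B004 is meant to prevent.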

DimitriPapadopoulos · 2025-11-15T13:15:51Z

Other changes do not look like they improve the code in a meaningful way (B007, B004, B010); we can still implement the changes, however.

I have kept the changes in this PR, but these rules will be ignored in the future.

DimitriPapadopoulos · 2025-11-15T13:31:21Z

Still need to fix this new error in a test:

FAILED skrub/tests/test_multi_agg_joiner.py::test_default_cols[pandas-nullable-dtypes] - AssertionError: assert [['rating', '...', 'movieId']] == [['rating', '...ng', 'genre']]
  
  At index 1 diff: ['rating', 'genre', 'movieId'] != ['movieId', 'rating', 'genre']

Columns should be sorted in the same order as in the initial main_table, shouldn't they?

skrub/skrub/tests/test_multi_agg_joiner.py

Lines 9 to 19 in 20598ed

    
    @pytest.fixture
    def main_table():
        df = pd.DataFrame(
            {
                "userId": [1, 1, 1, 2, 2, 2],
                "movieId": [1, 3, 6, 318, 6, 1704],
                "rating": [4.0, 4.0, 4.0, 3.0, 2.0, 4.0],
                "genre": ["drama", "drama", "comedy", "sf", "comedy", "sf"],
            }
        )
        return df

DimitriPapadopoulos · 2025-11-25T12:49:54Z

Note that Ruff can automatically apply fixes for some rules, and pre-commit will enforce them without any additional work from contributors. Rules with automatic fixes are therefore not a barrier: they result in consistent and readable code, which can only help maintenance and new contributions.

Ruff provides automatic fixes for many B rules, including B004, B007, B010.

B004 Using `hasattr(x, "__call__")` to test if x is callable is unreliable. Use `callable(x)` for consistent results.

B005 Using `.strip()` with multi-character strings is misleading

B006 Do not use mutable data structures for argument defaults

B007 Loop control variable not used within loop body

B008 Do not perform function call `selectors.all` in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

B010 Do not call `setattr` with a constant attribute value. It is not any safer than normal property access.

B015 Pointless comparison.

DimitriPapadopoulos force-pushed the B branch 2 times, most recently from e0bcbf2 to c2e277c Compare November 12, 2025 08:23

rcap107 requested changes Nov 13, 2025

rcap107 reviewed Nov 13, 2025

rcap107 requested changes Nov 13, 2025

rcap107 marked this pull request as draft November 13, 2025 15:21

rcap107 added the no changelog needed label Nov 13, 2025

DimitriPapadopoulos force-pushed the B branch 3 times, most recently from ca75b1e to b22ed93 Compare November 15, 2025 13:00

rcap107 mentioned this pull request Nov 20, 2025

META - Discussing formatting changes for the repository #1765

Open

DimitriPapadopoulos force-pushed the B branch from 5bae622 to 4a2a41f Compare November 25, 2025 12:54

DimitriPapadopoulos added 9 commits November 25, 2025 21:57

Apply ruff/flake8-bugbear rule B004

cf3fd14

B004 Using `hasattr(x, "__call__")` to test if x is callable is unreliable. Use `callable(x)` for consistent results.

Apply ruff/flake8-bugbear rule B005

8b37121

B005 Using `.strip()` with multi-character strings is misleading

Apply ruff/flake8-bugbear rule B006

4a03260

B006 Do not use mutable data structures for argument defaults

Apply ruff/flake8-bugbear rule B007

9694ca0

B007 Loop control variable not used within loop body

Apply ruff/flake8-bugbear rule B008

441fcc6

B008 Do not perform function call `selectors.all` in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

Apply ruff/flake8-bugbear rule B010

6ecf7b0

B010 Do not call `setattr` with a constant attribute value. It is not any safer than normal property access.

Apply ruff/flake8-bugbear rule B015

b03ec03

B015 Pointless comparison.

Enforce ruff/flake8-bugbear rules (B)

a7c6a30

Ignore ruff/flake8-bugbear rules B004, B007, B010

d307610

DimitriPapadopoulos force-pushed the B branch from a2bb309 to d307610 Compare November 25, 2025 20:59
