Add CEP for Repodata Wheel Support #145
Co-authored-by: Travis Hathaway <travis.j.hathaway@gmail.com>
### Pixi Integrates with uv (Jan 2024)
Pixi changes course to use uv directly instead of rip, which unlocks features like editable installations, and git and path dependencies.
these are all now available for conda-only workflows through pixi-build.
Hey @pavelzw, thanks so much for the feedback! Do you think we are missing a milestone in our brief history section? This pixi-build feature is more about building path/git dependencies as conda packages than installing wheels, isn't it?
This CEP introduces a new optional `artifact_url` field in package records to specify download locations for individual packages.
> Note for this draft: The `artifact_url` field could also be added as a separate CEP to allow it for other record types.
I think that would actually be a good idea to avoid asymmetries in package record specifications. Either that or we explicitly mention this new field is for all package record types.
I agree. If we have rough consensus that this is a good approach, we should probably split `artifact_url` into a separate CEP so that it can apply to all record types.
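For readers following this thread, here is a minimal Python sketch of how a client might honor an optional `artifact_url` on a record. It is an illustration under stated assumptions, not the CEP's specification: the helper name `resolve_download_url`, the example records, and the `example.invalid` URLs are all hypothetical. The fallback of `<channel>/<subdir>/<filename>` is the usual layout of a conda channel.

```python
def resolve_download_url(channel_url: str, subdir: str, filename: str, record: dict) -> str:
    """Return where to download the artifact described by `record`.

    Assumption: if the record carries an `artifact_url`, it wins;
    otherwise fall back to the conventional channel layout.
    """
    explicit = record.get("artifact_url")
    if explicit:
        return explicit
    return f"{channel_url.rstrip('/')}/{subdir}/{filename}"


# Hypothetical records, for illustration only.
conda_record = {"name": "demo-pkg", "version": "1.0", "build": "pyh0_0"}
wheel_record = {
    "name": "demo-pkg",
    "version": "1.0",
    "artifact_url": "https://wheels.example.invalid/demo_pkg-1.0-py3-none-any.whl",
}

conda_url = resolve_download_url(
    "https://channel.example.invalid/my-channel/", "noarch",
    "demo-pkg-1.0-pyh0_0.conda", conda_record,
)
wheel_url = resolve_download_url(
    "https://channel.example.invalid/my-channel/", "noarch",
    "demo_pkg-1.0-py3-none-any.whl", wheel_record,
)
```

Because the lookup does not care what kind of record it receives, the same logic would apply unchanged if the field were split into its own CEP and allowed on every record type.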
Co-authored-by: Travis Hathaway <travis.j.hathaway@gmail.com>
|
Thanks so much for all the updates, @travishathaway! I applied them locally and then pushed a commit 👍
|
I'm going to leave my opinions on the general goals/ideas/features of this CEP in an effort to help bring other perspectives to this CEP. TL;DR - As a conda-forge/core developer, I personally would NOT recommend folks use this feature, nor would I enable or offer support for this feature for conda-forge. The CEP states
Here is a point-by-point explanation of why in my estimation this feature simply would not work for conda-forge.
In a non-trivial fraction of cases, even for pure-Python packages, the requirements in conda-forge have subtle differences from the upstream requirements. These changes range from package renaming (e.g., These requirement differences will likely lead to some funky solves that either the conda or conda-forge developers will hear about.
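To make the renaming issue concrete, here is a minimal Python sketch of translating a simple upstream requirement into a conda-style spec. It is not part of the CEP. The mapping table and the helper names are hypothetical; the three entries are well-known cases where the PyPI name and the conda-forge name differ, and a real mapping would have to come from a maintained source. The parser deliberately handles only `name` plus an optional version specifier, not extras or environment markers.

```python
import re

# Illustrative PyPI -> conda-forge renames; a real table would be much larger.
PYPI_TO_CONDA = {
    "torch": "pytorch",
    "tables": "pytables",
    "msgpack": "msgpack-python",
}


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def translate_requirement(requirement: str) -> str:
    """Turn e.g. 'torch>=2.0' into 'pytorch >=2.0' (simple cases only)."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    name, spec = normalize(match.group(1)), match.group(2).strip()
    conda_name = PYPI_TO_CONDA.get(name, name)
    return f"{conda_name} {spec}".strip()


translated = [translate_requirement(r) for r in ["torch>=2.0", "requests", "Tables ==3.9.2"]]
```

Any upstream name missing from the table passes through unchanged, which is exactly the failure mode described above: the solve can still succeed formally while pulling in the wrong package or a duplicate of one already installed under its conda name.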
Environment consistency is a tricky concept IMHO. For sure with this CEP one can in some cases create an environment where all constraints are satisfied. However, if the repodata is wrong, due to the issues outlined above, the formal consistency of the requirements doesn't really matter.
Reproducibility is an even trickier concept. For an environment to be reproducible, one needs the same solver run under the same conditions. Let's assume the conda version is fixed and the solver command is run on the same machine. Even then, the constraint of having the same conditions, combined with this CEP, in effect means that both the upstream wheel metadata and the conda channel metadata have to be the same. Given that the most likely source of wheels is PyPI, there is no way one can promise those same conditions.
Even if we restrict ourselves to environments built from lock files, the combination of the conda channel with PyPI as a source of wheels will also not always be reproducible. PyPI users can delete packages (as opposed to simply yanking packages) and those deletions will break even locked envs. We do not allow package deletions on conda-forge for this exact reason. Thus for conda-forge, we could not recommend using this feature for reproducible envs from lock files unless PyPI turns off the ability for users to delete files.
The vast majority of the cognitive burden is the differing and interacting repodata, not whether or not one types
For the reasons stated above, I am personally skeptical that injecting pure-Python wheel metadata into the repodata would consistently result in correct-enough environments to eliminate the need for new conda builds or the need to repackage pure-Python packages in conda-forge. I am not saying this doesn't work some of the time. Instead I am saying that the solution proposed in this CEP is not so much better than the current "conda, then pip" solve that it can achieve the arguably difficult goals above. Stated another way, in a world where this feature existed instead of the feature to pip-install over top of conda environments, conda-forge likely would still need/want to repackage everything.
Other comments
This statement is a red flag for me on this CEP. First, conda-forge itself doesn't have an authoritative mapping of its own packages back to Python wheels. There are several approaches in the wild, and none of them is standardized into an automated bit of repodata that tools like conda/mamba/pixi/rattler can read and interact with. See the discussions of PURLs. Second, treating conda-forge as a special channel specifically in a CEP (as opposed to simply using it as a motivating use case) is definitely IMHO the antithesis of what a CEP is supposed to be. conda is a set of tools and standards and should not be singling out any one purveyor of conda packages.
I don't follow this comment. Can you clarify? For sure I have used this operator in the run section before (see, e.g., https://github.com/conda-forge/ngmix-feedstock/blob/main/recipe/meta.yaml#L30). At minimum any requirement that is
|
Here is one other point that I think is worth considering. One way I can imagine conda-forge using this feature is through only injecting items into This procedure has some advantages for conda-forge that might actually be worth considering more generally. These are
One issue I have left unaddressed here is testing new repodata entries before they are added. We'd want to build at least one test environment and ensure the package, at minimum, imports before we pushed it out to the world. One thing I am noticing is that as we add on these additional requirements and desires, it seems almost simpler for conda-forge to use its existing feedstock infrastructure. We'd likely have to build a new "staged-wheels" system to manage this kind of process.
|
Hi @beckermr, thanks so much for your time reviewing and responding to this draft, I really appreciate it! I would like to use your feedback to strengthen the draft.
Version exclusions
Thanks so much for pointing this out. My updated understanding is that I'll update the CEP to make sure that is clear.
Mapping names centrally to conda-forge
Great points. We were probably trying too hard to make the community approach the standard, but as you point out, it isn't currently standardized. I think having the wheel index own the mapping is still the right approach. What if we have the index declare which channel it is mapping names to? For example, we could add an optional field called What do you think about this idea?
PyPI can delete packages
Another great point that we could address in the CEP! There has been discussion from the PyPI community over the last year about standardizing around the deletion policy. For example there was a withdrawn PEP 763 and the Discuss Python.org topic about it. The consensus came down to:
This would be a tradeoff of directly using PyPI packages. Users would get access to thousands of packages with no extra hosting requirements, but they would also be subject to how PyPI currently works. Someone using a PyPI wheel repodata would have to decide if that is a good tradeoff for them. However, the lock file formats (rattler-lock-v6 and conda-lock-v1) already support a hybrid ecosystem with PyPI sections in the lockfiles. If someone wants to use a wheels channel directly from PyPI, it is no better or worse for reproducibility than what we have right now. In fact, channels that mirror/store wheels (as you suggest below) would actually improve reproducibility compared to the current "conda then pip" workflow.
As you point out, there are workflows where we could make the system more reproducible than we have now. Do you think that the CEP should recommend that production channels mirror wheels to ensure reproducibility, rather than relying directly on PyPI URLs?
Downstream patching ability for conda-forge (and other ecosystems)
I would love to help find the right solution to this! Thanks again for the really valuable perspective. I think there are two complementary approaches we should take:
You're absolutely right that conflicting metadata causes a burden. Repodata patching is how we could address this: channels can correct metadata conflicts so users don't encounter them. However, I think the current client workflows are also a burden. Giving users seamless access to thousands of packages without requiring them to know whether they're from PyPI or conda channels would solve a huge pain point.
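As a rough illustration of the patching idea, here is a minimal Python sketch that applies per-file overrides and removals to a repodata-like dictionary. The structure is an assumption, loosely modeled on how conda channels ship patch instructions (a `packages` mapping of filename to overridden fields plus a `remove` list); the function name `apply_patches`, the example filename, and the exact keys used for wheel records are hypothetical and not taken from the CEP.

```python
import copy


def apply_patches(repodata: dict, patches: dict) -> dict:
    """Return a patched copy of `repodata`; the input is left untouched."""
    patched = copy.deepcopy(repodata)
    records = patched.setdefault("packages", {})

    # Override individual fields (e.g. `depends`) on matching records.
    for filename, overrides in patches.get("packages", {}).items():
        if filename in records:
            records[filename].update(overrides)

    # Drop records the channel no longer wants to offer.
    for filename in patches.get("remove", []):
        records.pop(filename, None)

    return patched


# Hypothetical wheel-derived record whose upstream requirement uses the PyPI name.
repodata = {
    "packages": {
        "demo_pkg-1.0-py3-none-any.whl": {
            "name": "demo-pkg",
            "version": "1.0",
            "depends": ["python >=3.9", "torch >=2.0"],
        }
    }
}
patches = {
    "packages": {
        "demo_pkg-1.0-py3-none-any.whl": {"depends": ["python >=3.9", "pytorch >=2.0"]}
    },
    "remove": [],
}
patched = apply_patches(repodata, patches)
```

Working on a copy keeps the upstream-derived metadata available for comparison, which helps when auditing what a channel has changed relative to the wheel's own metadata.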
You're right to be skeptical about fully eliminating the need for feedstocks. This CEP won't replace conda-forge's packaging infrastructure - metadata differences mean many packages will still need proper conda recipes. This is about handling the simpler cases more efficiently. Think of it as an additional tool for easier pure-Python packages, not a replacement for feedstocks. I'll update the CEP to better capture this view.
Implementation plan for a wheel channel
I am really liking your thoughts on how we could implement this, nice! I am not sure if the implementation plan should be part of this CEP or not, so I would be grateful for everyone's thoughts on that. However, I really like where you are going with this plan and I also envision some sort of phased approach. We could start with fully manual curation, but then move toward semi-automated curation as we learn from the manual process. This balances a lower barrier (no recipes needed for many packages) with quality control. Complex packages should still use feedstocks, but this handles the simpler pure-Python case more efficiently. It would be amazing to use a conda-forge wheel channel as a test case for this if the community is interested.
Thanks again for all of the extremely valuable input, I'm looking forward to hearing more of your thoughts as we continue to refine this draft.
|
Not commenting on the whole CEP (I share much of @beckermr's reservations, but currently don't have a strong opinion), just one aspect that's important to get right IMO:
I think it would be a good idea to consider whether this can build on top of the proposed PEP 804 (discourse), which tries to standardize something usable around the whole name mapping issue. It'd certainly be better if we can build on top of that (and help support that PEP) rather than inventing yet another scheme.
Updates include:
- Clarifications on naming standards
- Channel mapping
- Patching capabilities for dependency management
# Conflicts:
# cep-XXXX.md
|
Hi @beckermr, I updated the relevant sections to fix version exclusions, remove reliance on conda-forge for channel mapping, and clarify that we need downstream patching ability. I also added an implementation options section and recommendations for protecting against PyPI packages being deleted. Thanks again for all of the great feedback.
Hi @h-vetinari
Thanks, that's a great point, and we should build on this! I added a callout to PEP 804 in the Naming standard and channel mapping section.
Updated version to replace #144, developed with @travishathaway.
This CEP outlines how native support for pure Python wheel packages could be achieved by adding support for them in repodata. When implemented, conda clients will be able to seamlessly install conda packages and pure Python wheels from enabled channels.
Checklist for submitter
- `cep-0000.md` named `cep-XXXX.md` in the root level.
Checklist for CEP approvals
- `${greatest-number-in-main} + 1`.
- `cep-XXXX.md` file has been renamed accordingly.
- `# CEP XXXX -` header has been edited accordingly.
- `pre-commit` checks are passing.