Add CEP for Repodata Wheel Support #145

danyeaw · 2025-12-23T20:13:16Z

Updated version to replace #144, developed with @travishathaway.

This CEP outlines how native support for pure Python wheel packages could be achieved by adding support for them in repodata. When implemented, conda clients will be able to seamlessly install conda packages and pure Python wheels from enabled channels.

Checklist for submitter

I am submitting a new CEP: Repodata Wheel Support.
- I am using the CEP template by creating a copy cep-0000.md named cep-XXXX.md in the root level.
I am submitting modifications to CEP XX.
Something else: (add your description here).

Checklist for CEP approvals

The vote period has ended and the vote has passed the necessary quorum and approval thresholds.
A new CEP number has been minted. Usually, this is ${greatest-number-in-main} + 1.
The cep-XXXX.md file has been renamed accordingly.
The # CEP XXXX - header has been edited accordingly.
The CEP status in the table has been changed to approved.
The last modification date in the table has been updated accordingly.
The table in the README has been updated with the new CEP entry.
The pre-commit checks are passing.

Co-authored-by: Travis Hathaway <travis.j.hathaway@gmail.com>


pavelzw · 2025-12-24T01:34:34Z

cep-XXXX.md

+
+### Pixi Integrates with uv (Jan 2024)
+
+Pixi changes course to use uv directly instead of rip, which unlocks features like editable installations, and git and path dependencies.


These are all now available for conda-only workflows through pixi-build.

Hey @pavelzw, thanks so much for the feedback! Do you think we are missing a milestone in our brief history section? This pixi-build feature is more about building path/git dependencies as conda packages than about installing wheels, isn't it?


travishathaway · 2025-12-24T09:00:14Z

cep-XXXX.md

+
+This CEP introduces a new optional `artifact_url` field in package records to specify download locations for individual packages.
+
+> Note for this draft: The `artifact_url` field could also be added as a separate CEP to allow it for other record types.


I think that would actually be a good idea to avoid asymmetries in package record specifications. Either that or we explicitly mention this new field is for all package record types.

I agree. If we have rough consensus that this is a good approach, we should probably split artifact_url into a separate CEP so that it can apply to all record types.


danyeaw · 2025-12-24T13:30:19Z

Thanks so much for all the updates, @travishathaway! I applied them locally and then pushed a commit 👍

beckermr · 2025-12-25T18:10:00Z

I'm going to leave my opinions on the general goals/ideas/features of this CEP in an effort to help bring other perspectives to this CEP.

TL;DR - As a conda-forge/core developer, I personally would NOT recommend folks use this feature, nor would I enable or offer support for this feature for conda-forge.

The CEP states

By adding native support for pure Python wheels to repodata, conda clients can:

Resolve dependencies across conda and PyPI packages in a single solve

Provide users with transparent access to the broader Python ecosystem

Maintain environment consistency and reproducibility

Eliminate the cognitive burden of managing two package managers

Fill gaps in conda package availability without requiring new conda builds

Reduce the maintenance burden by fully or partially eliminating the need to create and maintain conda recipes for pure-Python packages

Here is a point-by-point explanation of why in my estimation this feature simply would not work for conda-forge.

Resolve dependencies across conda and PyPI packages in a single solve

In a non-trivial fraction of cases, even for pure-Python packages, the requirements in conda-forge have subtle differences from the upstream requirements. These changes range from package renaming (e.g., - and _ are identical for Python packages, but not conda packages, matplotlib vs matplotlib-base, etc.) to more substantial changes that enable broader compatibility (e.g., upstream uses exact pins for dependencies, but conda-forge relaxes them because they are clearly not needed). I can imagine that when it comes to pure-Python packages that depend on a compiled backend package, things will also be funky in some cases.

These requirement differences will likely lead to some funky solves that either the conda or conda-forge developers will hear about.
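To make the renaming point concrete, here is a minimal sketch of why a Python name cannot be copied straight into a conda record. The normalization rule is the one from PEP 503; the `ILLUSTRATIVE_OVERRIDES` table and the `to_conda_name` helper are hypothetical and only mirror the matplotlib example above, not any authoritative mapping.

```python
import re


def normalize_pypi_name(name: str) -> str:
    """Normalize a Python distribution name as described in PEP 503.

    Runs of "-", "_" and "." are equivalent and collapse to a single "-".
    """
    return re.sub(r"[-_.]+", "-", name).lower()


# Hypothetical, hand-written overrides for illustration only.
# A real channel would need a maintained mapping instead.
ILLUSTRATIVE_OVERRIDES = {"matplotlib": "matplotlib-base"}


def to_conda_name(pypi_name: str) -> str:
    """Map a PyPI name to a conda name, falling back to the normalized name."""
    normalized = normalize_pypi_name(pypi_name)
    return ILLUSTRATIVE_OVERRIDES.get(normalized, normalized)


if __name__ == "__main__":
    for candidate in ("Typing_Extensions", "typing-extensions", "matplotlib"):
        print(candidate, "->", to_conda_name(candidate))
```

Python tooling treats `Typing_Extensions` and `typing-extensions` as the same project, while a conda solver would see two different package names unless something like the step above is applied consistently.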

Maintain environment consistency and reproducibility

Environment consistency is a tricky concept IMHO. For sure with this CEP one can in some cases create an environment where all constraints are satisfied. However, if the repodata is wrong, due to the issues outlined above, the formal consistency of the requirements doesn't really matter.

Reproducibility is an even trickier concept. For an environment to be reproducible, one needs to have the same solver with that solver run under the same conditions. Let's assume the conda version is fixed and the solver command is run on the same machine. Even then the constraint on having the same conditions combined with this CEP in effect means that both the upstream wheel metadata and the conda channel metadata have to be the same. Given that the most likely source of wheels is pypi, there is no way one can promise those same conditions.

Even if we restrict ourselves to environments built from lock files, the combination of the conda channel with pypi as a source of wheels will also not always be reproducible. PyPI users can delete packages (as opposed to simply yanking packages) and those deletions will break even locked envs. We do not allow package deletions on conda-forge for this exact reason. Thus for conda-forge, we could not recommend using this feature for reproducible envs from lock files unless PyPI turns off the ability for users to delete files.

Eliminate the cognitive burden of managing two package managers

The vast majority of the cognitive burden is the differing and interacting repodata, not whether or not one types pip install or conda install. This CEP doesn't and cannot address that issue. Even consistently interpreting the repodata in a single solver doesn't address this issue.

Fill gaps in conda package availability without requiring new conda builds
Reduce the maintenance burden by fully or partially eliminating the need to create and maintain conda recipes for pure-Python packages

For the reasons stated above, I am personally skeptical that injecting pure-Python wheel metadata into the repodata would consistently result in correct-enough environments to eliminate the need for new conda builds or the need to repackage pure-Python packages in conda-forge. I am not saying this doesn't work some of the time. Instead I am saying that the solution proposed in this CEP is not so much better than the current "conda, then pip" solve that it can achieve the arguably difficult goals above. Stated another way, in a world where this feature existed instead of the feature to pip-install over top of conda environments, conda-forge likely would still need/want to repackage everything.

Other comments

When there are naming differences between channels, wheel records MUST use the conda-forge package name as the standard.

This statement is a red flag for me on this CEP. First, conda-forge itself doesn't have an authoritative mapping of its own packages back to Python wheels. There are several approaches in the wild, and none of them is standardized into an automated bit of repodata that tools like conda/mamba/pixi/rattler can read and interact with. See the discussions of PURLs. Second, treating conda-forge as a special channel specifically in a CEP (as opposed to simply using it as a motivating use case) is definitely IMHO the antithesis of what a CEP is supposed to be. conda is a set of tools and standards and should not be singling out any one purveyor of conda packages.

!=X.Y.Z → Omit dependency (conda does not support version exclusions; see Limitations section)

I don't follow this comment. Can you clarify? For sure I have used this operator in the run section before (see, e.g., https://github.com/conda-forge/ngmix-feedstock/blob/main/recipe/meta.yaml#L30). At minimum any requirement that is foo!=xyz should be added into the constrains section of the repodata. If it is in the wheel's run section, one should add the package without any constraints in the run section of the repodata as well. The combination of these two additions should achieve the same effect as "install foo, but not version xyz."

beckermr · 2025-12-25T18:40:56Z

Here is one other point that I think is worth considering.

One way I can imagine conda-forge using this feature is through only injecting items into packages.whl via repodata patching. In other words, instead of opening up everything on e.g., pypi, we instead have a structured process where only specific pure-Python wheels from pypi are added via repodata patching.

This procedure has some advantages for conda-forge that might actually be worth considering more generally. These are

It would let us directly control/patch the final repodata entries so that we can ensure the injections are not breaking or destructive. This procedure solves the main issue IMHO with the CEP here, namely that the repodata itself between pypi and conda-forge is simply not compatible.
conda-forge could itself store a copy of the wheel artifacts so that things are reproducible for lock files (i.e., no deletions of wheels from some external source). Given our current storage options, conda-forge would likely use a tool like conda-press to put the artifacts directly into anaconda.org and/or a mirror as conda artifacts.

One issue I have left unaddressed here is testing new repodata entries before they are added. We'd want to build at least one test environment and ensure the package, at minimum, imports before we pushed it out to the world.

One thing I am noticing is that as we add on these additional requirements and desires, it seems almost simpler for conda-forge to use its existing feedstock infrastructure. We'd likely have to build a new "staged-wheels" system to manage this kind of process.

danyeaw · 2025-12-26T15:42:39Z

Hi @beckermr, thanks so much for your time reviewing and responding to this draft, I really appreciate it! I would like to use your feedback to strengthen the draft.

Version exclusions

At minimum any requirement that is foo!=xyz should be added into the constrains section of the repodata.

Thanks so much for pointing this out. My updated understanding is that Requires-Dist: numpy>=1.20.0,!=1.24.0 from a wheel's metadata would be translated to:

{
  "name": "pandas",
  "version": "2.0.0",
  "depends": [
    "numpy >=1.20.0"
  ],
  "constrains": [
    "numpy !=1.24.0"
  ]
}

I'll update the CEP to make sure that is clear.
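As a rough sketch of that translation, the function below splits one simple Requires-Dist value into `depends` and `constrains` entries. The function name `split_requirement` is hypothetical, and the parsing assumes a plain `name op,op,...` form with no extras or environment markers. Operators that conda may not understand (such as `~=`) are passed through untouched, so this is an illustration and not the CEP's algorithm.

```python
import re


def split_requirement(requires_dist: str) -> tuple[list[str], list[str]]:
    """Split one simple Requires-Dist value into (depends, constrains).

    Assumption: the input looks like "name>=1.0,!=1.2" with no extras
    and no environment markers.
    """
    match = re.match(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$", requires_dist)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requires_dist!r}")
    name, spec = match.group(1), match.group(2)

    # Each comma-separated clause is one version condition.
    clauses = [c.replace(" ", "") for c in spec.split(",") if c.strip()]
    exclusions = [c for c in clauses if c.startswith("!=")]
    others = [c for c in clauses if not c.startswith("!=")]

    # The package is always a dependency, even if only exclusions were given.
    depends = [f"{name} {','.join(others)}".strip()]
    # Every exclusion becomes a separate constrains entry.
    constrains = [f"{name} {c}" for c in exclusions]
    return depends, constrains


if __name__ == "__main__":
    print(split_requirement("numpy>=1.20.0,!=1.24.0"))
```

For a requirement that consists only of an exclusion, such as `foo!=1.2.3`, the sketch keeps a bare `foo` in `depends` and puts `foo !=1.2.3` in `constrains`, which matches the combination suggested earlier in the thread.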

Mapping names centrally to conda-forge

First, conda-forge itself doesn't have an authoritative mapping of its own packages back to python wheels....

Second, treating conda-forge as a special channel specifically in a CEP (as opposed to simply using it as a motivating use case) is definitely IMHO the antithesis of what a CEP is supposed to be.

Great points. We were probably trying too hard to make the community approach the standard, but as you point out, it isn't currently standardized.

I think having the wheel index own the mapping is still the right approach. What if we have the index declare which channel it is mapping names to? For example, we could add an optional field called name_mapping_channel to the info section like:

{
  "info": {
    "subdir": "noarch",
    "base_url": "https://repo.example.com/channel/",
    "name_mapping_channel": "conda-forge"
  },
  "packages.whl": {
    "requests-2.32.5": { ... }
  }
}

What do you think about this idea?
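For illustration, a client could read the proposed field roughly as sketched below. Both `name_mapping_channel` and `packages.whl` are proposals from this thread, not existing repodata keys, and `read_name_mapping_channel` is a hypothetical helper name. The wheel record is left empty because its contents are elided above.

```python
import json
from typing import List, Optional, Tuple


def read_name_mapping_channel(repodata_text: str) -> Tuple[Optional[str], List[str]]:
    """Return the declared mapping channel (or None) and the wheel record keys."""
    repodata = json.loads(repodata_text)
    # The field is optional, so fall back to None when it is absent.
    channel = repodata.get("info", {}).get("name_mapping_channel")
    wheel_keys = sorted(repodata.get("packages.whl", {}))
    return channel, wheel_keys


# Example document mirroring the proposal; the record body is intentionally empty.
EXAMPLE = """
{
  "info": {
    "subdir": "noarch",
    "base_url": "https://repo.example.com/channel/",
    "name_mapping_channel": "conda-forge"
  },
  "packages.whl": {
    "requests-2.32.5": {}
  }
}
"""

if __name__ == "__main__":
    print(read_name_mapping_channel(EXAMPLE))
```

Treating the field as optional keeps an index without a declared mapping readable, although what a client should do in that case is still an open question in this discussion.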

PyPI can delete packages

PyPI users can delete packages (as opposed to simply yanking packages) and those deletions will break even locked envs. We do not allow package deletions on conda-forge for this exact reason. Thus for conda-forge, we could not recommend using this feature for reproducible envs from lock files unless PyPI turns off the ability for users to delete files.

Another great point that we could address in the CEP!

There has been discussion in the PyPI community over the last year about standardizing the deletion policy. For example, there was the now-withdrawn PEP 763 and the Discuss Python.org topic about it. The consensus came down to:

In general there is consensus that limiting deletion would be great
Unfortunately, PyPI currently has size quotas, and users need the ability to delete files to manage their quotas

This would be a tradeoff of directly using PyPI packages. Users would get access to thousands of packages with no extra hosting requirements, but they would also be subject to how PyPI currently works. Someone using a PyPI wheel repodata would have to decide if that is a good tradeoff for them.

However, the lock file formats (rattler-lock-v6 and conda-lock-v1) already support a hybrid ecosystem with PyPI sections in the lockfiles. If someone wants to use a wheels channel directly from PyPI, it isn't better or worse than what we have right now for reproducibility. In fact, channels that mirror/store wheels (as you suggest below) would actually improve reproducibility compared to the current "conda then pip" workflow.

conda-forge could itself store a copy of the wheel artifacts so that things are reproducible for lock files (i.e., no deletions of wheels from some external source).

As you point out, there are workflows where we could make the system more reproducible than we have now. Do you think that the CEP should recommend that production channels mirror wheels to ensure reproducibility, rather than relying directly on PyPI URLs?

Downstream patching ability for conda-forge (and other ecosystems)

In a non-trivial fraction of cases, even for pure-Python packages, the requirements in conda-forge have subtle differences from the upstream requirements.

I would love to help find the right solution to this! Thanks again for the really valuable perspective. I think there are two complementary approaches we should take:

Push improvements upstream - we should engage with Python projects even more than we do now when we find metadata issues - opening PRs for incorrect dependencies, version constraints, etc. This is the sustainable long-term solution.
Support downstream repodata patching - The CEP should explicitly support patching packages.whl entries, just like conda packages today. As you point out, this is essential for:

Immediate fixes - Can't wait for upstream releases when dependencies are breaking environments
Quality control - Test packages before exposing to users
Ecosystem needs - Name mappings, relaxed constraints for conda compatibility
Reproducibility - Optionally mirror/store wheel artifacts so they can't be deleted

The vast majority of the cognitive burden is the differing and interacting repodata, not whether or not one types pip install or conda install. This CEP doesn't and cannot address that issue.

You're absolutely right that there is a burden caused by conflicting metadata. Repodata patching is how we could address this: channels can correct metadata conflicts so users don't encounter them. However, I think the current client workflows are also a burden. Giving users seamless access to thousands of packages without requiring them to know whether they're from PyPI or conda channels would solve a huge pain point.

For the reasons stated above, I am personally skeptical that injecting pure-Python wheel metadata into the repodata would consistently result in correct-enough environments to eliminate the need for new conda builds or the need to repackage pure-Python packages in conda-forge.

You're right to be skeptical about fully eliminating the need for feedstocks. This CEP won't replace conda-forge's packaging infrastructure - metadata differences mean many packages will still need proper conda recipes. This is about handling the simpler cases more efficiently. Think of it as an additional tool for easier pure-Python packages, not a replacement for feedstocks. I'll update the CEP to better capture this view.

Implementation plan for a wheel channel

One way I can imagine conda-forge using this feature is through only injecting items into packages.whl via repodata patching. In other words, instead of opening up everything on e.g., pypi, we instead have a structured process where only specific pure-Python wheels from pypi are added via repodata patching.

I am really liking your thoughts on how we could implement this, nice!

I am not sure if the implementation plan should be part of this CEP or not, so I would be grateful for everyone's thoughts on that. However, I really like where you are going with this plan and I also envision some sort of phased approach. We could start with fully manual curation, but then move toward semi-automated curation as we learn from the manual process. This balances a lower barrier (no recipes needed for many packages) with quality control. Complex packages should still use feedstocks, but this handles the simpler pure-Python case more efficiently. It would be amazing to use a conda-forge wheel channel as a test case for this if the community is interested.

Thanks again for all of the extremely valuable input, I'm looking forward to hearing more of your thoughts as we continue to refine this draft.

h-vetinari · 2025-12-27T02:41:35Z

Not commenting on the whole CEP (I share many of @beckermr's reservations, but currently don't have a strong opinion), just one aspect that's important to get right IMO:

I think having the wheel index own the mapping is still the right approach. What if we have the index declare which channel it is mapping names to? For example, we could add an optional field called name_mapping_channel to the info section like:

I think it would be a good idea to consider whether this can build on top of the proposed PEP 804 (discourse), which tries to standardize something usable around the whole name mapping issue.

It'd certainly be better if we can build on top of that (and help support that PEP) rather than inventing yet another scheme.

CC @jaimergp @rgommers @mgorny

Updates include:
- Clarifications on naming standards
- Channel mapping
- Patching capabilities for dependency management

# Conflicts:
# cep-XXXX.md

danyeaw · 2025-12-30T19:57:16Z

Hi @beckermr, I updated the relevant sections to fix version exclusions, remove reliance on conda-forge for channel mapping, and clarify that we need downstream patching ability. I also added an implementation options section and recommendations for protecting against PyPI packages being deleted. Thanks again for all of the great feedback.

Hi @h-vetinari,

I think it would be a good idea to consider whether this can build on top of the proposed PEP 804

Thanks, that's a great point; we should build on this! I added a callout to PEP 804 in the Naming standard and channel mapping section.

Add CEP for Repodata Wheel Support

c6ac060

Co-authored-by: Travis Hathaway <travis.j.hathaway@gmail.com>

jjhelmus reviewed Dec 23, 2025


pavelzw reviewed Dec 24, 2025


danyeaw added 2 commits December 23, 2025 22:04

Add conda-whl-channel to the history timeline

9688da6

Grammar fixup

2255fd9