Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

creation functions according to NEP 35 and updated *_like behavior #1669

Open
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

keewis
Copy link
Contributor

@keewis keewis commented Nov 28, 2022

For a more consistent experience with the *_like functions and the functions from NEP 35 (introduced in numpy=1.20, minimum version is 1.19.5 but could be 1.20 according to NEP 29), this changes the semantics to:

  • have full_like return the units of fill_value (a non-quantity fill_value results an array of the type of the magnitude):
full_like(arr, fill_value=q) == Quantity(full_like(arr, fill_value=q.magnitude), q.units)
  • have empty_like use the units of the argument:
empty_like(q) == Quantity(empty_like(q.magnitude), q.units)
  • have ones_like and zero_like a quantity with the units of the argument:
ones_like(q) == full_like(q, fill_value=Quantity(1, q.units))
zeros_like(q) == full_like(q, fill_value=Quantity(0, q.units))
  • support the NEP 35 keyword like for creation functions:
np.array(arr, like=q) == Quantity(np.array(arr), q.units)
np.arange(10, like=q) == Quantity(np.arange(10, like=q.magnitude), q.units)
...

I'm not sure I got all of the functions that were changed by NEP 35, I couldn't find a comprehensive list. The ones I found are:

  • array
  • asarray
  • asanyarray
  • arange
  • ones
  • zeros
  • empty
  • full
  • identity
  • eye

but I'm almost certain I missed some (linspace does not seem to be included in that list, unfortunately).

The definition of all of them except full and arange is pretty simple: just use the units of the like argument.

For full, I chose to go with the unit of the fill_value or dimensionless if the fill value happens not to be a quantity (because by passing like we expect a quantity). For arange, we can go the "units of like" route, but we can also support something like

np.arange(1 * ureg.kg, 5 * ureg.kg, 100 * ureg.g, like=[0] * ureg.kg)

(the like=[0] * ureg.kg is important for dispatching, unfortunately)

I might be missing something, though, so I'd appreciate someone checking this.

cc @jthielen

@jthielen
Copy link
Contributor

The changes here look good to me, and as far as I'm aware, that's the same list of functions that I'm aware of that'd be covered by NEP 35. So I'd say go with it, and if something a user needs happens to have been missed, it could be added later?

Speaking of adding later, though, would you be able to add code comments (and modify the existing ones that begin with # Make full_like by multiplying) clarifying the approaches taken here, so that it is easier for future contributors to these sections of code? For example, the code pattern in https://github.com/hgrecco/pint/pull/1669/files#diff-3902d8460bf42f38121b360a0cf397597567077c08aa6d0a565a65332290395cR953-R967 seems unnatural on first glance, and so could use comments to clarify that first you're doing the "automatic" implementations, then the custom ones. Also, while it is still an unchecked checklist item (at time of writing comment), I think these nuances of array creation could benefit from some direct documentation in the NumPy page of the docs.

@jules-ch
Copy link
Collaborator

Seems like some tests with xarray are making CI fail.
Do we know what's triggering this ?

@keewis
Copy link
Contributor Author

keewis commented Dec 19, 2022

thanks, @jthielen, those are some good points. I've tried to address them in the most recent commits, though I suspect to be really useful we'd need to rewrite / restructure the numpy guide: right now, the implementation details are condensed into a single paragraph, which doesn't make it easy to search for a specific type of function, and we're also missing clear examples. Instead, I suggest having one section per function category. If nobody objects, I'll try to do that in a new PR (not sure when I get to that, though).

@jules-ch: no idea. Might be because of changes in this PR, I'll have to investigate.

@keewis
Copy link
Contributor Author

keewis commented Dec 20, 2022

Seems like some tests with xarray are making CI fail. Do we know what's triggering this ?

It seems xarray is using zeros_like in isnull to construct a all-False array for dtypes that are not nullable, and because we changed zeros_like to return a quantity this has stopped working.

I'd argue that this is implicit behavior in xarray (False just so happens to be represented as 0), and that it would be much better to use full_like in this case:

zeros_like(data, dtype=bool)  # old
full_like(data, dtype=bool, fill_value=False)  # new

However, even though we can fix this xarray very quickly we'd basically have to require the newest version of xarray for use with a new release of pint. So I'm not sure how to best handle this: should we update xarray, and wait 6 months before merging this PR? Or is it fine to just merge and require people to update xarray?

@keewis
Copy link
Contributor Author

keewis commented Dec 20, 2022

another option would be to avoid wrapping bool dtypes, but that might make the code much more complicated

This is important because `dask` chose not to implement `subok`.
@keewis
Copy link
Contributor Author

keewis commented Dec 21, 2022

this has become tricky... see pydata/xarray#7395 for more details.

The most recent commit fixes a bug in full_like when called on a pint quantity wrapping a dask array (dask chose not to implement subok), which prevents xarray from merging my PR and releasing a new version compatible with the new behavior of *_like. However, we need that version to be able to change the behavior of *_like without breaking xarray's implementation of isnull on non-nullable dtypes (all except object, float, and complex).

To get out of this dead-lock, I think we need to somehow introduce a way to switch between old and new behavior of *_like (a new registry option?). This would allow us to properly deprecate the old behavior: the next version would ship with deprecation warnings for the old behavior of *_like, the version after that would switch the behavior but allow switching back to the old behavior, and some versions later we'd remove the switch and the old code. In general, though, I think we can keep the deprecation cycle short since we warned about the behavior of *_like being experimental.

(and of course, if someone has a better idea I'd be open to that as well)

@andrewgsavage
Copy link
Collaborator

t he next version would ship with deprecation warnings for the old behavior of *_like

Do you want to add a depreciation warning in another PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants