New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add movingaverage function. #13923
Conversation
and np.sum functions which operate on arrays. Also has the 'axis' kwarg option.
Tried looking into the details of failed checks. Not sure why 'shippable' failed or what that means. Any help with the error checks is appreciated, along with any help regarding syntactic standards in keeping with the repo standards. Thanks. |
Isn't this the same as a convolution with [1, 1, 1] / 3 (with some decisions needed about offsets? In general, enhancements and new APIs need to hit the mailing list for discussion. We are very conservative about adding new functionality. After there is consensus, this needs tests for success and for failing edge cases. |
Not quite, because np.convolve only accepts 1D arrays. So in order to get np.convolve to behave in this way, you need to use a list comprehension. And if you then want axis=x, you need to then turn it into an array, and do the swapaxes, and then convolve for each i (with one comprehension at each depth) and then swapaxes again. At that point, you might as well just copy this solution and implement it yourself.
100% understand. Thought I would write the code. The magic line with the np.sum in it is non-obvious. Furthermore there seems to be a lot of people trying to figure out moving averages in numpy/pandas from my googling, everyone has a different solution, and I think I'm the only one with one successful to an arbitrary depth. I also considered publishing this as a single-def package but I want to avoid npm-style implosions.
So the way that this algorithm works is that, assuming you have That said thanks for your help and consideration. |
If we're going to add something like this to numpy, I think it should be spelt something like We already have numerous requests for |
In the wolfram language, for instance, It seems a bit clunky to require a kwarg for
Wherein |
Let's not discuss sliding window here. #7753 has some good discussion, and is probably pending a maintainer / mailing list decision. Note that the proposed sliding window does so without making any data copies, making it faster than your approach. |
There's a "16 months faster" joke in there somewhere considering the last code was in March of last year. :P |
Great, then using the solution
Then the code should be
If you want to implement a moving average, I'd say that second block I came up with is non-trivial. I really think it should be it's own function then. |
Timed it, looks like it's about 2-3x faster than my original algorithm for Seriously, though, considering that there is still no movement, should we start a new library or like PR someone else's? Why is this so behind? |
One thing about moving averages is that if you do not mind losing a bit of precision, there are much faster solution out there. The same is true for many other "moving" functions. Which does not mean that we should not add that windowing function, as long as the documentation mentions that maybe. |
Is there a faster algorithm that has the same functionality? What do you mean by a “losing bit of precision”? Binning? |
Well that’s an |
Well I implemented a cumsum lookup algorithm and it looks like it actually wins for very wide (not along the averaging axis) 2D arrays and for very long averaging schemes! Very nice @seberg! It's also more understandable than the sliding window. Try it yourself: Looks like we have a new contender for fastest algorithm, @eric-wieser .
|
For some reason my cumsum_mvgavg doesn't work beyond axis depth 2. Weird. I'll have to figure that out. Edit: fixed. So it turns out it's a bit more complicated, sometimes cumsum wins, sometimes reindex wins. Cumsum seems to do well when you have a very wide 2D array and a long averaging length. Otherwise, reindex seems to win. But when cumsum wins it wins by a lot. Here's some array shapes, avg length, and times for cumsum and reindex:
using
|
@NGeorgescu before you spend even more time on this, I want to make sure that you understand that new functions will have to be passed by the mailing list and that we may have different long term goals, etc. Saying this, I hope that even if this ends up being closed, you are happy to have learned and optimized your own code. |
Well I mean how long is this email list process? Weeks? Months? ... looks like years from the other PR. I made a single-function library out of it. If something moves let me know. I updated the PR with the updated code, which includes the fastest algorithm, the sliding window for weighting, as well as a binning function which is orders of magnitude faster but shortens the data. See you at python-moving-averages. |
Sorry, forgot about it, the email process could be quick, if there is a consensus about the name and if it should go in. A library for moving window functions would be a nice solution also. Thank you for starting with an external small package, can help others without the trickier decisions. Maybe you can just try and send an email to the mailing list assuming you want to push it a bit. |
@NGeorgescu not sure what is going on there. But we moved the mailing list to be hosted at python, so I think you probably got the wrong place and likely have to sign up for |
Oh I got to that through the Read/Search numpy-discussion on this page. How do I post to numpy-discussions? |
Seems that is still up to date, although I am not quite sure you can interact through it, maybe it will show up in a bit. (I once had my emails eaten by python.org, because of a two sided thing. My mailserver did something slightly funny, and python.org did not quite adhere to the standard) |
Hey @seberg , it seems that my posts are being eaten by the system. Would you mind posting the comment? It seems you've had successful recent posts there.
Thanks. |
This PR is unfortunately not going to work. There seems to be a consensus in gh-7753 that this functionality should not be in the main numpy namespace; either we put it in I'd probably trim this PR down a lot and try to get |
Well |
@NGeorgescu I updated the slidingwindow about a month ago according to reviewers suggestions. |
If you want to move this PR forward, I suggest to put it on top of the |
I did that. |
okay, I'll be more explicit in pointing out what needs to go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, a very quick first pass over the code - there's quite a bit to change. Also, this needs tests.
new_strides = np.concatenate((strides, strides), axis=0) | ||
return np.lib.stride_tricks.as_strided(arr ,new_shape, new_strides) | ||
|
||
def pascal(n): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't look like something we'd want to include
"""returns the nth line of a pascal's triangle""" | ||
return np.array([1]) if n == 1 else np.array([0,*pascal(n-1)])+np.array([*pascal(n-1),0]) | ||
|
||
def triangle(n): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't look like something we'd want to include
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An arguement for keeping this somewhere - it somewhat fits in with the other window functions - although if we decide to add to that list, I'd be inclined to create an np.window
sub-module, rather than growing the top namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those window functions are considered a mistake, they shouldn't have been added in the first place. Certainly we don't want to add to them. scipy.signal
is a much better place for those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like we need to update that doc page to recommend the scipy versions then (and the larger library of functions alongside them)
def triangle(n): | ||
return np.array([*range(1,int((n+2)/2))]+[*range(int((n+1)/2),0,-1)]) | ||
|
||
def quadratic(n): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't look like something we'd want to include
Width of the moving average | ||
axis: int | ||
Axis along which the moving average is to take place, 0 if unspecified | ||
weights: 'pascal', 'triangle', 'quadratic', or a list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weights
need to be generic, not just a couple of random weighting scheme. probably should take a callable of some sort. since the shape of weights depends on the window, it can't just be array_like
I think (which would otherwise be preferred, as in np.average
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I grabbed the movingaverage weighting scheme from another language. I'll get rid of the random weights. I agree that's not clean.
|
||
|
||
#@array_function_dispatch(_mvgavg_dispatcher) | ||
def mvgavg(a, n, axis=0, weights=False, binning=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs a readable name: moving_average
If you have any doubts of how a moving average works, try it out with an array of | ||
symbols in sympy. | ||
|
||
axis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this and the next one aren't valid section names, please remove the headers
Binning vs Unbinned | ||
-------------------- | ||
Consider the array [a b c d e... ] that is being averaged to the product array, | ||
[A B C D... ] over a distance of 3. If binning = False: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
::
for proper rendering at the end of this line
Examples | ||
-------- | ||
>>> from mvgavg import mvgavg | ||
>>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spurious >>>
, remove
-------- | ||
>>> from mvgavg import mvgavg | ||
>>> | ||
>>> example_array=[[i,(-1)**i] for i in np.arange(5)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just call it x
. and PEP8 spaces please
|
||
""" | ||
if axis == None: | ||
axis = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
axis=None
usually means to flatten the array. if you want the default to be axis=0
, put it in the signature
Thanks! I appreciate you taking the time on this. I'll get to changing. Also, what are the tests and how can they be incorporated? Like what are we testing for? Is it something like |
Have a look at the tests for |
I wonder if we should just go down the road of providing a recipe in the docs, rather than actually exposing this function? Something as simple as: For example, this function can be used to build a moving average
..code:: python
def moving_average(a, n, weights=None, axis=-1):
axis = normalize_axis_index(axis, a.ndim) # needed to make the + below valid
return np.average(sliding_window_view(a, (n,), axis=(axis,)), weights, axis=a.ndim+axis) |
Two reasons:
|
Looks like a controversial PR, @NGeorgescu , you need to put more energy on this if you want to move on, or I will close it for PR management. |
Yeah so I was thinking about this PR I opened. In hindsight the weights was a feature that no one will use (I think), and if you do well I guess you can custom code your own thing. This simplifies the PR immensely and no longer involves the stride tricks so I shouldn't pull that I should just pull whatever normal dev branch (I forget). So what's easiest: should I just change this PR or just open a fresh one? |
It's better to create a new one. |
Cool Will Do! |
Added a moving average function, which is in keeping with np.mean and np.sum functions which operate on arrays. Also has the 'axis' kwarg option. Works for arrays of any size/depth and is intuitive in syntax.