ENH: Add movingaverage function. #13923

NGeorgescu · 2019-07-06T16:59:13Z

Added a moving average function, which is in keeping with np.mean and np.sum functions which operate on arrays. Also has the 'axis' kwarg option. Works for arrays of any size/depth and is intuitive in syntax.

and np.sum functions which operate on arrays. Also has the 'axis' kwarg option.

NGeorgescu · 2019-07-07T17:19:36Z

Tried looking into the details of failed checks. Not sure why 'shippable' failed or what that means. Any help with the error checks is appreciated, along with any help regarding syntactic standards in keeping with the repo standards.

Thanks.

mattip · 2019-07-07T22:00:35Z

Isn't this the same as a convolution with [1, 1, 1] / 3 (with some decisions needed about offsets?

In general, enhancements and new APIs need to hit the mailing list for discussion. We are very conservative about adding new functionality. After there is consensus, this needs tests for success and for failing edge cases.

NGeorgescu · 2019-07-08T01:31:38Z

Isn't this the same as a convolution with [1, 1, 1] / 3 (with some decisions needed about offsets?

Not quite, because np.convolve only accepts 1D arrays. So in order to get np.convolve to behave in this way, you need to use a list comprehension. And if you then want axis=x, you need to then turn it into an array, and do the swapaxes, and then convolve for each i (with one comprehension at each depth) and then swapaxes again. At that point, you might as well just copy this solution and implement it yourself.

In general, enhancements and new APIs need to hit the mailing list for discussion. We are very conservative about adding new functionality.

100% understand. Thought I would write the code. The magic line with the np.sum in it is non-obvious. Furthermore there seems to be a lot of people trying to figure out moving averages in numpy/pandas from my googling, everyone has a different solution, and I think I'm the only one with one successful to an arbitrary depth. I also considered publishing this as a single-def package but I want to avoid npm-style implosions.

After there is consensus, this needs tests for success and for failing edge cases.

So the way that this algorithm works is that, assuming you have [a,b,c,d,e...z] for a moving average of n=3, it creates one array [a,b,c, ... ,w,x], one [b,c,d,...., x,y] and one [c,d,e,.... ,y,z] (3 in total) and then sums them together and divides by n. So the first element is (a+b+c)/3, the second (b+c+d)/3, ... the last is (x, y, z)/3. So this works to any depth since a itself can be an array. The only real edge cases are ragged arrays or any other time np.sum would fail that I can think of. But I'm interested to see what others think.

That said thanks for your help and consideration.

eric-wieser · 2019-07-08T01:34:08Z

If we're going to add something like this to numpy, I think it should be spelt something like np.average(np.sliding_window(x, len=l. axis=a), axis=a).

We already have numerous requests for sliding_window, and just need to pin down its semantics.

NGeorgescu · 2019-07-08T03:11:09Z

If we're going to add something like this to numpy, I think it should be spelt something like np.average(np.sliding_window(x, len=l. axis=a), axis=a).
We already have numerous requests for sliding_window, and just need to pin down its semantics.

In the wolfram language, for instance, MovingAverage[] is it's own separate function (but I numpified it, like np.std(), np.cumsum(), etc.). This seems the most logical because it will be always have a specified axis on the np.average() function, so the average is then a wrapper that requires this clunky kwarg, but I understand not wanting to feature creep, and I agree though sliding_window itself seems like a very useful function (for instance, in determining a maximum value in some moving window, etc.)

It seems a bit clunky to require a kwarg for sliding_window, since it makes no sense without it. It makes sense for like axis=0, for instance, since that's a reasonable default value, but what is the default len if the kwarg isn't included? len=1 and a len the length of the array are both trivial nesting operations. Seems like that's better as an arg than kwarg. I mean, unless there's a compelling counter-example. This works:

def sliding_window(a,n,axis=0):
    a = asarray(a).swapaxes(0,axis)
    a = array([a[m:len(a)-n+1+m] for m in range(n)])
    return a

Wherein sliding_window is a function that takes in an n-dimensional array are returns an (n+1)-dimensional array wherein each member of the axis dimension is a sliding window of the original array along the chosen axis. Then np.average(np.sliding_window(a, n. axis=x), axis=x) will work assuming that both x numbers are the same. (return a.swapaxes(axis,0) in sliding_window if you want the np.average to have a kwarg permanently of axis=0 rather than axis=x).

eric-wieser · 2019-07-08T03:55:52Z

Let's not discuss sliding window here. #7753 has some good discussion, and is probably pending a maintainer / mailing list decision. Note that the proposed sliding window does so without making any data copies, making it faster than your approach.

NGeorgescu · 2019-07-09T02:52:50Z

There's a "16 months faster" joke in there somewhere considering the last code was in March of last year. :P

NGeorgescu · 2019-07-09T03:31:31Z

Great, then using the solution

def sliding_window_view(arr, shape):
    n = np.array(arr.shape) 
    o = n - shape + 1 # output shape
    strides = arr.strides
    
    new_shape = np.concatenate((o, shape), axis=0)
    new_strides = np.concatenate((strides, strides), axis=0)
    return np.lib.stride_tricks.as_strided(arr ,new_shape, new_strides)

Then the code should be

def mvgavg(a, n, axis=0):
    shape = a.shape-(n-1)*np.eye(a.ndim,dtype='int')[axis]
    window = sliding_window_view(a, shape)
    return np.mean(window,axis=axis).reshape(shape)

If you want to implement a moving average, I'd say that second block I came up with is non-trivial. I really think it should be it's own function then.

NGeorgescu · 2019-07-09T03:58:13Z

Timed it, looks like it's about 2-3x faster than my original algorithm for np.random.rand(30000,2000)

Seriously, though, considering that there is still no movement, should we start a new library or like PR someone else's? Why is this so behind?

seberg · 2019-07-09T13:50:08Z

One thing about moving averages is that if you do not mind losing a bit of precision, there are much faster solution out there. The same is true for many other "moving" functions. Which does not mean that we should not add that windowing function, as long as the documentation mentions that maybe.

NGeorgescu · 2019-07-09T14:19:29Z

Is there a faster algorithm that has the same functionality? What do you mean by a “losing bit of precision”? Binning?

seberg · 2019-07-09T14:54:06Z

https://en.wikipedia.org/wiki/Summed-area_table

NGeorgescu · 2019-07-09T15:15:36Z

Well that’s an np.cumsum Difference algorithm. It would be the same precision unless I’m misunderstanding what you’re trying to say. I’d be interested to see if it’s faster than using the reindexing by @fnjn. I know people use it for convolutional neural networks and the such because of its speed but I don’t know if it still applies when taking a 1D moving average. The reason people use it is to avoid repeat operations, but I don’t think that advantage applies here.

NGeorgescu · 2019-07-09T19:00:04Z

Well I implemented a cumsum lookup algorithm and it looks like it actually wins for very wide (not along the averaging axis) 2D arrays and for very long averaging schemes! Very nice @seberg!

It's also more understandable than the sliding window. Try it yourself:

Looks like we have a new contender for fastest algorithm, @eric-wieser .


import numpy as np

def sliding_window_view(arr, shape):
    """
    This function was written by fnjn on github and posted to numpy/numpy.
    It takes in an array and a shape, and then gives back an array that is
    essentially moving around a planchette of the described shape over the
    array.
    """
    n = np.array(arr.shape) 
    o = n - shape + 1 # output shape
    strides = arr.strides
    new_shape = np.concatenate((o, shape), axis=0)
    new_strides = np.concatenate((strides, strides), axis=0)
    return np.lib.stride_tricks.as_strided(arr ,new_shape, new_strides)

def mvgavg(a, n, axis=0, weights=None):
    a = np.array(a)
    shape = a.shape-(n-1)*np.eye(a.ndim,dtype='int')[axis]
    window = sliding_window_view(a, shape)
    return np.mean(window,axis=axis).reshape(shape)

def cumsum_mvgavg(a, n, axis=0):
    table = np.cumsum(a.swapaxes(0,axis)
    ,axis=0)/(n)
    try:
        table = np.vstack([[np.zeros(len(table[0]))],table])
    except:
        table = np.array([0,*table])
    return np.swapaxes(table[n:]-table[:len(table)-n],axis,0)


arr = np.random.rand(3000,10)
import time
times_cumsum=[]
times_reindex=[]
for i in range(50):
    t_cumsum = time.time()
    ca = cumsum_mvgavg(arr,50,axis=0)
    t_cumsum = time.time()-t_cumsum

    times_cumsum.append(t_cumsum)
    
    t_reindex= time.time()
    ra = mvgavg(arr,50,axis=0)
    t_reindex = time.time()-t_reindex
    
    times_reindex.append(t_reindex)
    assert np.mean((ca-ra)**2)<10**-6

times_cumsum = np.mean(times_cumsum[2:-2])
times_reindex = np.mean(times_reindex[2:-2])
print([times_cumsum,times_reindex])

NGeorgescu · 2019-07-09T19:18:30Z

For some reason my cumsum_mvgavg doesn't work beyond axis depth 2. Weird. I'll have to figure that out. Edit: fixed.

So it turns out it's a bit more complicated, sometimes cumsum wins, sometimes reindex wins. Cumsum seems to do well when you have a very wide 2D array and a long averaging length. Otherwise, reindex seems to win. But when cumsum wins it wins by a lot. Here's some array shapes, avg length, and times for cumsum and reindex:

(3000,) 50 [0.0006575 0.0002985]
(3000,) 5 [0.000719  0.0001683]
(3000, 10) 50 [0.0004457 0.0016589]
(3000, 10) 5 [0.0005152 0.0005925]
(3000, 10, 20) 50 [0.0110596 0.1357119]
(3000, 10, 20) 5 [0.0116897 0.0194189]

using


def cumsum_mvgavg(a, n, axis=0):
    table = np.cumsum(a.swapaxes(0,axis),axis=0)/n
    try:
        table = np.vstack([[0*table[0]],table])
    except:
        table = np.array([0,*table])
    return np.swapaxes(table[n:]-table[:-n],0,axis)

seberg · 2019-07-12T16:36:24Z

@NGeorgescu before you spend even more time on this, I want to make sure that you understand that new functions will have to be passed by the mailing list and that we may have different long term goals, etc. Saying this, I hope that even if this ends up being closed, you are happy to have learned and optimized your own code.

NGeorgescu · 2019-07-13T01:33:24Z

Well I mean how long is this email list process? Weeks? Months? ... looks like years from the other PR.

I made a single-function library out of it. If something moves let me know. I updated the PR with the updated code, which includes the fastest algorithm, the sliding window for weighting, as well as a binning function which is orders of magnitude faster but shortens the data. See you at python-moving-averages.

seberg · 2019-07-23T22:41:19Z

Sorry, forgot about it, the email process could be quick, if there is a consensus about the name and if it should go in. A library for moving window functions would be a nice solution also. Thank you for starting with an external small package, can help others without the trickier decisions.

Maybe you can just try and send an email to the mailing list assuming you want to push it a bit.

NGeorgescu · 2019-08-01T20:50:10Z

Hey @seberg I put in a topic at the nabble boards. It hasn't shown up yet (not sure what the approval process is like).

seberg · 2019-08-01T20:52:54Z

@NGeorgescu not sure what is going on there. But we moved the mailing list to be hosted at python, so I think you probably got the wrong place and likely have to sign up for numpy-discussions@python.org.

NGeorgescu · 2019-08-01T20:56:57Z

Oh I got to that through the Read/Search numpy-discussion on this page. How do I post to numpy-discussions?

seberg · 2019-08-01T21:01:57Z

Seems that is still up to date, although I am not quite sure you can interact through it, maybe it will show up in a bit. (I once had my emails eaten by python.org, because of a two sided thing. My mailserver did something slightly funny, and python.org did not quite adhere to the standard)

NGeorgescu · 2019-08-06T16:20:21Z

Hey @seberg , it seems that my posts are being eaten by the system. Would you mind posting the comment? It seems you've had successful recent posts there.

ENH: Add movingaverage function.

I opened a Pull Request #13923 to
include this package https://pypi.org/project/mvgavg/ in numpy, along
with the associated sliding window function in this PR
#7753.

Thanks.

rgommers · 2019-08-25T23:04:11Z

This PR is unfortunately not going to work. There seems to be a consensus in gh-7753 that this functionality should not be in the main numpy namespace; either we put it in numpy.lib.stride_tricks or leave it out completely (given that scikit-image has this already).

I'd probably trim this PR down a lot and try to get sliding_window in first. There's way too many functions in here to start reviewing in detail.

NGeorgescu · 2019-08-26T00:15:59Z

Well sliding window hasn't been moved on in months. Should I put that on the mailing list first or how should I proceed?

fanjin-z · 2019-08-26T00:25:57Z

@NGeorgescu I updated the slidingwindow about a month ago according to reviewers suggestions.
No afterwards feedback since then.

rgommers · 2019-08-26T01:51:03Z

If you want to move this PR forward, I suggest to put it on top of the sliding_window one, and remove everything except the function you want to add plus tests for it. There's lots of code in here that's not needed.

NGeorgescu · 2019-08-26T03:43:09Z

I did that. pip install mvgavg

rgommers · 2019-08-26T03:47:44Z

okay, I'll be more explicit in pointing out what needs to go

rgommers

Okay, a very quick first pass over the code - there's quite a bit to change. Also, this needs tests.

rgommers · 2019-08-26T03:48:07Z

numpy/core/mvgavg.py

+    new_strides = np.concatenate((strides, strides), axis=0)
+    return np.lib.stride_tricks.as_strided(arr ,new_shape, new_strides)
+
+def pascal(n):


this doesn't look like something we'd want to include

rgommers · 2019-08-26T03:48:12Z

numpy/core/mvgavg.py

+    """returns the nth line of a pascal's triangle"""
+    return np.array([1]) if n == 1 else np.array([0,*pascal(n-1)])+np.array([*pascal(n-1),0])
+
+def triangle(n):


this doesn't look like something we'd want to include

An arguement for keeping this somewhere - it somewhat fits in with the other window functions - although if we decide to add to that list, I'd be inclined to create an np.window sub-module, rather than growing the top namespace.

Those window functions are considered a mistake, they shouldn't have been added in the first place. Certainly we don't want to add to them. scipy.signal is a much better place for those.

Sounds like we need to update that doc page to recommend the scipy versions then (and the larger library of functions alongside them)

rgommers · 2019-08-26T03:48:15Z

numpy/core/mvgavg.py

+def triangle(n):
+    return np.array([*range(1,int((n+2)/2))]+[*range(int((n+1)/2),0,-1)])
+
+def quadratic(n):


this doesn't look like something we'd want to include

rgommers · 2019-08-26T03:51:00Z

numpy/core/numeric.py

+        Width of the moving average
+    axis: int
+        Axis along which the moving average is to take place, 0 if unspecified
+    weights: 'pascal', 'triangle', 'quadratic', or a list


weights need to be generic, not just a couple of random weighting scheme. probably should take a callable of some sort. since the shape of weights depends on the window, it can't just be array_like I think (which would otherwise be preferred, as in np.average).

Yeah I grabbed the movingaverage weighting scheme from another language. I'll get rid of the random weights. I agree that's not clean.

rgommers · 2019-08-26T03:51:26Z

numpy/core/numeric.py

+
+
+#@array_function_dispatch(_mvgavg_dispatcher)
+def mvgavg(a, n, axis=0, weights=False, binning=False):


needs a readable name: moving_average

rgommers · 2019-08-26T03:59:20Z

numpy/core/numeric.py

+    If you have any doubts of how a moving average works, try it out with an array of
+    symbols in sympy.
+
+    axis


this and the next one aren't valid section names, please remove the headers

rgommers · 2019-08-26T03:59:42Z

numpy/core/numeric.py

+    Binning vs Unbinned
+    --------------------
+    Consider the array [a b c d e... ] that is being averaged to the product array,
+    [A B C D... ] over a distance of 3. If binning = False:


:: for proper rendering at the end of this line

rgommers · 2019-08-26T04:00:04Z

numpy/core/numeric.py

+    Examples
+    --------
+    >>> from mvgavg import mvgavg
+    >>> 


spurious >>>, remove

rgommers · 2019-08-26T04:00:24Z

numpy/core/numeric.py

+    --------
+    >>> from mvgavg import mvgavg
+    >>> 
+    >>> example_array=[[i,(-1)**i] for i in np.arange(5)]


just call it x. and PEP8 spaces please

rgommers · 2019-08-26T04:01:24Z

numpy/core/numeric.py

+
+    """
+    if axis == None:
+        axis = 0


axis=None usually means to flatten the array. if you want the default to be axis=0, put it in the signature

NGeorgescu · 2019-08-26T04:31:31Z

Thanks! I appreciate you taking the time on this. I'll get to changing. Also, what are the tests and how can they be incorporated? Like what are we testing for? Is it something like moving_average((-1)**np.arange(100),2) should yield the same answer as 0.*np.arange(99) or was there something more specific?

rgommers · 2019-08-26T04:51:30Z

what are the tests and how can they be incorporated? Like what are we testing for?

Have a look at the tests for average for comparison: https://github.com/numpy/numpy/blob/master/numpy/lib/tests/test_function_base.py#L282

eric-wieser · 2019-08-26T05:21:11Z

I wonder if we should just go down the road of providing a recipe in the docs, rather than actually exposing this function?

Something as simple as:

For example, this function can be used to build a moving average

..code:: python

    def moving_average(a, n, weights=None, axis=-1):
        axis = normalize_axis_index(axis, a.ndim)  # needed to make the + below valid
        return np.average(sliding_window_view(a, (n,), axis=(axis,)), weights, axis=a.ndim+axis)

NGeorgescu · 2019-08-30T16:26:57Z

Two reasons:

The unweighted moving average using a cumsum operation is generally faster, esp. for large n, large numbers of axes, and for arrays that are very wide with respect to the averaging axis.
Most other languages have a single-function call, not a multi-line function, e.g. rollmean in R, MovingAverage in Wolfram, etc.

Qiyu8 · 2020-05-29T07:18:38Z

Looks like a controversial PR, @NGeorgescu , you need to put more energy on this if you want to move on, or I will close it for PR management.

NGeorgescu · 2020-06-01T06:30:45Z

Yeah so I was thinking about this PR I opened. In hindsight the weights was a feature that no one will use (I think), and if you do well I guess you can custom code your own thing. This simplifies the PR immensely and no longer involves the stride tricks so I shouldn't pull that I should just pull whatever normal dev branch (I forget).

So what's easiest: should I just change this PR or just open a fresh one?

Qiyu8 · 2020-06-01T07:20:31Z

It's better to create a new one.

NGeorgescu · 2020-06-01T14:58:07Z

Cool Will Do!

NGeorgescu added 3 commits July 6, 2019 12:32

added moving average function

44d00ac

Added a moving average function, which is in keeping with np.mean

ac18e19

and np.sum functions which operate on arrays. Also has the 'axis' kwarg option.

Used default axis kwarg rather than checking **kwargs

8f81bfa

mattip added 01 - Enhancement 23 - Wish List labels Jul 7, 2019

charris changed the title ~~Movingaverage~~ ENH: Add movingaverage function. Jul 8, 2019

charris added the component: numpy._core label Jul 8, 2019

Optimized Moving Average, added binning, improved docstring

9030a16

rgommers requested changes Aug 26, 2019

View reviewed changes

Qiyu8 added the 61 - Stale label May 29, 2020

NGeorgescu closed this Jun 1, 2020



		#@array_function_dispatch(_mvgavg_dispatcher)
		def mvgavg(a, n, axis=0, weights=False, binning=False):

ENH: Add movingaverage function. #13923

ENH: Add movingaverage function. #13923

Conversation

NGeorgescu commented Jul 6, 2019 • edited

NGeorgescu commented Jul 7, 2019

mattip commented Jul 7, 2019

NGeorgescu commented Jul 8, 2019 • edited

eric-wieser commented Jul 8, 2019 • edited

NGeorgescu commented Jul 8, 2019 • edited

eric-wieser commented Jul 8, 2019

NGeorgescu commented Jul 9, 2019 • edited

NGeorgescu commented Jul 9, 2019 • edited

NGeorgescu commented Jul 9, 2019

seberg commented Jul 9, 2019

NGeorgescu commented Jul 9, 2019

seberg commented Jul 9, 2019

NGeorgescu commented Jul 9, 2019 • edited

NGeorgescu commented Jul 9, 2019 • edited

NGeorgescu commented Jul 9, 2019 • edited

seberg commented Jul 12, 2019

NGeorgescu commented Jul 13, 2019

seberg commented Jul 23, 2019

NGeorgescu commented Aug 1, 2019

seberg commented Aug 1, 2019

NGeorgescu commented Aug 1, 2019 • edited

seberg commented Aug 1, 2019

NGeorgescu commented Aug 6, 2019 • edited

rgommers commented Aug 25, 2019

NGeorgescu commented Aug 26, 2019

fanjin-z commented Aug 26, 2019 • edited

rgommers commented Aug 26, 2019

NGeorgescu commented Aug 26, 2019

rgommers commented Aug 26, 2019

rgommers left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eric-wieser Aug 26, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NGeorgescu commented Aug 26, 2019

rgommers commented Aug 26, 2019

eric-wieser commented Aug 26, 2019

NGeorgescu commented Aug 30, 2019

Qiyu8 commented May 29, 2020

NGeorgescu commented Jun 1, 2020

Qiyu8 commented Jun 1, 2020

NGeorgescu commented Jun 1, 2020

NGeorgescu commented Jul 6, 2019 •

edited

NGeorgescu commented Jul 8, 2019 •

edited

eric-wieser commented Jul 8, 2019 •

edited

NGeorgescu commented Jul 8, 2019 •

edited

NGeorgescu commented Jul 9, 2019 •

edited

NGeorgescu commented Jul 9, 2019 •

edited

NGeorgescu commented Jul 9, 2019 •

edited

NGeorgescu commented Jul 9, 2019 •

edited

NGeorgescu commented Jul 9, 2019 •

edited

NGeorgescu commented Aug 1, 2019 •

edited

NGeorgescu commented Aug 6, 2019 •

edited

fanjin-z commented Aug 26, 2019 •

edited

eric-wieser Aug 26, 2019 •

edited