Suggestion: Sliding Window Function #7753

nils-werner · 2016-06-17T11:29:52Z

Using np.lib.stride_tricks.as_stride one can very efficiently create a sliding window that segments an array as a preprocessing step for vectorized applications. For example a moving average of a window length 3, stepsize 1:

a = numpy.arange(10)
a_strided = numpy.lib.stride_tricks.as_strided(
    a, shape=(8, 3), strides=(8, 8)
)
print numpy.mean(a_strided, axis=1)

This is very performant but very hard to do, as the shape and strides parameters are very hard to understand.

I suggest the implementation of a "simple sliding_window function" that does the job of figuring out those two parameters for you.

The implementation I have been using for years allows the above to be replaced by

a = numpy.arange(10)
a_strided = sliding_window(a, size=3, stepsize=1)
print numpy.mean(a_strided, axis=1)

@teoliphant also has an implementation that would change the above to

a = numpy.arange(10)
a_strided = array_for_sliding_window(a, 3)
print numpy.mean(a_strided, axis=1)

both making it much more readable.

Seeing it is a common usecase in vectorized computing I suggest we put a similar function into NumPy itself.

Regarding to which implementation to follow, they are both assume different things but allow you to do the same thing eventually:

sliding_window

slides over one axis only
allows setting windowsize and stepsize
returns array with dimension n+1
sliding over several axes requires two calls (which come for free as there is no memory reordered)
has a superfluous copy parameter that can be removed and replaced by appending .copy() after the call

array_for_sliding_window

slides over all axes simultaneously, window lengths are given as tuple parameter
Assumes a stepsize one in all directions
returns array with dimension n*2
stepsize not equal to one requires slicing of output data (unsure if this implies copying data)
Disabling sliding over axis[n] requires you set argument wshape[n] = 1 or wshape[n] = a.shape[n]

This means for flexible stepsize the following are equivalent (with some minor bug in sliding_window there):

a = numpy.arange(10)
print sliding_window(a, size=3, stepsize=2)

a = numpy.arange(10)
print array_for_sliding_window(a, 3)[::2, :] # Stepsize 2 by dropping every 2nd row

for sliding over one axis the following are equivalent (with some transposing and squeezing):

a = numpy.arange(25).reshape(5, 5)
print sliding_window(a, size=3, axis=1)

a = numpy.arange(25).reshape(5, 5)
print array_for_sliding_window(a, (1, 3))

and for sliding over two axis the following are equivalent:

a = numpy.arange(25).reshape(5, 5)
print sliding_window(sliding_window(a, size=3, axis=0), size=2, axis=1)

a = numpy.arange(25).reshape(5, 5)
print array_for_sliding_window(a, (3, 2))

This issue is about sparking discussion about

Do we need such a function?
Which features are required?
Which interface should we persue?

After discussion I am willing to draft up a pull request with an implementation we agreed on.

The text was updated successfully, but these errors were encountered:

jaimefrio · 2016-06-17T12:18:57Z

While getting a sliding window with stride tricks is very cool,
implementing almost any function by vectorizing on top of that is
inefficient. I'm not sure it is a good idea to provide a function that
encourages people to go that route, when there is functionality in pandas,
bottleneck and scipy.ndimage that implements the more typical use cases
efficiently.

mhvk · 2016-06-17T16:18:13Z

I'd agree that there is no real need -- e.g., the example is more efficiently done with convolve -- combined with a real risk of leading to code that is difficult to parse and debug.

nils-werner · 2016-06-21T11:15:50Z

The example was obviously only chosen to keep the example as small as possible.

I am not sure I understand why "almost any function on top of that would it be inefficient".

Another example, calculating the spectrogram of a signal, where the sliding window is >2x faster than a for loop (plus the sliding window implementation being easier to read and debug):

import time
import numpy
import scipy.signal
import matplotlib.pyplot as plt

winlen = 256
stepsize = winlen // 2
siglen = 44100 * 10

############# Using sliding window ############

start = time.clock()
a = numpy.sin(numpy.arange(siglen) * 2 * numpy.pi * 16000 / 44100.0)
A = sliding_window(a, size=winlen, stepsize=stepsize)

B = numpy.fft.fft(A * scipy.signal.windows.hamming(winlen), axis=1)

print time.clock() - start  # 0.037622 on my machine

############# Using for loop ############

start = time.clock()
a = numpy.sin(numpy.arange(siglen) * 2 * numpy.pi * 16000 / 44100.0)

idxlist = range(0, len(a)-winlen, stepsize)
C = numpy.zeros((len(idxlist), winlen), dtype=complex)

for o, i in enumerate(idxlist):
    C[o, :] = numpy.fft.fft(a[i:i+winlen] * scipy.signal.windows.hamming(winlen))

print time.clock() - start  # 0.080116 on my machine

##########################################

print numpy.allclose(B, C)  # True
plt.imshow(numpy.abs(B))
plt.figure()
plt.imshow(numpy.abs(C))
plt.show()

jaimefrio · 2016-06-21T12:53:34Z

It's not the same as your example, but say you have an array with n items
and you want to perform FFTs of size m on a sliding window over it. The
complexity of your operations is going to be O(n m log m). But, once you
compute the first FFT, you don't need to redo all the calculations. Say x
is your original array, and y the FFT for a certain sliding window, and z
for the sliding window a step of size one further. Aside from the multiple
off by one errors I am probably making, you could do something like:

y[k] = sum(x[i] \* exp(-2 \* pi *1j \* k \* i / m for i in range(0, m))
z[k] = sum(x[i] \* exp(-2 \* pi *1j \* k \* (i - 1) / m for i in range(1, m+1))

z[k] = (y[k] - x[0] + x[m] \* exp(-2\* pi \* 1j \* k)) \* exp(2 \* pi \* 1j \* k / m)

and all of a sudden you are computing each new FFT in time O(m) instead of O(m
log m), which means that the overall complexity of your calculation is now O(m
(n + log m)), which is going to be a factor log m faster than the naive
solution.

Your half overlapping FFTs are actually the last step of the FFT algorithm,
were FFTs of size n are composed from 2 FFTs of size n / 2, so that would
be an efficient algorithm for your problem.

For many problems on a sliding window there are similarly clever approaches
that make the calculations very substantially faster. And the fear is that,
by promoting the use of a sliding window approach, we would be leading
users down the path of an easy 2x speedup, rather than the specialized 10x
or 100x speedup.

nils-werner · 2016-06-21T13:33:55Z

Yes, you may be able to optimize an overlapping FFT that way. Again, I am not talking about a specific usecase here.

It pains me to say it but I am looking at things from the "rapid prototyping scientific code" side of things. I have yet to come across scientific code that actually does such optimizations and doesn't just go down the easy "slice and vectorize" road.

When I am trying out some random idea I had I am not interested in prematurely optimizing my lapped operations, I just want it to be reasonably fast and maintainable for as little cost as possible.

A for loop gives me neither: The code is hard to read and it will be slow.
Your solution gives me one only: It may be screaming fast but is near impossible to rapidly iterate on or maintain by non-CS-engineers or let alone students.
A sliding window implementation removes most of the for loops and off-by-one errors and at the same time may provide a reasonable speedup.

When it goes to actual production code with more maintainers and tests and some inner loops done in C your implementation of course makes more sense.

toddrjen · 2016-08-10T13:33:49Z

Considering that both scipy and matplotlib implement their spectrogram-related functions using exactly this approach rather than some more efficient approach, it seems we have already gone down this path.

I think there are two issues with using more efficient approaches. First, it requires different approaches for any calculation you might want to do. Second, it requires someone to actually implement all of these special cases. Considering that even in the supposedly obvious case with FFT no one has stepped up to do this, it doesn't seem that this is happening.

I think the simplest approach would not be so much to create special functions for each calculation you might want, but rather create a single function that takes the window length, overlap, a possible window, and a function. It would then use strides to create the correct shape, then apply the window (if any), then apply the function. matplotlib and scipy are already doing the first two steps, so it would be easy to translate their code into a general function that could work with any function that takes a vector.

To be completely open, I implemented the strided approach in matplotlib, but before that it was using the same algorithm implemented using a loop, so this was an improvement over what existed before.

ghost · 2016-09-05T14:48:29Z

This functionality is also replicated in scikit-image (view_as_windows) and scikit-learn (extract_patches).

Also, if a new convenience function would live under numpy.lib.stride_tricks I don't think users are more encouraged to use a such an approach than they already are.

eric-wieser · 2018-03-16T01:47:10Z

I'm not sure it is a good idea to provide a function that encourages people to go that route

Hiding it in stride_tricks and sticking a note in the docs explaining that for some cases there are more efficient approaches seems to be a good enough solution to that to me.

I'm +1 on adding such a function

eric-wieser · 2018-03-16T01:53:06Z

A possible api extending the array_for_sliding_window suggestion with an axis argument:

res = np.lib.stride_tricks.sliding_window_view(arr, shape, axis=None)

with semantics:

axis is None implies axes = tuple(range(arr.ndim))
Integer axis and shape are promoted to one-element tuples
len(axis) == len(shape)
res.shape == shape + arr.shape (arguably more useful for broadcasting than (arr.shape + shape))
We could perhaps resolve this by having shape be passed as (..., 1, 2, 3) or (1, 2, 3, ...), and have Ellipsis replaced with arr.shape, but there's no precedent for this yet. (this forgot that the size shrinks the larger the window)
res.shape = (...) + shape, so that it broadcasts against arrays with shape shape

The matlab example in #10754 would become:

a = np.linspace(0,1,16).reshape(4, 4)
b = np.lib.stride_tricks.sliding_window_view(arr, (2,2))
new_a = b.mean(axis=(-2,-1))

~~We could consider adding a strides / step argument later, but I think we should leave it out of the initial proposal~~ - see below for why I think this is a bad idea.

lijunzh · 2018-03-17T01:37:32Z

+1 for this feature. It is quite useful in rapid prototyping. I have to copy&paste the moving window function frequently. It will be great if it is an std method in Numpy.

nils-werner · 2018-03-18T09:01:40Z

scikit-image does this for any dimensions in skimage.util.view_as_windows().

lijunzh · 2018-03-18T10:31:23Z

@nils-werner Thanks for the info. That's a handy function to have.

But, it will still be better to have something similar in Numpy to avoid additional dependency of the skimage package. After all, I don't think moving window function is a special need for image processing, but a common practice in many signal/data processing which Numpy/Scipy package is targeted at.

fanjin-z · 2018-03-19T19:37:33Z

@eric-wieser , @lijunzh
Here is my proposal:

def sliding_window_view(arr, shape):
    n = np.array(arr.shape) 
    o = n - shape + 1 # output shape
    strides = arr.strides
    
    new_shape = np.concatenate((o, shape), axis=0)
    new_strides = np.concatenate((strides, strides), axis=0)
    return np.lib.stride_tricks.as_strided(arr ,new_shape, new_strides)

Test case1

arr = np.arange(12).reshape(3,4)
shape = np.array([2,2])
sliding_window_view(arr, (2,2))

input:

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

output:

array([[[[ 0,  1],
         [ 4,  5]],

        [[ 1,  2],
         [ 5,  6]],

        [[ 2,  3],
         [ 6,  7]]],


       [[[ 4,  5],
         [ 8,  9]],

        [[ 5,  6],
         [ 9, 10]],

        [[ 6,  7],
         [10, 11]]]])

###Test case 2

arr = np.arange(9).reshape(3,3)
shape = np.array([1,2])
sliding_window_view(arr, shape)

input:

[[0 1 2]
 [3 4 5]
 [6 7 8]]

output:

array([[[[0, 1]],
        [[1, 2]]],

       [[[3, 4]],
        [[4, 5]]],

       [[[6, 7]],
        [[7, 8]]]])

shoyer · 2018-03-21T04:55:03Z

A possible api:
res = np.lib.stride_tricks.sliding_window_view(arr, shape, axis=None)

I like this API, but I would suggest that shape should be renamed size, for consistency with the singular name axis.

eric-wieser · 2018-03-21T05:00:43Z

Hmm, I see where you're coming from there. There seems to be a lot of precedent for axis (vs axes) and shape (vs size), but the singular / plural combination is a little weird.

Are there any existing function that take both arguments already?

eric-wieser · 2018-03-21T05:17:52Z

Regarding step or stride - I don't think it's necessary - sliding_window(x, shape, step) can be spelt sliding_window(x, shape)[::step], with a tiny O(ndim) overhead

Furthermore, the step argument is ambiguous - does it specify the step within a window, or the step between windows? Again, slicing can express these clearly:

within a window: sliding_window(x_1d, N)[:, ::step]
between windows: sliding_window(x_1d, N)[::step, :]

Furthermore, slicing lets an offset be specified, whereas a step argument would not

fanjin-z · 2018-05-12T03:38:45Z

I don't know when or if #10771 will merge into official, I release the code here for anyone who want to use it.

DieterDePaepe · 2019-01-04T16:55:38Z

For an algorithm library, I needed rolling window standard deviations. Using the sliding window as suggested here was the only approach I found that was fast, did not require additional libraries and did not have (albeit very small) rounding errors. (Even pandas rolling function had small errors up to 3e-7 in specific cases - see pandas-dev/pandas#27593 for details.)

Even if some functions (the examples listed before) can be done more optimized, it seems wrong to ignore the benefit it might have for other cases.

mhvk · 2019-01-04T17:35:48Z

@DieterDePaepe - there is a PR implementing this - #10771 - which seems to have stalled. it may be more useful to comment there (and check that it would do what you want it to!).

nils-werner · 2019-01-04T17:50:37Z

Nowadays I am just using scikit image's skimage.utils.view_as_windows() which solves this problem perfectly for any number of dimensions.

seberg · 2020-03-11T18:46:50Z

Just a note, since this is a popular issue and inactive for a while. I think we decided that we are happy to put it in, initially only in stride_tricks (which is semi-public). It just needs a strong advocate to hash out a good API really, and of course warnings that often it is not what you want, since things such as https://en.wikipedia.org/wiki/Summed-area_table are better.

nils-werner · 2020-03-19T10:06:29Z

Have you had a look at the skimage.utils.view_as_windows() and skimage.utils.view_as_blocks() functions?

Their API is quite clean, understandable and flexible, and their implementation is quite simple, too.

So if this functionality is really such great demand, they could be "upstreamed" into NumPy? Of course in coordination and with agreement from skimage-authors.

eric-wieser · 2020-10-07T10:39:01Z

Coming back to this, I realize I missed something in this comment

Regarding step or stride - I don't think it's necessary - sliding_window(x, shape, step) can be spelt sliding_window(x, shape)[::step], with a tiny O(ndim) overhead

Furthermore, the step argument is ambiguous - does it specify the step within a window, or the step between windows? Again, slicing can express these clearly:

within a window: sliding_window(x_1d, N)[:, ::step]

between windows: sliding_window(x_1d, N)[::step, :]

Furthermore, slicing lets an offset be specified, whereas a step argument would not

When slicing between windows, chances are sliding_window(x_1d, 1 + N*(step-1))[:, ::step] is what is actually wanted - which is complex enough to write to perhaps justify being built into the function. Additionally, for this use case there's no reason to want to specify an offset anyway.

Test cases are shown in the issue page.

@eric-wieser

* implement sliding_window_view #7753 Test cases are shown in the issue page. * Add Example Cases * Add step_size and N-dim support * Add shape and step_size check. Remove warning. * Remove shape default Add step_size default's description. * Give proper parameter name 'step' * fix a parameter description mistake * implement test function for sliding_window_view() * implement test function for sliding_window_view() * Fix according to @eric-wieser comments * Change arange to ogrid in Examples * remove np.squeeze on return line * Clarify document to avoid parameter confusion. * add `writable` and more explanation in docs * resolve a write conflit * fixes according to @seberg review * resolve write hazard * remove outdated docs. * change referring according to @mattip. change 'writeable' to 'readonly' as @seberg suggest. remove 'step' as @eric-wieser request * fix test minor error * DOC: Grammar fixes * STY: Add missing line break required by PEP8 * + Change readonly parameter to writeable. + Update writeable description. + Fix a few parameter checks. + Other minor improvements. * Move to new api as proposed by @eric-wieser - Change api to follow suggestion by Eric Wieser in #10771 (comment) - Update docstring - Add more tests * Improve documentation * Add sliding_window_view to documentation index * Apply suggestions from code review Co-authored-by: Eric Wieser <wieser.eric@gmail.com> * Fix window shape check * Add `sliding_window_view` to __all__ * Add tests for error cases * Add array_function dispatching * Change dispatcher argument defaults to None * Simplify array function dispatching * Added "np." prefix to doctests * Split tests * Improved docstring * Add release note * Fix docstring formatting * Fix doctest * Remove namespacing in documentation indexing * Improve docstring * Improved docstring * Simplified docstring * Improve docstring to make pseudo code stand out * Improve docstring * Add simple application example * Correct release note * Improve link with as_strides * Add note about performance * Tidy up main doc string * Make language on performance warning stronger * Bugfix: pass subok and writeable to as_strided * Add writeable test * Add subok test * Change subok test to use custom array subclass instead of unsupported MaskedArray * Add version added information Co-authored-by: Eric Wieser <wieser.eric@gmail.com> Co-authored-by: Fanjin <fjzeng@ucsd.edu> Co-authored-by: Fanjin Zeng <Fnjn@users.noreply.github.com> Co-authored-by: Eric Wieser <wieser.eric@gmail.com> Co-authored-by: fanjin <fjzeng@outlook.com> Closes gh-7753

eric-wieser mentioned this issue Mar 16, 2018

[Feature Request] im2col and col2im #10754

Closed

tsando mentioned this issue Mar 20, 2018

Rolling window with step size pandas-dev/pandas#15354

Closed

fanjin-z mentioned this issue Mar 21, 2018

ENH: implement sliding_window_view #10771

Closed

rth mentioned this issue Jul 29, 2018

[MRG] Optimize 3D display (continued) matplotlib/matplotlib#11577

Closed

10 tasks

mrocklin mentioned this issue Apr 1, 2019

Support sliding window computations dask/dask#4659

Closed

eric-wieser mentioned this issue Jul 8, 2019

ENH: Add movingaverage function. #13923

Closed

This was referenced Jul 25, 2019

API? how to do a "rolling groupby" or groupby with overlapping groups? pandas-dev/pandas#26959

Open

Numerical errors in rolling std (0.23.4) pandas-dev/pandas#27593

Closed

endolith mentioned this issue Aug 3, 2019

Document numpy.lib.stride_tricks.as_strided #6565

Closed

niowniow mentioned this issue Dec 10, 2019

Feature Request: Efficient rolling with strides pydata/xarray#3608

Open

zklaus mentioned this issue Sep 29, 2020

ENH: Implement sliding window #17394

Merged

zklaus pushed a commit to zklaus/numpy that referenced this issue Oct 29, 2020

implement sliding_window_view numpy#7753

3b95d56

Test cases are shown in the issue page.

seberg closed this as completed in #17394 Nov 5, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suggestion: Sliding Window Function #7753

Suggestion: Sliding Window Function #7753

nils-werner commented Jun 17, 2016 •

edited

jaimefrio commented Jun 17, 2016 •

edited by eric-wieser

mhvk commented Jun 17, 2016

nils-werner commented Jun 21, 2016 •

edited by eric-wieser

jaimefrio commented Jun 21, 2016 •

edited by eric-wieser

nils-werner commented Jun 21, 2016

toddrjen commented Aug 10, 2016 •

edited

ghost commented Sep 5, 2016

eric-wieser commented Mar 16, 2018 •

edited

eric-wieser commented Mar 16, 2018 •

edited

lijunzh commented Mar 17, 2018

nils-werner commented Mar 18, 2018

lijunzh commented Mar 18, 2018

fanjin-z commented Mar 19, 2018 •

edited by eric-wieser

shoyer commented Mar 21, 2018

eric-wieser commented Mar 21, 2018

eric-wieser commented Mar 21, 2018 •

edited

fanjin-z commented May 12, 2018

DieterDePaepe commented Jan 4, 2019 •

edited

mhvk commented Jan 4, 2019

nils-werner commented Jan 4, 2019 •

edited

seberg commented Mar 11, 2020

nils-werner commented Mar 19, 2020

eric-wieser commented Oct 7, 2020

Suggestion: Sliding Window Function #7753

Suggestion: Sliding Window Function #7753

Comments

nils-werner commented Jun 17, 2016 • edited

jaimefrio commented Jun 17, 2016 • edited by eric-wieser

mhvk commented Jun 17, 2016

nils-werner commented Jun 21, 2016 • edited by eric-wieser

jaimefrio commented Jun 21, 2016 • edited by eric-wieser

nils-werner commented Jun 21, 2016

toddrjen commented Aug 10, 2016 • edited

ghost commented Sep 5, 2016

eric-wieser commented Mar 16, 2018 • edited

eric-wieser commented Mar 16, 2018 • edited

lijunzh commented Mar 17, 2018

nils-werner commented Mar 18, 2018

lijunzh commented Mar 18, 2018

fanjin-z commented Mar 19, 2018 • edited by eric-wieser

Test case1

shoyer commented Mar 21, 2018

eric-wieser commented Mar 21, 2018

eric-wieser commented Mar 21, 2018 • edited

fanjin-z commented May 12, 2018

DieterDePaepe commented Jan 4, 2019 • edited

mhvk commented Jan 4, 2019

nils-werner commented Jan 4, 2019 • edited

seberg commented Mar 11, 2020

nils-werner commented Mar 19, 2020

eric-wieser commented Oct 7, 2020

nils-werner commented Jun 17, 2016 •

edited

jaimefrio commented Jun 17, 2016 •

edited by eric-wieser

nils-werner commented Jun 21, 2016 •

edited by eric-wieser

jaimefrio commented Jun 21, 2016 •

edited by eric-wieser

toddrjen commented Aug 10, 2016 •

edited

eric-wieser commented Mar 16, 2018 •

edited

eric-wieser commented Mar 16, 2018 •

edited

fanjin-z commented Mar 19, 2018 •

edited by eric-wieser

eric-wieser commented Mar 21, 2018 •

edited

DieterDePaepe commented Jan 4, 2019 •

edited

nils-werner commented Jan 4, 2019 •

edited