Conversation
Adds the method for a weighted mean of elements transformed by a function. - Added `mean(f, itr, weights)` - Added tests for the method
Added tests for UnitWeights
src/weights.jl
Outdated
| mean(f, A::AbstractArray, w::AbstractWeights; dims::Union{Colon,Int}=:) = | ||
| _mean(f.(A), w, dims) | ||
|
|
||
| function mean(f, A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:) |
There was a problem hiding this comment.
This is a slightly more generic version of the same thing, which is less likely to require maintenance in the future (if we add additional keyword arguments to mean). I suggest using this pattern when you can.
| function mean(f, A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:) | |
| function mean(f, A::AbstractArray, w::UnitWeights; kwargs...) |
src/weights.jl
Outdated
| function mean(f, A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:) | ||
| a = (dims === :) ? length(A) : size(A, dims) | ||
| a != length(w) && throw(DimensionMismatch("Inconsistent array dimension.")) | ||
| return mean(f.(A), dims=dims) |
There was a problem hiding this comment.
Avoid memory allocation by using mean(f, A) instead of mean(f.(A)). Remember that f.(A) creates an extra array, which is slow. Memory access is usually the biggest bottleneck on modern CPUs. mean(f.(A)) is 2 separate operations: The first one creates a new array, f.(A), and the second calculates its mean. mean(f, A) calculates the mean of (f(x) for x in A) directly, as one operation, without creating a new array.
| return mean(f.(A), dims=dims) | |
| return mean(f, A; dims) |
I'd also suggest making it slightly more generic, as
| return mean(f.(A), dims=dims) | |
| return mean(f, A; kwargs...) |
src/weights.jl
Outdated
| end | ||
|
|
||
| """ | ||
| mean(f, A::AbstractArray, w::AbstractWeights[, dims::Int]) |
There was a problem hiding this comment.
| mean(f, A::AbstractArray, w::AbstractWeights[, dims::Int]) | |
| mean(f, A::AbstractArray, w::AbstractWeights[; dims]) |
dims shouldn't be required to be an integer.
src/weights.jl
Outdated
| ``` | ||
| """ | ||
| mean(f, A::AbstractArray, w::AbstractWeights; dims::Union{Colon,Int}=:) = | ||
| _mean(f.(A), w, dims) |
There was a problem hiding this comment.
Avoid memory allocation by using mean(f, A) instead of mean(f.(A)). Remember that f.(A) creates an extra array, which is slow. Memory access is usually the biggest bottleneck on modern CPUs. mean(f.(A)) is 2 separate operations: The first one creates a new array, f.(A), and the second calculates its mean. mean(f, A) calculates the mean of (f(x) for x in A) directly, as one operation, without creating a new array.
| _mean(f.(A), w, dims) | |
| _mean(f, A; dims) |
I'd also suggest making it slightly more generic, as
| _mean(f.(A), w, dims) | |
| _mean(f, A; kwargs...) |
(See below for more details.)
- Add keyword arguments for the weights - Modified functions to use `Iterators.map` - Add more tests
src/weights.jl
Outdated
| ``` | ||
| """ | ||
| mean(f, A, w::AbstractWeights; kwargs...) = | ||
| mean(broadcast(f, A), w; kwargs...) |
There was a problem hiding this comment.
Like I said, check the method for weighted sem to see how to use Broadcasting.broadcasted correctly. broadcast is not the same thing as Broadcasting.broadcasted. The latter is lazy.
All checks not passed
- Removed implementation for multi-dimensional array - Updated documentations - Updated tests
src/weights.jl
Outdated
| return sum(Broadcast.broadcasted(f, A, w) do f, a_i, wg | ||
| return f(a_i) * wg | ||
| end) / sum(w) |
There was a problem hiding this comment.
You should use Broadcast.instantiate. Moreover, to me it seems this should be changed to
| return sum(Broadcast.broadcasted(f, A, w) do f, a_i, wg | |
| return f(a_i) * wg | |
| end) / sum(w) | |
| return sum(Broadcast.instantiate(Broadcast.broadcasted(A, w) do a_i, wg | |
| return f(a_i) * wg | |
| end)) / sum(w) |
There was a problem hiding this comment.
Using Broadcast.instantiate causes extra allocations, no?
There was a problem hiding this comment.
I haven't tried. But without Broadcast.instantiate it will be slow and fall back to Cartesian indexing. See, e.g., JuliaLang/julia#31020 ("we require Broadcast.instantiate for fast reduce").
Used `Broadcast.instantiate` as requested to overcome falling back to Cartesian indexing
src/weights.jl
Outdated
| ``` | ||
| """ | ||
| mean(f, A::AbstractArray, w::AbstractWeights) = | ||
| _funcweightedmean(f, A, w) |
There was a problem hiding this comment.
I don't remember, was there any particular reason for introducing _funcweightedmean instead of implementing two methods for mean?
There was a problem hiding this comment.
No there wasn't any such reason. I am removing this function, and implementing it as a method for mean
Instead deploy it as a method for `mean`
|
@devmotion @ParadaCarleton gentle ping, is this PR ready to merge? |
This PR addresses #868 . Adds the method for a weighted mean of elements transformed by a function.
mean(f, itr, weights)