Conversation
devmotion
left a comment
There was a problem hiding this comment.
I think silently incorrect results should be avoided as much as possible. I want to mention though that based on my experience in Distributions, this PR trades off safety versus performance and convenience: Due to numerical inaccuracies the check can fail even if the user computes the input vector in such a way that in non-floating point math it would be normalized.
A simple benchmark:
master:
julia> using StatsBase, Zygote, BenchmarkTools
julia> @btime entropy($(fill(1e-5, 10^5)));
472.918 μs (0 allocations: 0 bytes)
julia> _, pb = Zygote.pullback(entropy, fill(1e-5, 10^5));
julia> @btime $pb(1.0);
74.084 μs (16 allocations: 781.58 KiB)This PR:
julia> using StatsBase, Zygote, BenchmarkTools
julia> @btime entropy($(fill(1e-5, 10^5)));
512.763 μs (0 allocations: 0 bytes)
julia> _, pb = Zygote.pullback(entropy, fill(1e-5, 10^5));
julia> @btime $pb(1.0);
79.138 μs (21 allocations: 781.73 KiB)A more general comment: Can you add tests?
| return -sum(xlogx, p) | ||
| end | ||
|
|
||
| entropy(p, b::Real; check::Bool = true) = entropy(p; check) / log(b) |
There was a problem hiding this comment.
Should we check b as well? In any case, we need
| entropy(p, b::Real; check::Bool = true) = entropy(p; check) / log(b) | |
| entropy(p, b::Real; check::Bool = true) = entropy(p; check = check) / log(b) |
|
|
||
| Checks whether `p` is a probability vector, i.e. p[i] >= 0 for each index i, and sum(p) ≈ 1. | ||
| Taken from `Distributions.isprobvec`.""" | ||
| isprobvec(p::AbstractVector{<:Real}) = all(x -> x ≥ zero(x), p) && isapprox(sum(p), one(eltype(p))) |
There was a problem hiding this comment.
It would be nice to support tolerances here due to floating point inaccuracies but I don't see a nice way to forward them to this function.
fixes #769 docstrings added, tests missing