
@lionel-rowe

Fix for the problem I highlighted here: https://www.phpbb.com/customise/db/extension/thanks_for_posts_2/support/topic/236261

One major problem with how ratings are calculated is that thanks counts tend to obey Benford's Law, giving an exponentially skewed distribution.

For example, a typical forum might have just one post with 100 thanks, a few hovering around the 80-90 mark, and the vast majority with just a few. In this hypothetical forum, almost all posts would have ratings close to zero, implying they're bad (or at least not particularly valuable). If fewer than 2% of posts gained more than 5 thanks, a post with 5 thanks would already be in the 98th percentile, yet its rating would show only 5%, because it is ranked against that 100-thank post instead of against the vast majority of its peers!
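To make the arithmetic concrete, here is a minimal PHP sketch of that hypothetical forum. It assumes the current rating is 100 * post_thanks / max_post_thanks (the y(x) = 100x formula discussed later in this thread) and defines "percentile" as the share of posts with strictly fewer thanks. The helper names and the thanks counts are made up for illustration and are not part of the extension.

```php
<?php
// Hypothetical helper: the current (linear) rating, as a percentage of the top post.
function linear_rating(int $post_thanks, int $max_post_thanks): float
{
    if ($max_post_thanks <= 0) {
        return 0.0; // No thanked posts yet: avoid division by zero.
    }
    return 100 * $post_thanks / $max_post_thanks;
}

// Hypothetical helper: percentage of posts with strictly fewer thanks.
function percentile_rank(int $post_thanks, array $all_thanks): float
{
    $below = count(array_filter($all_thanks, fn (int $t): bool => $t < $post_thanks));
    return 100 * $below / count($all_thanks);
}

// Made-up forum of 1000 posts, shaped like the example above.
$thanks = array_merge(
    [100, 90, 85, 80],     // the top post and a few near it
    array_fill(0, 15, 6),  // 15 posts with 6 thanks (19 posts in total above 5, i.e. 1.9%)
    [5],                   // the post being rated
    array_fill(0, 980, 1)  // the vast majority: 1 thank each
);
$max = max($thanks);

printf("linear: %.1f%%\n", linear_rating(5, $max));          // linear: 5.0%
printf("percentile: %.1f%%\n", percentile_rank(5, $thanks)); // percentile: 98.0%
```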

The most accurate fix would be to rate posts by their percentile; however, this would massively complicate the calculation logic and probably hurt performance considerably, as every new post would affect the rating of every other post.
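The following sketch illustrates why percentiles are costly: it recomputes every post's percentile from scratch (quadratic as written) and shows that adding a single unthanked post changes the ratings of posts whose thanks did not change. The function name and the data are hypothetical, and "percentile" again means the share of posts with strictly fewer thanks.

```php
<?php
// Hypothetical: percentile rank of every post, recomputed from scratch.
// Each rank depends on the whole set of posts, so this is O(n^2).
function all_percentile_ranks(array $all_thanks): array
{
    $n = count($all_thanks);
    $ranks = [];
    foreach ($all_thanks as $post_id => $t) {
        $below = 0;
        foreach ($all_thanks as $other) {
            if ($other < $t) {
                $below++;
            }
        }
        $ranks[$post_id] = 100 * $below / $n;
    }
    return $ranks;
}

$thanks = [3, 0, 1, 5]; // made-up thanks counts for four posts
echo implode(', ', all_percentile_ranks($thanks)), PHP_EOL; // 50, 0, 25, 75

$thanks[] = 0; // a new post arrives with no thanks
echo implode(', ', all_percentile_ranks($thanks)), PHP_EOL; // 60, 0, 40, 80, 0
```

The three existing posts with at least one thank all moved, even though nobody thanked anything.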

A much simpler solution would be to apply an exponential easing function to the current ratings. This would counteract the exponential skew described by Benford's Law and give a much more even distribution.
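A minimal sketch of such an easing function, using the formula quoted later in this thread, y(x) = (1 − 2^(−10x)) × 100 with x = post_thanks / max_post_thanks. The function name eased_rating is hypothetical; this is not necessarily how the PR implements it.

```php
<?php
// Hypothetical helper applying the exponential easing discussed in this PR:
// y(x) = (1 - 2^(-10x)) * 100, where x is the post's share of the top post's thanks.
function eased_rating(int $post_thanks, int $max_post_thanks): float
{
    if ($max_post_thanks <= 0) {
        return 0.0; // No thanked posts yet: avoid division by zero.
    }
    $x = $post_thanks / $max_post_thanks; // linear share, 0..1
    return (1 - 2 ** (-10 * $x)) * 100;
}

foreach ([0, 1, 5, 10, 50, 100] as $t) {
    printf("%3d thanks: %5.1f%%\n", $t, eased_rating($t, 100));
}
```

With 100 thanks on the top post, a post with 10 thanks maps to exactly 50%. With the formula exactly as quoted, the top post itself gets (1 − 2^(−10)) × 100 ≈ 99.9% rather than 100%.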

@lionel-rowe lionel-rowe force-pushed the ease-ratings-exponentially branch from 19a8498 to c8d6a61 March 28, 2022 13:08
@lionel-rowe lionel-rowe force-pushed the ease-ratings-exponentially branch 2 times, most recently from c9d716a to 244c5e8 March 28, 2022 14:27
@lionel-rowe
Author

The test suite is failing on the MSSQL 2017 step, presumably for a reason unrelated to this PR, as my code changes only affect formatting and don't touch anything database-related.

@rxu
Owner

rxu commented Mar 29, 2022

Just to clarify: x is $row['post_thanks'] / $max_post_thanks, and y is the resulting post rating in %.

So a post with 1/6 of the most-thanked post's thanks count will get a rating of ~66%, and a post with 1/3 of that count will get a rating of ~90%.
I'm not sure this is the right approach from the point of view of evaluating posts.
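For reference, evaluating the proposed formula directly at these two ratios gives (values rounded):

$$
y(x) = 100\left(1 - 2^{-10x}\right)
$$

$$
y\!\left(\tfrac{1}{6}\right) = 100\left(1 - 2^{-10/6}\right) \approx 100\,(1 - 0.315) \approx 68.5,
\qquad
y\!\left(\tfrac{1}{3}\right) = 100\left(1 - 2^{-10/3}\right) \approx 100\,(1 - 0.099) \approx 90.1
$$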

[Plots: current rating distribution, $y(x) = 100x$; new rating distribution, $y(x) = 100\left(1 - 2^{-10x}\right)$]

@lionel-rowe
Author

@rxu

So a post with 1/6 of the most-thanked post's thanks count will get a rating of ~66%, and a post with 1/3 of that count will get a rating of ~90%.

Yes, that's correct. The reasoning is that top-rated posts already tend to follow an exponential distribution: for example, in a forum where "100%" (the top rated post) has 100 thanks, getting the number of thanks for a random sample of 10 posts will typically look something like "2, 0, 0, 1, 6, 11, 2, 5, 4, 0" rather than "94, 27, 38, 90, 73, 6, 18, 46, 62, 13".
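Applying both formulas to that illustrative sample makes the difference visible. This is a sketch only: the variable names are hypothetical, and the sample is the made-up one from the comment above, with 100 thanks on the top post.

```php
<?php
// The illustrative sample from the comment above; the top post has 100 thanks.
$max_post_thanks = 100;
$sample = [2, 0, 0, 1, 6, 11, 2, 5, 4, 0];

$linear_ratings = [];
$eased_ratings = [];
foreach ($sample as $t) {
    $x = $t / $max_post_thanks;
    $linear_ratings[] = 100 * $t / $max_post_thanks;  // current: y(x) = 100x
    $eased_ratings[] = (1 - 2 ** (-10 * $x)) * 100;   // proposed: y(x) = (1 - 2^(-10x)) * 100
}

foreach ($sample as $i => $t) {
    printf("%2d thanks: linear %4.1f%%, eased %4.1f%%\n", $t, $linear_ratings[$i], $eased_ratings[$i]);
}
```

Under the linear formula no post in the sample exceeds 11%, whereas the eased ratings spread the better-than-typical posts (4, 5, 6 and 11 thanks) across roughly 24% to 53%.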

As a result, the current ratings tend to be almost universally very low, making it look like the vast majority of posts are of "low quality".

Easing the results exponentially counteracts this, roughly approximating the distribution you'd expect if ratings were percentiles rather than percentages of the top post. (Using the actual percentile would likely overcomplicate things, as doing so in a performant way would probably require a lot of caching.)
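For comparison, one possible shape for the cached-percentile idea is sketched below: keep a sorted copy of all thanks counts (rebuilt and cached periodically, which is an assumption here, not something the extension does) and look up each post's percentile with a binary search. All names are hypothetical.

```php
<?php
// Hypothetical: build the sorted list once (e.g. on a cron task) and cache it.
function build_sorted_thanks(array $all_thanks): array
{
    $sorted = array_values($all_thanks);
    sort($sorted, SORT_NUMERIC);
    return $sorted;
}

// Percentage of posts with strictly fewer thanks, via a lower-bound binary search: O(log n).
function cached_percentile(int $post_thanks, array $sorted): float
{
    $n = count($sorted);
    $lo = 0;
    $hi = $n;
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($sorted[$mid] < $post_thanks) {
            $lo = $mid + 1; // everything up to $mid has fewer thanks
        } else {
            $hi = $mid;
        }
    }
    return $n === 0 ? 0.0 : 100 * $lo / $n;
}

// The illustrative sample from above plus the 100-thank top post.
$sorted = build_sorted_thanks([2, 0, 0, 1, 6, 11, 2, 5, 4, 0, 100]);
printf("%.1f%%\n", cached_percentile(6, $sorted)); // 72.7%
```

The trade-off is that the cached list goes stale as new thanks arrive, so ratings would only be as fresh as the last rebuild.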

I'm certainly open to fine-tuning the calculation, perhaps making it user-configurable somehow, or even revisiting the percentile idea if I can think of a way to simplify it. Thoughts?

@rxu
Owner

rxu commented Mar 30, 2022

I'm certainly open to fine-tuning the calculation, or perhaps making it user-configurable somehow

Makes sense. I think a switch or option to select the distribution strategy (possibly with more than just two to choose from) would be great.
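A selectable strategy could be as small as the sketch below. The strategy names and the $rate parameter are hypothetical, not existing extension config options; with $rate = 10 the 'exponential' branch matches the formula proposed in this PR, and keeping 'linear' as the default would leave existing ratings unchanged.

```php
<?php
// Hypothetical selectable rating strategy; names and parameters are illustrative only.
function rating(int $post_thanks, int $max_post_thanks, string $strategy = 'linear', float $rate = 10.0): float
{
    if ($max_post_thanks <= 0) {
        return 0.0; // No thanked posts yet: avoid division by zero.
    }
    $x = $post_thanks / $max_post_thanks; // linear share of the top post, 0..1

    switch ($strategy) {
        case 'linear':
            return 100 * $x; // current behaviour: y(x) = 100x
        case 'exponential':
            return (1 - 2 ** (-$rate * $x)) * 100; // y(x) = (1 - 2^(-rate * x)) * 100
        default:
            throw new InvalidArgumentException("Unknown rating strategy: $strategy");
    }
}

printf("%.1f%%\n", rating(10, 100));                // linear: 10.0%
printf("%.1f%%\n", rating(10, 100, 'exponential')); // exponential: 50.0%
```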

@lionel-rowe lionel-rowe marked this pull request as draft May 4, 2022 22:35
@rxu rxu force-pushed the develop-3.2.x branch 2 times, most recently from ec8b4f1 to 1807fd1 October 8, 2023 04:19
@rxu rxu force-pushed the develop-3.2.x branch 3 times, most recently from 81843ac to 392f370 April 12, 2024 15:19
@rxu rxu force-pushed the develop-3.2.x branch 3 times, most recently from 74b7acb to 2976ae8 October 27, 2024 10:03