Skip to content

<condition_variable>: wait_for with a Predicate used with floating point duration is extremely unsafe #6088

@CookiePLMonster

Description

@CookiePLMonster

Describe the bug

First of all: I don't know with absolute certainty if this is considered a bug, or surprising-yet-Standard behavior. MSVC STL and Clang STL are both affected by the same issue, so I am hoping to use this issue report as means of starting a conversation about the validity of wait_for's implementation.

This particular issue was observed with condition variables, but it affects everything using _To_absolute_time(). This makes std::this_thread::sleep_for just as likely to be affected.

This correctly-looking way of using a std::condition_variable:

condVar.wait_for(lock, std::chrono::duration<float>(TimeInSecs), predicate);

compiles with no warnings or errors on /W4, but can lead to non-obvious issues with the condition variable timing out prematurely due to floating point precision issues.

Following the chain of events:

  1. wait_for gets called.
  2. wait_for with a Predicate is implemented with wait_until:
    return wait_until(_Lck, _To_absolute_time(_Rel_time), _STD move(_Pred));
  3. _To_absolute_time tries to use a common type to store the result of std::steady_clock::now() + _Rel_time:
    decltype(_Now + _Rel_time) _Abs_time = _Now; // return common type

This, however, leads to highly negative side effects if _Rel_time is a floating-point duration. The result type of adding a long long (which now() is) and float is a float, but that type simply cannot precisely store a high enough value of now(), which is a system uptime in micro- or nano-seconds.

With high enough uptimes, _Abs_time <= _Now, and so the condition variable ends up timing out immediately if the predicate fails. This effectively turns this usage of CV into a busy loop, and that is exactly what we've observed in production.

Using std::chrono::duration<double> avoids this problem at least in the short term, as double has enough precision to store high values of std::steady_clock::now(). However, it's only a workaround, and a true fix involves not using any floating-point types for delays, regardless of width.

Command-line test case

N/A

Expected behavior

I expect .wait_for(lock, std::chrono::duration<float>(x), predicate) and sleep_for(std::chrono::duration<float>(x)) to do one of the two things:

  1. Emit a descriptive warning for the developer.
  2. Correctly delay execution without premature timeouts due to floating point precision issues.

STL version

The issue was initially observed on a 14.34 compiler with this STL version:
#define _MSVC_STL_UPDATE 202208L
but looking at the repository, the implementation is still identical in the latest main branch.

Additional context

N/A

Metadata

Metadata

Assignees

No one assigned

    Labels

    blocked on LWGWaiting for WG21 to tell us what to dochronoC++20 chrono

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions