<regex>: Implement small buffer optimization to store loop state #6092

Open
muellerj2 wants to merge 1 commit into microsoft:main from muellerj2:regex-small-buffer-optimization-for-loop-state

@muellerj2
Contributor

Disclaimer: Low-level memory management is not my forte, because I mostly write Java code in my day-to-day job.

Towards #5969.

This implements a minimal fixed-size buffer to support some small buffer optimizations during regex matching. As a first step, this fixed-size buffer is only used for the loop state storage (because this requires minimal changes in the matcher), but I plan to use it to store the state of capturing groups as well in follow-up PRs. For frames, we instead need a vector-like data structure that can grow but uses stack storage when there are only a few frames.

The fixed-size buffer takes an allocator in preparation for #174.

Since I want to use some common storage space on the stack for all of these buffers, the stack storage is not an internal member of the buffers but is supplied to them as a pointer from the outside. The fixed-size buffer reports back how much of the stack storage it has used. (We don't need this in this PR yet, because the stack storage is only used by a single fixed-size buffer. But this will change when more buffers make use of the stack storage.)

The stack storage is also not a member of _Matcher3 so that we can adjust its size in the future without having to rename the class.
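To illustrate the arrangement described above, here is a rough sketch rather than the code in this PR. `FixedSizeBuffer`, `init`, and `uses_heap` are hypothetical names, the allocator is assumed to behave like `std::allocator`, and the external storage is assumed to be suitably aligned for `T`. The buffer receives a pointer to caller-owned storage, uses it if the requested elements fit, falls back to the allocator otherwise, and reports how many external bytes it consumed.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical illustration only; not the STL's _Rx_fixed_size_buffer.
template <class T, class Alloc = std::allocator<T>>
class FixedSizeBuffer {
public:
    FixedSizeBuffer() = default;
    explicit FixedSizeBuffer(const Alloc& alloc) : alloc_(alloc) {}
    FixedSizeBuffer(const FixedSizeBuffer&) = delete;
    FixedSizeBuffer& operator=(const FixedSizeBuffer&) = delete;

    ~FixedSizeBuffer() {
        // Destroy the constructed elements, then release heap storage if any.
        for (std::size_t i = 0; i < size_; ++i) {
            data_[i].~T();
        }
        if (heap_capacity_ != 0) {
            alloc_.deallocate(data_, heap_capacity_);
        }
    }

    // Creates `count` value-initialized elements. Uses [ext, ext + ext_bytes)
    // if they fit there, otherwise allocates. Returns the number of external
    // bytes consumed (0 when the heap was used). Call once per buffer.
    std::size_t init(std::size_t count, unsigned char* ext, std::size_t ext_bytes) {
        const bool fits = count <= ext_bytes / sizeof(T);
        if (fits) {
            data_ = reinterpret_cast<T*>(ext); // assumes ext is aligned for T
        } else {
            data_ = alloc_.allocate(count);
            heap_capacity_ = count;
        }
        for (; size_ < count; ++size_) {
            ::new (static_cast<void*>(data_ + size_)) T();
        }
        return fits ? count * sizeof(T) : 0;
    }

    T& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }
    bool uses_heap() const { return heap_capacity_ != 0; }

private:
    Alloc alloc_{};
    T* data_ = nullptr;
    std::size_t size_ = 0;          // number of constructed elements
    std::size_t heap_capacity_ = 0; // nonzero only if heap storage is owned
};
```

A caller that shares one stack array among several such buffers could advance its pointer by the value `init` returns before handing the remainder to the next buffer.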

I wasn't sure what exception to throw if too much memory is requested from the fixed-size buffer, but then went with regex_error(error_stack) because I would like to avoid exceptions not of type regex_error and the description of error_stack is "There was insufficient memory to determine whether the regular expression could match the specified character sequence." (The other candidate is error_space. But it is described as "There was insufficient memory to convert the expression into a finite state machine.", so it appears to be intended for parsing and not matching.)
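For illustration, such a size check could look roughly like the sketch below. It assumes that "too much memory" means a byte count that would overflow `size_t`; `checked_byte_size` is a hypothetical helper, not a function from this PR.

```cpp
#include <cstddef>
#include <regex>

// Hypothetical helper: computes count * elem_size, but reports an
// unrepresentable request as regex_error(error_stack) instead of
// letting the multiplication wrap around.
inline std::size_t checked_byte_size(std::size_t count, std::size_t elem_size) {
    const std::size_t max_size = static_cast<std::size_t>(-1);
    if (elem_size != 0 && count > max_size / elem_size) {
        throw std::regex_error(std::regex_constants::error_stack);
    }
    return count * elem_size;
}
```

Callers that already catch `std::regex_error` from matching then need no extra handler for this case.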

I do wonder whether we should call std::launder() when repurposing the stack storage (here, specifically, in _Rx_fixed_size_buffer::_Use_external_buf()). But it seems this isn't done elsewhere in the STL, including _Optimistic_temporary_buffer.
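To make the question concrete, here is a small sketch of the pattern in isolation. It requires C++17 for `std::launder`; `LoopState` and `reuse_storage` are hypothetical names that do not appear in the PR.

```cpp
#include <new>

// Hypothetical element type standing in for the loop state.
struct LoopState {
    int count;
};

inline int reuse_storage() {
    // Raw bytes, suitably aligned, that get repurposed to hold a LoopState.
    alignas(LoopState) unsigned char storage[sizeof(LoopState)];
    ::new (static_cast<void*>(storage)) LoopState{7};

    // The variant in question: derive the object pointer from the byte array
    // and pass it through std::launder. The alternative is the plain
    // reinterpret_cast without the launder call.
    LoopState* p = std::launder(reinterpret_cast<LoopState*>(storage));

    const int result = p->count;
    p->~LoopState();
    return result;
}
```

Using the pointer returned by the placement new expression directly would sidestep the question, but that is not always convenient when the storage pointer is kept and reinterpreted later.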

Benchmarks

I highlighted the relevant lines in regex_search where the matcher uses some loop state. (The regex_match benchmark is inconclusive: the difference vanishes in the noise on my machine.)

| benchmark | before [ns] | after [ns] | speedup |
|---|---:|---:|---:|
| `bm_lorem_search/"^bibe"/2` | 57.20 | 59.99 | 0.95 |
| `bm_lorem_search/"^bibe"/3` | 58.59 | 59.38 | 0.99 |
| `bm_lorem_search/"^bibe"/4` | 61.38 | 58.59 | 1.05 |
| `bm_lorem_search/"bibe"/2` | 3048.28 | 2849.48 | 1.07 |
| `bm_lorem_search/"bibe"/3` | 6093.75 | 5719.87 | 1.07 |
| `bm_lorem_search/"bibe"/4` | 11439.70 | 11474.60 | 1.00 |
| `bm_lorem_search/"bibe".collate/2` | 3076.18 | 3138.95 | 0.98 |
| `bm_lorem_search/"bibe".collate/3` | 5937.50 | 5781.25 | 1.03 |
| `bm_lorem_search/"bibe".collate/4` | 11718.80 | 11160.70 | 1.05 |
| `bm_lorem_search/"(bibe)"/2` | 6975.45 | 7324.22 | 0.95 |
| `bm_lorem_search/"(bibe)"/3` | 14299.70 | 14229.90 | 1.00 |
| `bm_lorem_search/"(bibe)"/4` | 28250.40 | 28250.40 | 1.00 |
| `bm_lorem_search/"(bibe)+"/2` | 12207.00 | 11090.90 | 1.10 |
| `bm_lorem_search/"(bibe)+"/3` | 23541.90 | 22216.50 | 1.06 |
| `bm_lorem_search/"(bibe)+"/4` | 43946.30 | 43526.20 | 1.01 |
| `bm_lorem_search/"(?:bibe)+"/2` | 6975.45 | 5937.50 | 1.17 |
| `bm_lorem_search/"(?:bibe)+"/3` | 13497.40 | 11230.50 | 1.20 |
| `bm_lorem_search/"(?:bibe)+"/4` | 26681.00 | 22949.20 | 1.16 |
| `bm_lorem_search/R"(\bbibe)"/2` | 83705.40 | 81961.50 | 1.02 |
| `bm_lorem_search/R"(\bbibe)"/3` | 168795.00 | 167411.00 | 1.01 |
| `bm_lorem_search/R"(\bbibe)"/4` | 336967.00 | 314991.00 | 1.07 |
| `bm_lorem_search/R"(\Bibe)"/2` | 184168.00 | 187976.00 | 0.98 |
| `bm_lorem_search/R"(\Bibe)"/3` | 360695.00 | 385010.00 | 0.94 |
| `bm_lorem_search/R"(\Bibe)"/4` | 749860.00 | 739397.00 | 1.01 |
| `bm_lorem_search/R"((?=....)bibe)"/2` | 4865.38 | 4865.38 | 1.00 |
| `bm_lorem_search/R"((?=....)bibe)"/3` | 10044.60 | 9626.07 | 1.04 |
| `bm_lorem_search/R"((?=....)bibe)"/4` | 19252.40 | 19043.00 | 1.01 |
| `bm_lorem_search/R"((?=bibe)....)"/2` | 4854.90 | 4603.80 | 1.05 |
| `bm_lorem_search/R"((?=bibe)....)"/3` | 9207.55 | 9207.55 | 1.00 |
| `bm_lorem_search/R"((?=bibe)....)"/4` | 17996.80 | 17578.30 | 1.02 |
| `bm_lorem_search/R"((?!lorem)bibe)"/2` | 4589.84 | 4589.84 | 1.00 |
| `bm_lorem_search/R"((?!lorem)bibe)"/3` | 8719.31 | 8893.69 | 0.98 |
| `bm_lorem_search/R"((?!lorem)bibe)"/4` | 17089.80 | 17089.80 | 1.00 |

@muellerj2 muellerj2 requested a review from a team as a code owner February 18, 2026 17:26
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Feb 18, 2026
_Allocation_guard& operator=(const _Allocation_guard&) = delete;
_Allocation_guard& operator=(_Allocation_guard&&) = delete;

~_Allocation_guard() noexcept {
Contributor

Isn't the destructor noexcept by default in this case?

Contributor Author

I think I just copied this from some guard in <vector>.

Contributor

Then that's probably right.
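For reference, the language rule behind this exchange can be checked with a few `static_assert`s. This is a sketch with hypothetical guard types, not code from the PR. Since C++11, a user-provided destructor is implicitly `noexcept` unless a base or member has a potentially-throwing destructor, so the explicit `noexcept` is redundant but harmless.

```cpp
#include <type_traits>

// Hypothetical guard types used only to illustrate the rule.
struct ImplicitGuard {
    ~ImplicitGuard() {} // implicitly noexcept(true)
};

struct ExplicitGuard {
    ~ExplicitGuard() noexcept {} // same exception specification, spelled out
};

struct ThrowingGuard {
    ~ThrowingGuard() noexcept(false) {} // has to opt out explicitly
};

static_assert(std::is_nothrow_destructible<ImplicitGuard>::value,
    "destructors are noexcept by default");
static_assert(std::is_nothrow_destructible<ExplicitGuard>::value,
    "explicit noexcept changes nothing here");
static_assert(!std::is_nothrow_destructible<ThrowingGuard>::value,
    "noexcept(false) makes the destructor potentially throwing");
```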

_Tgt_state_t<_It> _Tgt_state;
_Tgt_state_t<_It> _Res;
- vector<_Loop_vals_v3_t<_Iter_diff_t<_It>>> _Loop_vals;
+ _Rx_fixed_size_buffer<_Loop_vals_v3_t<_Iter_diff_t<_It>>> _Loop_vals;
Contributor

I assume we're not between releases, so the matcher doesn't need to be renamed again.

Contributor Author

That's what I took from STL's remarks on Discord that he would like to audit no_unique_address for the 14.51 release: the branch-off is imminent but hasn't happened yet.

Member

That's correct. As long as we land this soon-ish, we're still covered by the _Matcher3 rename.

_Guard._Ptr = nullptr;
}

_Compressed_pair<_Alloc, _Val> _Mypair;
Contributor

It may be time to use no_unique_address.

Contributor Author

no_unique_address is C++20. I think [[msvc::no_unique_address]] is supported in older C++ dialects as well as an extension, but then we need a decision that we will make use of this extension.

Member

I don't know if it's supported downlevel by MSVC, Clang, and EDG. (_MSVC_NO_UNIQUE_ADDRESS is currently not set up to activate downlevel.) There's also a concern with CUDA.

I think messing with it here is too much risk for too little value. I'd rather stick with _Compressed_pair.
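For readers unfamiliar with the technique, the sketch below shows the empty base optimization that a compressed pair relies on. `CompressedPair` and `NaivePair` are hypothetical, simplified types, not the STL's `_Compressed_pair`; the sketch assumes the first type is empty and not `final`.

```cpp
#include <memory>
#include <type_traits>

// Simplified illustration: stores the (empty) first type as a base class so
// that it takes up no space, which works in every C++ dialect.
template <class Empty, class Val>
class CompressedPair : private Empty {
public:
    Val value{};

    Empty& first() noexcept {
        return *this;
    }
};

// Storing the allocator as an ordinary member costs at least one byte plus
// padding, which is what [[no_unique_address]] would avoid in C++20.
struct NaivePair {
    std::allocator<int> alloc;
    int* ptr;
};

static_assert(std::is_empty<std::allocator<int>>::value,
    "std::allocator is stateless");
static_assert(sizeof(CompressedPair<std::allocator<int>, int*>) == sizeof(int*),
    "the empty base adds no size");
static_assert(sizeof(NaivePair) > sizeof(int*),
    "the member allocator adds size");
```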

return _Size * sizeof(_Ty);
} else {
_Allocate_buf(_Size);
return 0U;
Contributor

The suffix is misleading. Just 0.

(I think there's a pointer-size-dependent suffix, but we can't use it here because the code has to compile as C++14 at least.)


_Matcher3<_BidIt, _Elem, _RxTraits, _It, void> _Mx(
_First, _Last, _Re._Get_traits(), _Re._Get(), _Re.mark_count() + 1, _Re.flags(), _Flgs);
alignas(_Loop_vals_v3_t<_Iter_diff_t<_It>>) unsigned char _Stackbuf[4096];
Contributor

What is this alignment?

Contributor Author

Usually it's the same as a pointer, except if some user passes an iterator where _Iter_diff_t<_It> requires some stricter alignment (although I don't see why anyone would want to do that).
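A small sketch of what the `alignas` buys, using hypothetical stand-ins (`LoopVals`, `WideDiff`, `buffer_is_aligned`) rather than the real `_Loop_vals_v3_t`: the byte array takes on the alignment of the element type, whatever the difference type turns out to be.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the loop state; its alignment follows Diff.
template <class Diff>
struct LoopVals {
    Diff index;
    int count;
};

// Hypothetical difference type with an unusually strict alignment.
struct alignas(32) WideDiff {
    long long value;
};

// Declares a stack buffer the same way as in the reviewed line and checks
// that its address is a multiple of the element type's alignment.
template <class Diff>
bool buffer_is_aligned() {
    alignas(LoopVals<Diff>) unsigned char buf[4096];
    const auto address = reinterpret_cast<std::uintptr_t>(&buf[0]);
    return address % alignof(LoopVals<Diff>) == 0;
}

static_assert(alignof(LoopVals<WideDiff>) == 32,
    "the element type inherits the stricter alignment of its member");
```

Without the `alignas`, an `unsigned char` array is only guaranteed an alignment of 1, so constructing elements in it would not be safe in general.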

@StephanTLavavej StephanTLavavej self-assigned this Feb 19, 2026
@StephanTLavavej StephanTLavavej added the performance ("Must go faster") and regex ("meow is a substring of homeowner") labels Feb 19, 2026
@StephanTLavavej
Member

> I do wonder whether we should call std::launder()

As far as I'm concerned, launder() is silly and should never be used. If it would ever make a difference, that's a compiler bug. We've been implementing vector et al. creating elements via True Placement New for decades, and compilers simply aren't allowed to trash our code just because they feel like it, regardless of what the Standard might think.
