Add new extension yagp hooks collector #1629
Draft
leborchuk wants to merge 119 commits into apache:main from
Conversation
Just create the overall project structure, with a Makefile to generate protobufs, compile them into a shared library extension, and install it as a foundation to build on.
- borrow GlowByte code to generate plan text and SessionInfo
- borrow code from our in-house pg_stat_statements to generate query id and plan id
- refactor code to follow common naming conventions and indentation
- do some minor refactoring to follow the common naming convention
- add an additional message right after the ExecutorStart hook
1) Query instrumentation
2) /proc/self/* stats
It allows finer granularity than executor hooks. Also removed some code duplication and data duplication.
1. Initialize query instrumentation to NULL so that it can be properly checked later (temporary solution; a proper fix is still needed).
2. Don't collect spill info on query end. Reason: a) it will always be zero, and b) it could crash if we failed to enlarge a spill file. It seems we need some cumulative statistics for spill info; need to check what EXPLAIN ANALYZE uses.
1. Sync with protobuf changes to collect segment info.
2. Remove noisy logging.
3. Fix some missing node types in pg_stat_statements.
Reason: when the query info hook is called with status 'DONE', the planstate has already been deallocated by ExecutorEnd.
1) Give higher gRPC timeouts to the query dispatcher, as losing messages there is more critical.
2) If we fail to send a message via gRPC, we notify a background thread about it and refuse to send any new messages until that thread re-establishes the lost connection.
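The "refuse to send until reconnected" behavior above can be sketched as a small gate object. This is a hypothetical illustration, not the PR's actual code: the class name `SendGate` and its methods are invented here; the real extension presumably couples this to its gRPC channel and background thread.

```cpp
#include <atomic>

// Hypothetical sketch: a failed send trips an atomic flag; the sender
// refuses new messages while the flag is set, and the background
// reconnect thread clears it once the connection is re-established.
class SendGate {
public:
    // Returns true if the message is considered delivered.
    // `send_succeeded` stands in for the result of the real gRPC call.
    bool try_send(bool send_succeeded) {
        if (broken_.load(std::memory_order_acquire))
            return false;  // connection known lost: refuse new messages
        if (!send_succeeded) {
            // Trip the gate; only the reconnect thread may clear it.
            broken_.store(true, std::memory_order_release);
            return false;
        }
        return true;
    }

    // Called by the background thread after re-establishing the channel.
    void on_reconnected() {
        broken_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> broken_{false};
};
```

The atomic flag keeps the hot send path lock-free; only the slow reconnect path writes it back to false.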
Don't collect system queries with empty query text and ccnt == 0
Rethrowing them might break other extensions and even the query execution pipeline itself.
* ereport(LOG) bug queries at the end of the extension
* add pg-alike tests
* send ANALYZE in text & enable it
* report utility statements
Trim strings larger than 1MB by default; if the cut would split a multi-byte UTF-8 character, discard that character and shift the cut position back.
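The trimming rule above can be sketched as follows. This is an illustrative sketch, not the PR's implementation; the function name `trim_utf8` is invented here. It relies on the UTF-8 property that continuation bytes always match the bit pattern `10xxxxxx`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: cap a string at max_len bytes; if the cut lands
// inside a multi-byte UTF-8 character, move it back past the partial
// character so the result stays valid UTF-8.
std::string trim_utf8(const std::string &s, std::size_t max_len = 1024 * 1024)
{
    if (s.size() <= max_len)
        return s;

    std::size_t cut = max_len;
    // Continuation bytes are 10xxxxxx (top two bits == 0x80 after
    // masking with 0xC0); back up until we sit on a lead/ASCII byte,
    // discarding the partially-cut character entirely.
    while (cut > 0 && (static_cast<unsigned char>(s[cut]) & 0xC0) == 0x80)
        --cut;
    return s.substr(0, cut);
}
```

For example, cutting `"h\xC3\xA9llo"` ("héllo") at byte 2 would land in the middle of the two-byte `é`, so the cut shifts back to byte 1 and only `"h"` survives.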
Copy of [1] from gpdb to collect workfile stats in yagp-hooks-collector. [1] open-gpdb/gpdb@8813a55
Copy of [1] from gpdb to create a global QueryState for unique hashing for yagp-hooks-collector. [1] open-gpdb/gpdb@476b540
Update usage in yagp_hooks_collector of:
- heap_create_with_catalog()
- standard_ExecutorRun()
- standard_ProcessUtility()
- InstrAlloc()
- CreateTemplateTupleDesc()
- ExplainInitState() -> NewExplainState()
- gpmon_gettmid() -> gp_gettmid()
- Gp_session_role -> Gp_role
- strerror(errno) -> "%m"
- Include utils/varlena.h for SplitIdentifierString() in gpdbwrappers.cpp.
Remove unnecessary copies of the core jumbling functions from yagp_hooks_collector/stat_statements_parser. In commit [1] query jumbling was moved to core, so there is no need to keep a copy of the jumbling functions in yagp_hooks_collector. [1] 5fd9dfa
In yagp_hooks_collector we need control over where a function is executed, and Cloudberry supports EXECUTE ON COORDINATOR only for set-returning functions, so change the type of the functions. Without this change we see the error: ERROR: EXECUTE ON COORDINATOR is only supported for set-returning functions.
In gpdb, CREATE TABLE was executed for each partition. Now a single CREATE TABLE is executed, so only one CREATE TABLE query goes through the executor. Change the test accordingly.
Change the makefile and test, and add it to CI for yagp_hooks_collector. Add the option --with-yagp-hooks-collector, similarly to [1]. [1] open-gpdb/gpdb@7be8893
Correct CI for yagp_hooks_collector to use the correct env script.
Correct defines for token ids copied from gram.y.
Full copy of [1] for yagp_hooks_collector. [1] open-gpdb/gpdb@845278f#diff-fa2654417413bbb37d47ecf1644dc5af90c76c77f2a90e05c27107967b5f6fd8
Similarly to [1], add missing executor query info hooks. [1] open-gpdb/gpdb@87fc05d
Copy of [1] with additional changes needed for Cloudberry, described below: the testing C functions were changed to set-returning ones compared with [1], because we need control over where a function is executed (either on master or on segments), and in Cloudberry such functions must return a set of values, so they were changed to return SETOF. [1] open-gpdb/gpdb@989ca06
Copy of [1]: send() may return -1 in case of an error; do not add -1 to the total bytes sent. [1] open-gpdb/gpdb@e1f6c08
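The accounting bug above is the classic partial-send loop mistake. Here is a minimal sketch of the corrected pattern, assuming a POSIX-send-like callable is injected for testability; the name `send_all` and the injection style are inventions of this sketch, not the actual patch.

```cpp
#include <sys/types.h>   // ssize_t
#include <cstddef>
#include <functional>

// Hypothetical sketch: send a buffer fully via an injected send-like
// function (same return convention as POSIX send()). The key point of
// the fix: a return of -1 signals an error and must NOT be folded into
// the running byte total.
ssize_t send_all(const char *buf, std::size_t len,
                 const std::function<ssize_t(const char *, std::size_t)> &do_send)
{
    std::size_t total_bytes = 0;
    while (total_bytes < len) {
        ssize_t n = do_send(buf + total_bytes, len - total_bytes);
        if (n < 0)
            return -1;  // error: bail out instead of doing total_bytes += -1
        total_bytes += static_cast<std::size_t>(n);
    }
    return static_cast<ssize_t>(total_bytes);
}
```

Adding -1 to an unsigned or signed running total silently corrupts the count and can make the loop believe more data was sent than actually was.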
The extension generates normalized query text and plans using jumbling functions. Those functions may fail when translating to wide characters if the current locale cannot handle the character set. The fix changes the functions that generate normalized query text/plans to noexcept versions, so we can check whether an error occurred and continue execution. The test checks that even when those functions fail, the plan is still executed. This test is partially taken from src/test/regress/gp_locale.sql.
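The noexcept pattern described above can be sketched like this. Everything here is a stand-in: `normalize_noexcept` and the injected `normalize` callback are hypothetical names, and the real jumbling functions have different signatures; the point is only the "catch, report absence, keep executing" shape.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical sketch: wrap a (possibly throwing) normalization step so
// that a locale/wide-character failure degrades to "no normalized text"
// instead of aborting query execution. `normalize` stands in for the
// real jumbling-based text/plan generator.
std::optional<std::string>
normalize_noexcept(const std::string &query,
                   std::string (*normalize)(const std::string &)) noexcept
{
    try {
        return normalize(query);
    } catch (...) {
        // Translation failed for the current locale; return "no value"
        // and let the caller continue executing the plan.
        return std::nullopt;
    }
}
```

Callers then check the optional: if it is empty, the collector simply skips the normalized text for that query rather than raising an error.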
Cloudberry builds treat compiler warnings as errors. For consistency, this behavior has been enabled in yagp_hooks_collector. This commit also fixes the warnings in yagp_hooks_collector.
We faced an issue: segments fail with the backtrace
```
#7 0x00007f9b2adbf2e0 in set_qi_error_message (req=0x55f24a6011f0) at src/ProtoUtils.cpp:124
#8 0x00007f9b2adc30d9 in EventSender::collect_query_done (this=0x55f24a5489f0, query_desc=0x55f24a71ca68, status=METRICS_QUERY_ERROR) at src/EventSender.cpp:222
#9 0x00007f9b2adc23e1 in EventSender::query_metrics_collect (this=0x55f24a5489f0, status=METRICS_QUERY_ERROR, arg=0x55f24a71ca68) at src/EventSender.cpp:53
```
The root cause is that we try to send the error message from the hooks collector. For some queries the ErrorData structure can be NULL despite the fact that an error has occurred; it depends on the error type and the location of the error. So we should check that we have error details before using them.
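The guard described above can be sketched as a simple null-safe accessor. This is a hedged illustration: the struct here is a minimal stand-in for PostgreSQL's `ErrorData`, and the function name `error_message_or_default` is invented; the real fix lives in `set_qi_error_message` in src/ProtoUtils.cpp.

```cpp
#include <string>

// Minimal stand-in for the relevant part of PostgreSQL's ErrorData.
struct ErrorData {
    const char *message;
};

// Hypothetical sketch of the fix: depending on the error type and where
// it was raised, either the ErrorData pointer or its message field may
// be absent, so guard both before filling the protobuf request.
std::string error_message_or_default(const ErrorData *edata)
{
    if (edata == nullptr || edata->message == nullptr)
        return "unknown error";  // an error occurred, but no details exist
    return edata->message;
}
```

With this guard the collector still reports that the query ended in error; it just sends a placeholder message instead of dereferencing a null pointer.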
A diagnostic and monitoring extension for Cloudberry clusters
Components
Detailed description
This is quite similar to the idea in #1085. We need to create an extensible query workload tracking and monitoring system.
The overall idea is to send data out from the Cloudberry instance to an external agent.