Implement support for new epoch columns to StreamDBClient queries #216
base: main
Conversation
I think this is not a very good approach, as it could introduce security issues for workstation users running an improperly configured MariaDB, like one with unauthenticated access but open to the internet. Also, while the CI starts from scratch every time using well-known steps, the database the developer is using might get out of sync with schema changes or have lingering data, which makes the tests unreliable. Example of the actual issue with an out-of-date database like I had:
Consider implementing it like described in https://github.com/teragrep/mvn_01/tree/main/testcontainers
StrongestNumber9
left a comment
The tests require quite a lot of cleaning, and the testcontainers need adding; they are very mentally expensive to parse, especially with the duplicated code blocks.
HostRecord hostRecord = new HostRecord(UShort.valueOf(1), "testHost1");
ctx.insertInto(JOURNALDB.HOST).set(hostRecord).execute();

// Set logdate to 2023-10-04 instead of the correct 2023-10-05 to emulate timezone issues, and test if epoch_hour takes priority or not.
Sounds like something worth two distinct tests.
Changed the test configuration to use a predetermined UTC-4 timezone for all tests instead of whatever the system the tests run on decides to use as the session timezone.
epochHourNullTest() now tests whether the results are affected by the UTC-4 session timezone as expected when the old logdate/logtime implementation is used.
epochHourTimezoneTest() tests that logtime and logdate results are unaffected by the UTC-4 timezone.
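For reference, pinning the session timezone at the JDBC level could look like this minimal sketch. The sessionVariables URL parameter is a MariaDB Connector/J feature, but the URL, user, and password here are illustrative, not the project's actual configuration:

```java
import java.sql.Connection;
import java.sql.DriverManager;

// Every connection opened with this URL gets its session time_zone forced to
// UTC-4, so query results no longer depend on the machine the tests run on.
Connection connection = DriverManager.getConnection(
        "jdbc:mariadb://localhost:3306/journaldb?sessionVariables=time_zone='-04:00'",
        "testUser", "testPassword"
);
```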
long logtime = hourRange.get(0).get(8, Long.class);
Assertions.assertEquals(1696471200L, logtime);
Date logdate = hourRange.get(0).get(5, Date.class);
Extracting these hourRanges into single-use variables looks a bit noisier than it has to be.
Removed single-use variables in commit 8f96a57.
long logtime = hourRange.get(0).get(8, Long.class);
Assertions.assertEquals(1696471200L, logtime);
Date logdate = hourRange.get(0).get(5, Date.class);
Assertions.assertEquals(Date.valueOf(date), logdate);
Can this Date.valueOf(date) be changed into a literal of some sort instead of using the output of a function with a variable passed as the value? This is to avoid some bizarre scenario where both Date.valueOf(date) and logdate end up being null or 0 or something else similarly unexpected.
Changed Date.valueOf() to use a literal string format instead of LocalDate in commit 8f96a57.
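The changed assertion presumably reads along these lines (a sketch; the expected date literal is illustrative and would come from the test data):

```java
// java.sql.Date.valueOf parses a yyyy-MM-dd literal, so the expected value is
// spelled out instead of being computed from the same variable as the actual value.
Assertions.assertEquals(Date.valueOf("2023-10-05"), logdate);
```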
| "example.log-@1696471200-2023100423.log.gz", | ||
| new Timestamp(2025, 8, 13, 16, 18, 22, 0), | ||
| ULong.valueOf(120L), | ||
| "e2I8CnejWweTSk8tmo4tNkDO2fU7RajqI111FPlF7Mw=", |
What is this value?
Random sha256Checksum value; replaced with a simple description string in commit 381c1d3.
new Timestamp(2025, 8, 13, 16, 18, 22, 0),
ULong.valueOf(120L),
"e2I8CnejWweTSk8tmo4tNkDO2fU7RajqI111FPlF7Mw=",
"5ddea0496b0a9ad1266b26da3de9f573",
What is this value?
Random archiveEtag value; replaced with a simple description string in commit 381c1d3.
LogfileRecord logfileRecord = new LogfileRecord(
    ULong.valueOf(1),
    Date.valueOf(LocalDate.of(2023, 10, 4)),
    new Date(2125 - 1900, 7, 13),
Looks like magic numbers
Replaced with the easier-to-read Date.valueOf() in commit 381c1d3.
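For context on why the original read as magic numbers: the deprecated java.sql.Date(int, int, int) constructor offsets the year by 1900 and uses a zero-based month, so the intent is only visible after mental arithmetic. A minimal sketch of the difference:

```java
import java.sql.Date;

// Deprecated constructor: 2125 - 1900 = year 2125, month index 7 = August.
Date before = new Date(2125 - 1900, 7, 13);
// Equivalent value, readable at a glance:
Date after = Date.valueOf("2125-08-13");
```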
// Init mandatory Config object with the minimum options required for testing StreamDBClient.
Map<String, String> opts = new HashMap<>();
opts.put("S3endPoint", "mock");
opts.put("S3identity", "mock");
opts.put("S3credential", "mock");
opts.put("DBusername", streamDBUsername);
opts.put("DBpassword", streamDBPassword);
opts.put("DBurl", streamDBUrl);
opts.put("DBstreamdbname", streamdbName);
opts.put("DBjournaldbname", journaldbName);
opts.put("queryXML", "<index value=\"example\" operation=\"EQUALS\"/>");
opts.put("archive.enabled", "true");
Config config = new Config(opts);
// Add test data to journaldb and streamdb, there are several tables in both databases.
Settings settings = new Settings()
    .withRenderMapping(
        new RenderMapping()
            .withSchemata(
                new MappedSchema().withInput("streamdb").withOutput(config.archiveConfig.dbStreamDbName),
                new MappedSchema().withInput("journaldb").withOutput(config.archiveConfig.dbJournalDbName),
                new MappedSchema().withInput("bloomdb").withOutput(config.archiveConfig.bloomDbName)
            )
    );
final Connection connection = Assertions
    .assertDoesNotThrow(
        () -> DriverManager
            .getConnection(
                config.archiveConfig.dbUrl, config.archiveConfig.dbUsername,
                config.archiveConfig.dbPassword
            )
    );
final DSLContext ctx = DSL.using(connection, SQLDialect.MYSQL, settings);

BucketRecord bucketRecord = new BucketRecord(UShort.valueOf(1), "bucket1");
ctx.insertInto(JOURNALDB.BUCKET).set(bucketRecord).execute();

ctx
    .query(String.format("INSERT INTO %s.category (id, name) VALUES (1, 'testCategory')", journaldbName))
    .execute();
ctx
    .query(
        String.format("INSERT INTO %s.source_system (id, name) VALUES (2, 'testSourceSystem2')", journaldbName)
    )
    .execute();
These could be deduplicated either in beforeEach or as some buildDSLContext function. Repeating 40 identical lines per test is not very easy to mentally parse.
Implemented testcontainers for the tests. As testcontainers do not support managing several databases inside the same container using the methods available on the container object, the streamdb database had to be initialized using a separate CREATE_STREAMDB_DB.sql script that is called during container startup. Note: new databases can't be created after the container has already started, as the query results in an access denied error for the user.
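A minimal sketch of the described setup, assuming the MariaDB Testcontainers module; the image tag, credentials, and resource path are illustrative:

```java
import org.testcontainers.containers.MariaDBContainer;
import org.testcontainers.utility.MountableFile;

// The MariaDB entrypoint executes scripts copied into /docker-entrypoint-initdb.d/
// during first startup, which is how the extra streamdb database gets created
// alongside the journaldb database that the container manages itself.
MariaDBContainer<?> mariadb = new MariaDBContainer<>("mariadb:10.6")
        .withDatabaseName("journaldb")
        .withUsername("testUser")
        .withPassword("testPassword")
        .withCopyFileToContainer(
                MountableFile.forClasspathResource("CREATE_STREAMDB_DB.sql"),
                "/docker-entrypoint-initdb.d/CREATE_STREAMDB_DB.sql"
        );
mariadb.start();
```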
Cleaned up beforeEach further by moving all journaldb table creation and test data population inside the script that is called during container startup.
The tests look a lot better now, good job with them! I still couldn't run the project as I use podman and not docker, but I'll look into what was missing.
I was able to make it work with podman and opened a ticket for enhancing the documentation in teragrep/mvn_01#84. I think this looks acceptable, but I would like @kortemik to give another sanity check.
Noticed an error in the logdate definition for StreamDBClient queries. The epoch-based logdate that is not affected by session timezone should actually be sourced from the epoch_hour column instead of the epoch_archived column.
13cfba8 to c064d8b
Rebased to main.
Rebased to main.
Rebased to main, updated the new StreamDBClient tests in the rebase.
elliVM
left a comment
Some comments
elliVM
left a comment
LGTM!
Please rebase.
Rebased to main.
Proper indexes are missing for the logfile table regarding epoch_hour; coalesce will reference the non-indexed epoch_hour column and cause a full table scan. Additionally, coalesce will make the search non-sargable (not search-argument-able), meaning a b-tree scan cannot be used, and therefore this approach will not work. See more at https://dba.stackexchange.com/questions/162024/is-coalesce-sargable-now
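To illustrate the sargability concern, here is a hypothetical sketch in the project's jOOQ style; the column names and epoch literal are assumptions based on this discussion, not the actual query code:

```java
import org.jooq.Condition;
import org.jooq.impl.DSL;

// Non-sargable: wrapping the column in COALESCE hides it from the b-tree index,
// so the range predicate degenerates into a full table scan.
Condition nonSargable = DSL
        .field("COALESCE(epoch_hour, logtime)", Long.class)
        .ge(1696471200L);

// Sargable rewrite: each branch compares a bare column, so an index on
// epoch_hour (and on logtime for the NULL fallback) can serve a range scan.
Condition sargable = DSL.field("epoch_hour", Long.class).ge(1696471200L)
        .or(DSL.field("epoch_hour", Long.class).isNull()
                .and(DSL.field("logtime", Long.class).ge(1696471200L)));
```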
| logger.debug("NestedTopNQuery.getTableStatement exit"); | ||
| final Field<Date> logdateFunction = DSL | ||
| .field( | ||
| "CAST(date_add('1970-01-01', interval {0} second) as DATE)", Date.class, |
MariaDB has FROM_UNIXTIME, please use it.
final Field<Date> logdateFunction = DSL
    .field(
        "CAST(date_add('1970-01-01', interval {0} second) as DATE)", Date.class,
MariaDB has FROM_UNIXTIME, please use it.
Using FROM_UNIXTIME forces the logdateFunction result to be tied to the session timezone. date_add is used as a replacement that is not affected by the session timezone.
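A sketch contrasting the two approaches in the same DSL.field style the code already uses; the epoch literal is illustrative:

```java
import java.sql.Date;
import org.jooq.Field;
import org.jooq.impl.DSL;

// FROM_UNIXTIME interprets the epoch in the session time_zone, so the derived
// DATE can differ between connections with different timezone settings.
Field<Date> sessionTzDate = DSL
        .field("CAST(FROM_UNIXTIME({0}) AS DATE)", Date.class, DSL.val(1696471200L));

// date_add on the epoch origin is plain calendar arithmetic: it always yields
// the UTC date for the given epoch, regardless of the session time_zone.
Field<Date> utcDate = DSL
        .field("CAST(date_add('1970-01-01', interval {0} second) as DATE)", Date.class, DSL.val(1696471200L));
```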
The Date type is assumed to be in the session timezone; where do we use this and why? Why do we not just pass the epoch as a bigint?
…me field creation. REBASE Implemented coalesce with epoch_hour converted to date for defining logdate of the logfile. REBASE
…hen available. REBASE
…sing date arithmetics that completely ignores session timezone. Added epochHourTimezoneTest. REBASE
…TableNullEpochTest().
Rebased and removed the functionality for using the old logtime implementation for now.
I think we should consider changing the query contract from a calendar-based (logdate) model to an epoch-range-based one. I suggest this because it is frequently non-obvious whether MariaDB applies session or system timezone rules when casting or deriving the logdate value. @kortemik, since the epoch_hour migration is in process, do you think it's acceptable to switch at this time?
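A sketch of what the suggested contract could look like at the query boundary, with illustrative names and values; a half-open epoch range has exactly one meaning regardless of any timezone setting:

```java
import org.jooq.Condition;
import org.jooq.impl.DSL;

// Calendar-based contract: WHERE logdate BETWEEN ? AND ? depends on how the
// DATE values were derived. Epoch-range contract: unambiguous seconds since epoch.
long rangeStart = 1696464000L;       // 2023-10-05T00:00:00Z, illustrative
long rangeEnd = rangeStart + 86400L; // exclusive upper bound, one day later
Condition epochRange = DSL.field("epoch_hour", Long.class).ge(rangeStart)
        .and(DSL.field("epoch_hour", Long.class).lt(rangeEnd));
```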
elliVM
left a comment
Looks good to me, the question is still open about switching to epoch-range-based query logic. Perhaps it can be done in another issue.
final Field<Date> logdateFunction = DSL
    .field(
        "CAST(date_add('1970-01-01', interval {0} second) as DATE)", Date.class,
The Date type is assumed to be in the session timezone; where do we use this and why? Why do we not just pass the epoch as a bigint?
Description
The aim of this pull request is to implement support for the new epoch time columns in the logfile metadata queries done by StreamDBClient. The original logdate and logtime implementations are prone to timezone issues, where the result of a query changes depending on the session timezone; using epoch time resolves this issue.
Includes:
Resolves #213
Resolves #214
Resolves #284