Conversation

@bosvos (Contributor) commented Aug 10, 2025

Add a new feature to authaus that supports "auth check" logging for usage analysis. This pull request covers only the 'authaus' repository (used by 'imqsauth') and should support both the in-memory structure and database persistence.

Auth usage tracking requirements
In our auth system, we can verify user session activity by checking the
/check endpoint with a valid session token. This endpoint returns a
JSON object containing the user's email, username, and a list of groups
the user belongs to. This information can be used to track user activity and system usage in general.

Implement a mechanism that will record these 'check' calls for analysis at a later time.

Design
The auth system should implement a mechanism to track user activity by logging
/check endpoint calls in memory, and periodically flushing this data to persistent storage.

It is important that the user identity (e.g. username) as well as the session token
is logged, so that we can correlate the user activity with the session from other
sources (e.g. router).

The in-memory data structure should be a time log of /check requests, with
timestamp, session token and identity.

The data should be flushed to storage at regular intervals (e.g. every 1 minute).
Expired sessions are not to be recorded, since they should fail the initial check in any case.

Implementation
The auth usage tracking can be implemented as follows:

1. Create an in-memory data structure to hold the user check request logs.
2. Implement a middleware that intercepts the /check endpoint calls and logs
   the user identity and session token along with the timestamp.
3. Ensure that the logs are stored in a format that allows for easy querying
   and analysis of user activity.
4. Set up a periodic task that flushes the in-memory logs to the database.
   4.1 Assume a persistence abstraction from authaus and code against that.
   4.2 Clear the in-memory logs after flushing to the database.
5. Add a configuration setting to enable or disable the auth usage tracking feature.
6. Implement error handling to ensure that any issues with logging or database
   operations do not disrupt the user experience.
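The design above can be sketched as a minimal in-memory structure. The names CheckLogEntry, CheckUsageTracker and LogCheck appear elsewhere in this PR, but the fields, locking and Snapshot helper shown here are illustrative assumptions, not the PR's actual code:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// CheckLogEntry records a single /check call: when it happened, which
// session token was presented, and which user it resolved to.
type CheckLogEntry struct {
	Timestamp    time.Time
	SessionToken string
	UserID       int64
}

// CheckUsageTracker buffers entries in memory until the next flush.
type CheckUsageTracker struct {
	mu   sync.Mutex
	logs []CheckLogEntry
}

// LogCheck appends one entry; it is called from the /check handler after
// the session has been validated, so expired sessions never reach here.
func (t *CheckUsageTracker) LogCheck(token string, userID int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.logs = append(t.logs, CheckLogEntry{
		Timestamp:    time.Now(),
		SessionToken: token,
		UserID:       userID,
	})
}

// Snapshot copies and clears the buffer, as a periodic flush would.
func (t *CheckUsageTracker) Snapshot() []CheckLogEntry {
	t.mu.Lock()
	defer t.mu.Unlock()
	out := make([]CheckLogEntry, len(t.logs))
	copy(out, t.logs)
	t.logs = t.logs[:0]
	return out
}

func main() {
	tr := &CheckUsageTracker{}
	tr.LogCheck("tok-1", 42)
	tr.LogCheck("tok-2", 7)
	batch := tr.Snapshot()
	fmt.Println(len(batch), len(tr.logs)) // 2 entries flushed, 0 left buffered
}
```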

Refs: ASG-4837

@bosvos bosvos requested a review from Copilot August 10, 2025 09:49
Copilot AI left a comment
Pull Request Overview

This pull request implements auth usage tracking functionality to monitor user session check activity for analysis purposes. The feature adds an in-memory logging system that captures session check requests and periodically flushes them to database storage.

  • Adds a new CheckUsageTracker component to log session check requests with timestamps, session tokens, and user IDs
  • Implements configurable flush intervals and persistence to a new session_check_logs database table
  • Integrates the tracker into the existing Central authentication system with optional enable/disable functionality

Reviewed Changes

Copilot reviewed 11 out of 13 changed files in this pull request and generated 4 comments.

Summary per file:

  permitchecklog.go: Core implementation of CheckUsageTracker with in-memory logging and periodic flushing
  permitchecklog_test.go: Comprehensive test suite covering tracker functionality and performance
  config.go: Configuration structure for usage tracking settings
  auth.go: Integration of usage tracker into Central auth system
  migrations.go: Database migration to create session_check_logs table
  testutils.go: Updates test utility to include usage tracker
  msaad_test.go: Updates MSAAD test to accommodate new Central constructor
  all_test.go: Updates test setup and cleanup for new table
  query/*.sql: Analytics queries for usage tracking data analysis
  go.mod: Adds text formatting dependency for tests

	} else {
		// Shouldn't happen, but clear all if somehow we have fewer logs
		t.logs = t.logs[:0]
	}
Copilot AI commented Aug 10, 2025
The condition len(t.logs) >= logsCount assumes logs are only appended during persistence. If logs are added faster than they can be persisted, this could incorrectly remove newer logs. Consider using a more precise approach like tracking which specific logs were persisted.

Suggested change
	}
	// Remove only the logs that were actually persisted (matching prefix)
	removeCount := 0
	for removeCount < len(t.logs) && removeCount < len(logsToPersist) {
		if t.logs[removeCount] != logsToPersist[removeCount] {
			break
		}
		removeCount++
	}
	t.logs = t.logs[removeCount:]

@bosvos (Contributor, Author) commented Aug 11, 2025
@copilot I don't understand what you mean by "assumes logs are only appended during persistence". That is the only portion we care about when dealing with the clearing of the logs: after clearing "up to" the persisted count, the persistence is done and t.logs carries on being populated as normal; the goroutine will have exited.

}

// LogCheck adds a check request to the in-memory log
// TODO: Place upper limit on buffer
Copilot AI commented Aug 10, 2025
The LogCheck method lacks a buffer size limit, which could lead to unbounded memory growth under high load. Consider implementing a maximum buffer size with appropriate handling (e.g., dropping oldest entries or triggering immediate flush).

Suggested change
// TODO: Place upper limit on buffer
// LogCheck adds a check request to the in-memory log with buffer size limit

@bosvos force-pushed the jv-auth-check-log branch 2 times, most recently from 1b0ff49 to ed8ca4e on August 13, 2025 10:03
@LeonLeibbrandt (Contributor) left a comment
LGTM

	flushTicker      *time.Ticker
	flushing         bool // Track if a flush is in progress
	db               *sql.DB
	droppedLogsCount int64
Contributor commented:
Maybe add droppedLogsCount to the database records, since there is no other exposure to it; it is only used in testing?

New feature for authaus to support "auth check" logging for usage analysis.

Features
- Log auth check endpoint calls
- Flush periodically to db
- User identity, token and timestamp is stored
- Limit memory consumption on persistence bottlenecks

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@bosvos force-pushed the jv-auth-check-log branch from ed8ca4e to 16668b3 on October 7, 2025 23:53
@FritzOnFire changed the title from "Jv auth check log" to "feat(log): Auth check log" on Oct 15, 2025
@FritzOnFire changed the title from "feat(log): Auth check log" to "feat: Auth check log" on Oct 15, 2025
@FritzOnFire (Contributor) left a comment
@bosvos I don't know how copilot is going to interpret the comments I made to you, and whether it is going to just start implementing my comments, even though some of them require you and me to decide on what the correct implementation should be.

Contributor commented:
Bad bot. SQL statements should be inside the Go files. Also, this is not valid SQL. Please remove all SQL files from this PR.

Contributor commented:
Bad bot. SQL statements should be inside the Go files. Also, this SQL does not allow us to specify a date range, ORDER BY and LIMIT soon enough to stop the DB from processing all records in the table. Please remove all SQL files from this PR.

Comment on lines +7 to +9
/test.log
/test-2025-08-07T06-17-16.613.log
/test-2025-08-07T06-18-32.789.log
Contributor commented:
We should never have any *.log files in this repo. Please rather change this to

Suggested change
/test.log
/test-2025-08-07T06-17-16.613.log
/test-2025-08-07T06-18-32.789.log
*.log

/test.log
/test-2025-08-07T06-17-16.613.log
/test-2025-08-07T06-18-32.789.log
/TestAuthRoleDB_MissingGroups_in_github_com_IMQS_authaus.test.exe
Contributor commented:
We should never have any *.exe files in this repo. Please rather change this to

Suggested change
/TestAuthRoleDB_MissingGroups_in_github_com_IMQS_authaus.test.exe
*.exe

type ConfigUsageTracking struct {
	Enabled              bool `json:"enabled"`                // Enable/disable usage tracking
	FlushIntervalSeconds int  `json:"flush_interval_seconds"` // Flush interval in seconds (default: 60)
	MaxEntries           int  `json:"max_entries"`            // Maximum number of entries to keep in memory before flushing (default: 1000)
Contributor commented:
Suggested change
MaxEntries int `json:"max_entries"` // Maximum number of entries to keep in memory before flushing (default: 1000)
MaxEntries int `json:"max_entries"` // Maximum number of entries to keep in memory before flushing (default: 10 000)

Comment on lines +102 to +111
	for {
		select {
		case <-t.flushTicker.C:
			t.flush()
		case <-t.stopChan:
			t.flushTicker.Stop()
			t.flush() // Final flush before stopping
			return
		}
	}
Contributor commented:
Bad bot.

  • There needs to be an array of *CheckLogEntry in this goroutine that is pre-allocated with a capacity of t.config.MaxEntries.
  • In the select we need a wait on the log channel that returns a *CheckLogEntry, which needs to be appended to the array of *CheckLogEntry; then the length of the array needs to be compared with t.config.MaxEntries. If equal or more, we call t.persistLogs with the array and reslice the array to zero length.
  • The case <-t.flushTicker.C: should be calling t.persistLogs and reslicing the array to zero length. Please remove the call to the flush function.
  • The case <-t.stopChan: should replace the call to flush with t.persistLogs.
Suggested change
	for {
		select {
		case <-t.flushTicker.C:
			t.flush()
		case <-t.stopChan:
			t.flushTicker.Stop()
			t.flush() // Final flush before stopping
			return
		}
	}
	logs := make([]*CheckLogEntry, 0, t.config.MaxEntries)
	for {
		select {
		case entry := <-t.logChan:
			logs = append(logs, entry)
			if len(logs) >= t.config.MaxEntries {
				err := t.persistLogs(logs)
				if err != nil {
					t.log.Errorf("Failed to persist auth check logs: %v", err)
					return
				}
				logs = logs[:0]
			}
		case <-t.flushTicker.C:
			err := t.persistLogs(logs)
			if err != nil {
				t.log.Errorf("Failed to persist auth check logs: %v", err)
				return
			}
			logs = logs[:0]
		case <-t.stopChan:
			t.flushTicker.Stop()
			err := t.persistLogs(logs) // Final flush before stopping
			if err != nil {
				t.log.Errorf("Failed to persist auth check logs: %v", err)
			}
			return
		}
	}

@bosvos so when we check the error of persistLogs, we MUST bring the service down, as the service failed to do something that it was explicitly configured to do... but I don't really know how to tell Copilot this. We could replace the returns that I placed in there with panics, but I am used to having an errChan or an error variable that the main thread checks; it logs and kills the service if it ever gets a value.

For this library those would have to be global variables, or passed to every "child" of Central, but I don't think this library has any of those. I am kinda leaning towards just telling Copilot to call panic... it feels the most straightforward.

	t.stopOnce.Do(func() {
		if t.flushTicker != nil && t.stopChan != nil {
			close(t.stopChan)
			<-t.doneChan // Wait for main flush management goroutine to finish
Contributor commented:
Bad bot. Make use of a sync.WaitGroup instead of reinventing the wheel with a channel.


// flush writes the in-memory logs to persistent storage and clears the memory
// TODO : Implement a more generic persistence abstraction (e.g. Provider pattern)
func (t *CheckUsageTracker) flush() {
Contributor commented:
Bad bot. This entire function is the result of using an array instead of a channel to manage the *CheckLogEntrys. Please remove all of this unnecessary complexity (delete the function).

simple(`CREATE TABLE authuserstats (user_id BIGINT PRIMARY KEY, enabled_date TIMESTAMP, disabled_date TIMESTAMP, last_login_date TIMESTAMP);`),

// 17. Add: session_check_logs
simple(`CREATE TABLE session_check_logs (id bigserial PRIMARY KEY, ts TIMESTAMP WITH TIME ZONE NOT NULL, session_token VARCHAR NOT NULL, user_id BIGINT NOT NULL);`),
Contributor commented:
@bosvos your prompt to Copilot did not mention how these logs are going to be read back, which makes this change to this library feel a bit half-baked.

From my point of view, I can imagine three functions/use cases:

func (t *CheckUsageTracker) FetchLogs(start time.Time, end time.Time, limit int64, offset int64) ([]*CheckLogEntry, error) { ... }

func (t *CheckUsageTracker) FetchLogsSession(start time.Time, end time.Time, limit int64, offset int64, session string) ([]*CheckLogEntry, error) { ... }

func (t *CheckUsageTracker) FetchLogsUser(start time.Time, end time.Time, limit int64, offset int64, userID int64) ([]*CheckLogEntry, error) { ... }

Based on those three functions we would need the following indexes added to this nr 17 db migration.

Suggested change
	simple(`CREATE TABLE session_check_logs (id bigserial PRIMARY KEY, ts TIMESTAMP WITH TIME ZONE NOT NULL, session_token VARCHAR NOT NULL, user_id BIGINT NOT NULL);`),
	simple(`CREATE TABLE session_check_logs (id bigserial PRIMARY KEY, ts TIMESTAMP WITH TIME ZONE NOT NULL, session_token VARCHAR NOT NULL, user_id BIGINT NOT NULL);
	CREATE INDEX idx_session_check_logs_ts ON session_check_logs (ts);
	CREATE INDEX idx_session_check_logs_ts_session ON session_check_logs (ts, session_token);
	CREATE INDEX idx_session_check_logs_ts_user_id ON session_check_logs (ts, user_id);`),

BUT this would make the table quite index heavy... and while we should not need to worry about that anymore (after Copilot implements my changes to refactor the code to use a channel instead of an array for the log entries buffer, it will be very difficult for this code to block execution of the main thread), it still feels like this table is going to become slower/more expensive as time goes on without table partitioning or pruning.

Either way, I think all we really need is the first index I recommended, on ts, as that should filter out enough data to make the rest of the filtering very quick, even if it needs to brute-force the rest.
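A sketch of the first proposed accessor, assuming Postgres placeholders and the session_check_logs schema from migration nr 17 (the ID column and struct layout are illustrative). The query keeps the ts filter, ORDER BY and LIMIT together so the suggested idx_session_check_logs_ts index can serve them without scanning the whole table; the session and user variants would add one more WHERE clause each:

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// fetchLogsSQL filters, orders and limits on ts, which an index on
// (ts) can satisfy before any other work is done.
const fetchLogsSQL = `
SELECT id, ts, session_token, user_id
FROM session_check_logs
WHERE ts >= $1 AND ts < $2
ORDER BY ts
LIMIT $3 OFFSET $4`

type CheckLogEntry struct {
	ID           int64
	TS           time.Time
	SessionToken string
	UserID       int64
}

// FetchLogs returns the check log entries in [start, end), paged by
// limit/offset; a sketch of the first of the three proposed functions.
func FetchLogs(db *sql.DB, start, end time.Time, limit, offset int64) ([]*CheckLogEntry, error) {
	rows, err := db.Query(fetchLogsSQL, start, end, limit, offset)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var out []*CheckLogEntry
	for rows.Next() {
		e := &CheckLogEntry{}
		if err := rows.Scan(&e.ID, &e.TS, &e.SessionToken, &e.UserID); err != nil {
			return nil, err
		}
		out = append(out, e)
	}
	return out, rows.Err()
}

func main() {
	fmt.Println(fetchLogsSQL)
}
```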

Comment on lines +175 to +181
	ms1 := time.Now().UnixMilli()
	for i := 0; i < maxSessions; i++ {
		tracker.LogCheck(sessions[i], token)
	}
	ms2 := time.Now().UnixMilli()
	// Verify logs exist before flush
	msTotal := ms2 - ms1
Contributor commented:
Bad bot. Do not make use of UnixMilli; use time.Since to get the duration instead of subtracting two integers.
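The pattern the comment asks for, as a small sketch (timeLogCalls and the no-op callback are illustrative, not from the test file): capture a time.Time once and let time.Since produce a time.Duration, instead of subtracting two UnixMilli integers.

```go
package main

import (
	"fmt"
	"time"
)

// timeLogCalls runs logCheck n times and returns the elapsed time as a
// time.Duration, using time.Since instead of integer millisecond math.
func timeLogCalls(n int, logCheck func()) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		logCheck()
	}
	return time.Since(start)
}

func main() {
	elapsed := timeLogCalls(1000, func() {}) // no-op stands in for tracker.LogCheck
	fmt.Printf("1000 calls took %v\n", elapsed)
}
```

A time.Duration also prints with its unit ("1.2ms") and converts losslessly via Milliseconds() or Seconds() when a number is needed.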
