Skip to content

Conversation

@jshearer
Copy link
Contributor

@jshearer jshearer commented Jan 5, 2026

Task names containing periods (e.g. foo.com/tenant/task) were being corrupted during authentication. The issue: decode_safe_name() replaces . with % then attempts percent-decoding, so foo.com becomes foo%com, and %co is invalid hex, leaving the name corrupted.

The decode_safe_name function was added in #1516 to support dot-encoded JSON config objects as usernames (e.g. {} encoded for platforms that don't allow { in usernames). When materialization task-based auth was introduced in #1665, the dot-encoding was never removed despite no longer having a valid use case: we're no longer trying to do things like stuff JSON objects into usernames.

This PR removes the decode_safe_name() call from authenticate(). It's still used by from_downstream_topic_name for decoding Kafka topic names in order to avoid breaking backwards compatibility. It also fixes a small bug in Dekaf's Kafka API client (which the e2e test infrastructure uses) that was treating an error code of -1 as non-terminal.

NOTE: This will break any user using %-encoded or .-encoded SASL usernames

@jshearer jshearer force-pushed the dekaf/fix-task-name-period-auth branch from 2a00bc8 to d754da8 Compare January 5, 2026 22:52
@jshearer jshearer changed the title WIP dekaf: remove buggy decode_safe_name call from authenticate() dekaf: Fix task names containing periods Jan 6, 2026
@jshearer jshearer self-assigned this Jan 6, 2026
@jshearer jshearer force-pushed the dekaf/fix-task-name-period-auth branch 2 times, most recently from 48329ae to b71f37c Compare January 6, 2026 19:41
@jshearer jshearer changed the base branch from master to dekaf/collection_reset_with_e2e_tests January 6, 2026 21:08
@jshearer jshearer force-pushed the dekaf/collection_reset_with_e2e_tests branch 3 times, most recently from d2b9d7f to 9de90e5 Compare January 7, 2026 22:00
@jshearer jshearer force-pushed the dekaf/fix-task-name-period-auth branch from b71f37c to e83cfb1 Compare January 7, 2026 22:26
@jshearer jshearer force-pushed the dekaf/collection_reset_with_e2e_tests branch from 4bfc69e to e1c6baf Compare January 7, 2026 22:43
@jshearer jshearer force-pushed the dekaf/fix-task-name-period-auth branch from e83cfb1 to 55be0b2 Compare January 7, 2026 22:44
Dekaf previously required TLS and MSK IAM authentication for all upstream
Kafka connections, making local development and testing difficult. This adds
support for plaintext connections via URL scheme detection:

* `tcp://host:port` connects without TLS, `tls://host:port` uses TLS (default)
* `--upstream-auth=none` flag skips SASL authentication entirely
* `KafkaClientAuth::from_msk_region(None)` creates no-auth mode

Example local usage:
  dekaf --default-broker-urls tcp://localhost:29092 --upstream-auth=none ...
…ka errors

It's possible for a collection to exist in the control plane without having
any extant journals. This can happen either when the capture task is failing or
hasn't emitted any documents, and more frequently during a collection reset.
Previously, Dekaf treated this the same as a missing collection, causing
consumers to receive non-retryable errors or inconsistent behavior.

Introduces `CollectionStatus` enum to distinguish three states:

* `Ready`: binding exists and journals are available
* `NotFound`: binding doesn't exist in the materialization spec
* `NotReady`: binding exists but journals aren't available yet

For `NotReady`, we'll use `LeaderNotAvailable` (a retryable error) to cause
consumers to retry with backoff until the journals become available.
They will eventually give up.
This is mainly for e2e tests so we can set a low TTL and
avoid waiting around for too long for changes to propagate.
* Run Dekaf e2e tests as separate step because `nexttest-run` messes with local stack state
* Make `local:data-plane` idempotent
* `ci:dekaf-e2e` now assumes `local:stack` etc are up rather than explicitly depending on it
* mise: log systemd output if failure
* mise: also log agent logs on failure
* nexttest: exclude e2e tests by default, and run them with
`--profile dekaf-e2e` instead
@jshearer jshearer force-pushed the dekaf/collection_reset_with_e2e_tests branch from e1c6baf to e819733 Compare January 8, 2026 16:11
Task names containing periods (e.g. `foo.com/tenant/task`) were being corrupted during authentication. The issue: `decode_safe_name()` replaces `.` with `%` then attempts percent-decoding, so `foo.com` becomes `foo%com`, and `%co` is invalid hex, leaving the name corrupted.

The `decode_safe_name` function was added in #1516 to support dot-encoded JSON config objects as usernames (e.g. `{}` encoded for platforms that don't allow `{` in usernames). When materialization task-based auth was introduced in #1665, the dot-encoding was never removed despite no longer having a valid use case: users provide task names via SASL auth username which doesn't have the same limitations as topic names.

This PR removes the `decode_safe_name()` call from `authenticate()`. It's still used by `from_downstream_topic_name` for decoding Kafka topic names in order to avoid breaking backwards compatibility.
The session SASL auth error check used `error_code > 0`, which only
catches positive error codes like `SaslAuthenticationFailed` (58). This
missed negative error codes like `UnknownServerError` (-1), causing the
client to think authentication succeeded when it actually failed.

* Change `error_code > 0` to `error_code != 0` to catch all non-success codes
@jshearer jshearer force-pushed the dekaf/fix-task-name-period-auth branch from 55be0b2 to 146c9b1 Compare January 8, 2026 16:11
@jshearer jshearer force-pushed the dekaf/collection_reset_with_e2e_tests branch from e819733 to 7823aab Compare January 8, 2026 20:14
@jshearer jshearer force-pushed the dekaf/collection_reset_with_e2e_tests branch 3 times, most recently from 7452c05 to 097b327 Compare January 21, 2026 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants