URL-encoded characters in external JAR paths cause Label creation to fail #50

@runchen0919

Problem Description

When processing runtime classpath JARs from external repositories (e.g., Maven dependencies hosted on private artifact servers), the JAR paths may contain URL-encoded characters such as %40 (encoded @ symbol). Creating a Label from such a path fails validation because % is not among the characters that the target-name check accepts.

Example Case

A JAR path from an external repository:

bazel-out/darwin_arm64-fastbuild/bin/external/minimal_j11_deps/v1/https/username%40artifacts.example.net/repository/maven/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar

The path contains %40 (URL-encoded @) because the artifact server URL includes user info: username@artifacts.example.net.

Current Behavior

The synthetic label for unknown JARs is created at lines 198-199 of JavaAspectsInfo.java:

targetLabel = Label.create(
    format("@_unknown_jar_//:%s", classJar.getRelativePath()));

The Label.create() method throws an IllegalArgumentException because:

  • The % character is not in the allowed characters for Bazel target names (see TargetName.ALLOWED_META)
  • Only these special characters are allowed: +, _, ,, =, -, ., @, ~, #, , (, ), $, !
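The rejection can be reproduced without the plugin. The following is a minimal sketch, not the real TargetName implementation: `LabelNameCheck` and `firstInvalidChar` are hypothetical names, the punctuation set is copied from the visible entries of the list above, and letters, digits, and `/` are assumed to be accepted as well.

```java
public class LabelNameCheck {
    // Assumption: punctuation copied from the list in this issue,
    // not read from the real TargetName.ALLOWED_META constant.
    private static final String ALLOWED_META = "+_,=-.@~#()$!";

    /** Returns the index of the first rejected character, or -1 if all are accepted. */
    public static int firstInvalidChar(String targetName) {
        for (int i = 0; i < targetName.length(); i++) {
            char c = targetName.charAt(i);
            boolean ok = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '/'
                    || ALLOWED_META.indexOf(c) >= 0;
            if (!ok) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String encoded = "v1/https/username%40artifacts.example.net/a.jar";
        String decoded = "v1/https/username@artifacts.example.net/a.jar";
        // The encoded path is rejected at the position of '%'.
        System.out.println(firstInvalidChar(encoded) == encoded.indexOf('%'));
        // The decoded path passes, because '@' is in the allowed set.
        System.out.println(firstInvalidChar(decoded));
    }
}
```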

Expected Behavior

The path should be URL-decoded before being used in a Label:

  • %40 → @
  • %2F → /
  • Other URL-encoded characters → their decoded equivalents

This allows the Label to be created successfully while maintaining proper indexing and lookup functionality.
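As an illustration of the mappings listed above, this small sketch decodes a shortened form of the example path with the standard `java.net.URLDecoder`. The class name `DecodeExample` is hypothetical, and the `Charset` overload used here assumes Java 10 or newer.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecodeExample {
    /** Percent-decodes a path using UTF-8. */
    public static String decode(String path) {
        // URLDecoder.decode(String, Charset) exists since Java 10
        // and does not throw a checked exception.
        return URLDecoder.decode(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints v1/https/username@artifacts.example.net/repository/maven
        System.out.println(decode("v1/https/username%40artifacts.example.net/repository/maven"));
        // prints a/b
        System.out.println(decode("a%2Fb"));
    }
}
```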

Root Cause

External repository paths from Maven or other artifact sources may contain URL-encoded characters when:

  1. Artifact server URLs contain user info or other special characters (e.g., user@server.com)
  2. Repository paths contain spaces or special characters
  3. Bazel's external repository mechanism preserves these encoded paths

Proposed Solution

Add a sanitizePathForLabel() method to URL-decode the path before Label creation:

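import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Percent-decodes the path (e.g. %40 -> @) so that Label validation accepts it.
// Caveat: URLDecoder also converts '+' to a space.
private static String sanitizePathForLabel(String path) {
    try {
        return URLDecoder.decode(path, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        // Not expected for "UTF-8", but the checked exception must be handled.
        LOG.warn("Failed to URL-decode path '{}', using original path", path, e);
        return path;
    } catch (IllegalArgumentException e) {
        // Thrown for malformed escapes, such as a trailing '%'.
        LOG.warn("Path '{}' contains invalid URL encoding, using original path", path, e);
        return path;
    }
}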

Then use it when creating the synthetic label:

targetLabel = Label.create(
    format("@_unknown_jar_//:%s", sanitizePathForLabel(classJar.getRelativePath())));
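One caveat may be worth checking before adopting this: `URLDecoder` implements form decoding, so it also converts `+` to a space, and `+` is among the characters the label accepts. A possible variant, sketched here under the assumption that Java 10+ is available, escapes `+` before decoding. `LabelPaths` and `decodePreservingPlus` are hypothetical names, and logging is omitted to keep the example self-contained.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class LabelPaths {
    /** Percent-decodes a path while keeping literal '+' characters intact. */
    public static String decodePreservingPlus(String path) {
        try {
            // Escape '+' first so the form-decoding rule does not turn it into a space.
            return URLDecoder.decode(path.replace("+", "%2B"), StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // Malformed escape such as "%zz": fall back to the original path.
            return path;
        }
    }

    public static void main(String[] args) {
        // prints repo+maven/user@host/a.jar
        System.out.println(decodePreservingPlus("repo+maven/user%40host/a.jar"));
        // prints bad%zzescape
        System.out.println(decodePreservingPlus("bad%zzescape"));
    }
}
```

Note that decoding alone does not guarantee a valid label: an escape such as %25 decodes to a literal %, which would be rejected again.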

Impact Analysis

Why this fix is safe:

  1. Index separation: JAR lookups use classJar.getRelativePath() (with URL encoding preserved) via the libraryByJdepsRootRelativePath map, which is separate from the Label
  2. Label as metadata: The Label is only used as an identifier in TargetKey for:
    • Dependency graph construction
    • Logging and debugging
    • Deduplication of dependencies
  3. No path operations: The synthetic label @_unknown_jar_// is never used for file system operations
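The index-separation argument can be illustrated with a simplified sketch. It uses plain strings in place of the real Label and TargetKey types. `IndexSeparationSketch`, `register`, and `lookup` are hypothetical names, and the map only stands in for libraryByJdepsRootRelativePath.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class IndexSeparationSketch {
    // Stand-in for libraryByJdepsRootRelativePath: keyed by the raw, still-encoded path.
    private final Map<String, String> libraryByRawPath = new HashMap<>();

    /** Indexes a JAR under its raw path and returns the decoded label string. */
    public String register(String rawRelativePath) {
        String label = "@_unknown_jar_//:"
                + URLDecoder.decode(rawRelativePath, StandardCharsets.UTF_8);
        libraryByRawPath.put(rawRelativePath, label);
        return label;
    }

    /** Lookup uses the raw path only, so decoding the label does not affect it. */
    public String lookup(String rawRelativePath) {
        return libraryByRawPath.get(rawRelativePath);
    }

    public static void main(String[] args) {
        IndexSeparationSketch index = new IndexSeparationSketch();
        String raw = "https/user%40host/a.jar";
        System.out.println(index.register(raw)); // label built from the decoded path
        System.out.println(index.lookup(raw));   // found via the encoded key
    }
}
```

One thing to keep in mind with this design: two different raw paths (for example a%2Fb and a/b) would decode to the same label, which could matter where the label is used for deduplication.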

Benefits:

  • ✅ Allows Label creation to succeed for URL-encoded paths
  • ✅ Improves log readability (decoded paths are more human-readable)
  • ✅ Maintains correct indexing functionality
  • ✅ No impact on JAR lookup logic

Affected Code Locations

  • Primary: JavaAspectsInfo.java lines 198-199
  • Similar case: JavaAspectsInfo.java line 225 (same pattern in error handling branch)

Workarounds

Currently, users may encounter build failures when:

  • Using private Maven repositories whose URLs contain special characters (for example, user info such as user@host)
  • Using external repositories with URL-encoded paths

No known workaround exists without this fix.
