Skip to content

HTTP.base_url API #519

@ixti

Description

@ixti

This is a discussion of API that overtakes exisitng PRs (#512 and #493) in order to plan and clarify API that needs to be implemented.

First of all I propose method name to be base_uri as it better reflects behaviour. I would agree on .origin proposed by @tarcieri, but I think it might be confusing due to URI API:

HTTP::URI.parse("https://example.com/foo/bar").origin
# => "https://example.com" 

Thus I believe the best name will be base_uri. Also, this method should raise exception if schema-less URI given:

HTTP.base_uri("//example.com/users")
# !! ArgumentError: Invalid base URI: //example.com/users

HTTP.base_uri("/users")
# !! ArgumentError: Invalid base URI: /users

Also, it should allow chaining of base URIs:

HTTP.base_uri("https://example.com").base_uri("/users")
# => HTTP::Client with base URI="https://example.com/users

URI merging

IMO using Addressable's URI merging logic is straight forward:

uri = HTTP::URI.parse("https://example.com/a/b/c/")

uri.join("z")     # => https://example.com/a/b/c/z
uri.join("/z")    # => https://example.com/z
uri.join("./z")   # => https://example.com/a/b/c/z
uri.join("../z")  # => https://example.com/a/b/z

uri.join("//foobar.com/ah/a/") # => https://foobar.com/ah/a
uri.join("http://wut.wut/")    # => http://wut.wut

There's one thing that should be handled separately, and we should provide shim for this:

HTTP::URI.parse("https://example.com/foo/bar").join("./baz")
# => https://example.com/foo/baz


HTTP::URI.parse("https://example.com/foo/bar/").join("./baz")
# => https://example.com/foo/bar/baz

The above snippet, IMO, should lead to the same result (foo/bar/baz) in both cases. So we should somewhat normalize base uri to have tailing / if there's none:

def merge_uris(base, appendix)
  base = HTTP::URI.parse(base)
  base = HTTP::URI.parse("#{base}/") unless base.to_s.end_with? "/"

  base.join(appendix)
end

merge_uris("https://example.com/foo/bar", "baz")      # => https://example.com/foo/bar/baz
merge_uris("https://example.com/foo/bar", "./baz")    # => https://example.com/foo/bar/baz
merge_uris("https://example.com/foo/bar", "../baz")   # => https://example.com/foo/baz
merge_uris("https://example.com/foo/bar", "/baz")     # => https://example.com/baz

merge_uris("https://example.com/foo/bar/", "baz")     # => https://example.com/foo/bar/baz
merge_uris("https://example.com/foo/bar/", "./baz")   # => https://example.com/foo/bar/baz
merge_uris("https://example.com/foo/bar/", "../baz")  # => https://example.com/foo/baz
merge_uris("https://example.com/foo/bar/", "/baz")    # => https://example.com/baz

Security consideration

I think it's at least worth pointing out (not as a blocker on this PR, just as a general note) that path joining is an area that's fraught with peril, especially if the base URI is assumed to be "secure" but the joined parameter is attacker-controlled and the attacker can exploit either relative path behaviors or completely overwrite an existing path with an absolute one.
-- #513 (comment)

I think if developer is not validating user input - we can't prevent him from stepping a land mine. Consider this dangerous code:

HTTP.base_uri("https://example.com/foo/bar").get(user_input)

The above hides same danger as:

base_uri = HTTP::URI.parse("https://example.com/foo/bar')
HTTP.get(base_uri.join(user_input)

And of course if one is just concatenating strings he's not doing it much better either:

HTTP.get("#{base_uri}#{user_input}")

What we can do better is to allow "locking" base uri, so that it will raise exception on attempt to get unexpected URI:

client = HTTP.base_uri("https://example.com/foo/bar", :enforced => true)

client.get("baz")
# => GET https://example.com/foo/bar/baz

client.get("../baz")
# !! "https://example.com/foo/baz" is outside of base uri

How about persistent options

Something that was completely missed in opened PRs and discussions is how to handle when both persistent and base URI are given. I think this should be handled this way:

client = HTTP.base_uri("https://example.com/foo/bar")
client.persistent("https://gitlab.com")
# !! ArgumentError, persistence origin mismatch base URI

pclient = HTTP.persistent("https://gitlab.com")
pclient.base_uri("https://example.com/foo/bar")
# !! ArgumentError, base URI is outide of persistence origin

In other words we should raise an error when there's conflict between the two. And in order to make life a bit easier we can allow:

pclient = HTTP.base_uri("https://example.com/foo/bar").persistent

In other words, if there's base URI, persistent can get its origin by default.


/cc @httprb/core, @tarcieri, @bjeanes, @paul, @sj26

Overtakes #493, #512, #513

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions