libp2p + HTTP an exploratory refactor#102
Conversation
| publisherHost, err := libp2phttp.New( | ||
| libp2phttp.StreamHost(publisherStreamHost), | ||
| libp2phttp.ListenAddrs([]multiaddr.Multiaddr{multiaddr.StringCast("/ip4/127.0.0.1/tcp/0/http")}), | ||
| ) | ||
| req.NoError(err) | ||
|
|
||
| go publisherHost.Serve() | ||
| defer publisherHost.Close() |
There was a problem hiding this comment.
This is the "HTTP Host". It's like the libp2p "stream host" (aka core host.Host), but it uses HTTP semantics instead of stream semantics.
You can pass in options on creation like a stream host to do HTTP over libp2p streams, and multiaddrs to create listeners on.
| serverHTTPMa := publisherHost.Addrs()[1] | ||
| req.Contains(serverHTTPMa.String(), "/http") | ||
|
|
||
| publisherHost.SetHttpHandlerAtPath(protoID, "/ipni/", http.StripPrefix("/ipni/", publisher)) |
There was a problem hiding this comment.
Here is where we attach our request handler. Note that we are mounting the "/ipni-sync/1" protocol at /ipni/. libp2phttp manages this mapping and clients can learn about the mapping at .well-known/libp2p.
In this case we also want out HTTP handler to not even know about the prefix, so we use the stdlib http.StripPrefix.
dagsync/httpsync/publisher_test.go
Outdated
| clientHost, err := libp2phttp.New() | ||
| req.NoError(err) | ||
|
|
||
| c, err := clientHost.NamespacedClient(nil, protoID, peer.AddrInfo{Addrs: []multiaddr.Multiaddr{serverHTTPMa}}) |
There was a problem hiding this comment.
This creates an http.Client that automatically knows about the prefix for a given protocol ID.
dagsync/httpsync/publisher_test.go
Outdated
| clientHost, err := libp2phttp.New() | ||
| req.NoError(err) | ||
|
|
||
| c, err := clientHost.NamespacedClient(clientStreamHost, protoID, peer.AddrInfo{ID: publisherStreamHost.ID(), Addrs: []multiaddr.Multiaddr{serverStreamMa}}) |
There was a problem hiding this comment.
This is the same thing as above but it is going over a libp2p stream.
Note: I think I'll change the NamespacedClient api to not accept a stream host here, and instead use the one passed in as an option to libp2phttp.New.
| } | ||
|
|
||
| for _, tc := range testCases { | ||
| t.Run(tc.name, func(t *testing.T) { |
There was a problem hiding this comment.
Nothing special here. Just plumbing to set up the test.
|
Neat, but it seems to me that the main trade-off here is that we have to rely on libp2p discovery semantics to figure out how to communicate? a client has to be I really like that the code is simpler in here. But unfortunately it's also only simpler if we get to remove the other form of this - but if we do that, then users on both ends need to opt-in to the libp2p version of talking on http. Either using this library, or writing it themselves. One nice thing about the current incarnation is that HTTP is simple, and writing the server (and client) is fairly straightforward, with all of the complexity being in the ipni protocol itself. This would add an additional layer of complexity. |
|
Thanks for taking a look!
This is just one way to discover a peer's protocol mapping. IPNI decided on encoding this mapping in the multiaddr (which is fine), but I'd prefer not to have every application have reimplement this. I think it makes sense to have this mapping at the server metadata layer rather than defined per application/protocol. It allows the protocols to be agnostic to their path prefix, gives peers a standard way they can learn about each other, and it's an HTTP native resource. That doesn't mean this is the only solution to create a peer's protocol mapping. IPNI can continue encoding the mapping in the multiaddr, and give that information to the HTTP host. That's not supported right now, but I think that's a great addition to the API (edit: just pushed this change).
I don't think this is a "libp2p version of talking on http". A goal here is to interoperate with existing systems. This is all plain old HTTP. Yes, we're defining the If this Server: mux.HandleFunc("/.well-known/libp2p", func(w http.ResponseWriter, r *http.Request) {
fmt.Fprintln(w, `{"protocols":{"/ipni-sync/1":{"path":"/custom-prefix"}}}`)
})Client: const prefix = await fetch("<remote>/.well-known/libp2p")
.then(resp => resp.json())
.then(mapping => mapping.protocols["/ipni-sync/1"].path)Alright, so you may ask "if it's so easy what's the point of libp2p+HTTP then?" A couple of reasons, but for IPNI, the biggest benefit is that it lets you delete This API is all subject to change, so if you see something that could improve here I'm all ears! |
|
Overall seems like a positive API addition to libp2p http, it does weaken the argument for It makes me ponder the possibilities for protocol probing, we do all this juggling in lassie to figure out how we can talk to peers, some of it depends on inspecting the multiaddr and some involves opening a stream and probing: https://github.com/filecoin-project/lassie/blob/main/pkg/retriever/directcandidatefinder.go Potentially, if the trustless gateway http protocol were to adopt this, we could do some versioning and make allowances for different root paths. We're already at the point of exploring a mix of OPTIONS, Accept headers, and params to do some of the work around different implementation functionality support. I can imagine this getting out of hand once we add more features to the protocol. I'm not sure what you have here would really solve any of this better, but maybe if we needed to do hard versioning breaks it would help. We're also stuck with the fixed |
gammazero
left a comment
There was a problem hiding this comment.
Thank you so much for writing this. This is very helpful for understanding how this can be used to make libp2p work easily as a transport for HTTP. Thank you!
I have some questions concerning the best way to adapt this for use in IPNI http-sync in the PR review.
| clientHost, err := libp2phttp.New() | ||
| req.NoError(err) | ||
|
|
||
| c, err := clientHost.NamespacedClient(protoID, peer.AddrInfo{Addrs: []multiaddr.Multiaddr{serverHTTPMa}}) |
There was a problem hiding this comment.
This appears to mean that a separate NamespacedClient is needed for each separate server that a request is sent to. This is somewhat different than the semantics of the standard http.Client where the same client instance can be used to send requests to servers at different URLs. Is this correct?
This matters because with the structure of the httpsync code as it currently is, the sync is a long-lived object (created once) that contains the http.Client which is reused for separate Syncer requests. What will need to change is that a new NamespacedClient will need to be associated with each per-sync Syncer instance and not with the Sync.
Can NamespacedClient instances be closed and removed from their clientHost?
There was a problem hiding this comment.
Just a quick note: Connections are managed by the underlying Roundtripper. NamespacedClients are a thin wrapper around an underlying RoundTripper that adds a prefix path to http requests. They don't change how connections are managed or cached.
This matters because with the structure of the httpsync code as it currently is, the sync is a long-lived object (created once) that contains the http.Client which is reused for separate Syncer requests. What will need to change is that a new NamespacedClient will need to be associated with each per-sync Syncer instance and not with the Sync.
Yep that makes sense to me. 👍
an NamespacedClient instances be closed and removed from their clientHost?
Not necessary to remove them from the clientHost (the client http host doesn't track them). You may optionally close the idle connections on them with client.CloseIdleConnections, but it's not necessary. The HTTP transport or stream host is responsible for managing/closing these connections. Clients are relatively cheap to make.
| urls: nil, | ||
| sync: s, | ||
| }, nil | ||
| } |
There was a problem hiding this comment.
A Syncer instance is created for each sync operation and is responsible for making a request to a specific advertisement publisher. This means that each instance of aSyncer requires the addresses to connect to since these will not be known in advance.
I think instead of NewSyncerWithoutAddrs here, that NewSyncer should be modified to create a new clientHost.NamespacedClient if its parent Sync is using a libp2phttp client. The NamespacedClient can then be used as the http.Client in Syncer functions.
Will it be necessary to close each NamespacedClient instance and remove it from the clientHost and remove its peer info from the peerstore? Will that happen automatically after some time?
There was a problem hiding this comment.
I think instead of NewSyncerWithoutAddrs here, that NewSyncer should be modified to create a new clientHost.NamespacedClient if its parent Sync is using a libp2phttp client. The NamespacedClient can then be used as the http.Client in Syncer functions.
👍 . I thought about this change, but wanted to make a less invasive change to demo. I like it though.
Will it be necessary to close each NamespacedClient instance and remove it from the clientHost and remove its peer info from the peerstore? Will that happen automatically after some time?
Happens automatically :) (just like the stock http.Client does it)
This is another version of the libp2p + HTTP refactor in #102. The key difference is that a new `NamespacedClient`, having the publisher's address, is created for each sync within the per-sync Syncer.
This is another version of the libp2p + HTTP refactor in #102. The key difference is that a new `NamespacedClient`, having the publisher's address, is created for each sync within the per-sync Syncer.
|
@MarcoPolo Please take a look at PR #103 as a modification of this PR, and see PR comments within code. cc @masih |
|
Replying to @rvagg comments first:
Long term, I think we'll push this data closer to the "provider record" equivalent. You should know what datatransfer protocols a peer supports (and in the case of HTTP, where they are mounted). I believe the IPNI metadata already has some support for this (extra metadata for the application protocol, which if it is an HTTP-semantic one can include the prefix path). That would save us a round trip and possible a wasted connection when we want to learn about a new peer. Note that this doesn't conflict with a
I wrote a comment around I think (I'm not super familiar here) that we could mount the existing |
|
This package is incorporated into the changes in #113. Closing. |
Hi folks!
I'm play-testing the new libp2p+HTTP api libp2p/go-libp2p#2438 so I wanted to try using it somewhere. I think the dagsync code here would benefit from libp2p+HTTP because:
dagsync/dtsynceventually (after clients upgrade).a. This means your client and server are also simpler. You can toss all the prefix logic into the trash!
I've added a test case that shows a simple head fetch and sync over HTTP both on top of a native HTTP transport and a libp2p stream.
I'd love feedback on what you think about this API.
cc @masih