-
Notifications
You must be signed in to change notification settings - Fork 14
feature: support k8s service discovery. #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
f307f7d to
fe28e1c
Compare
bdb78a8 to
5f533ea
Compare
b974eb4 to
05c0f1e
Compare
| @@ -25,6 +25,7 @@ import ( | |||
| "google.golang.org/grpc/codes" | |||
| "google.golang.org/grpc/status" | |||
|
|
|||
| managertypes "github.com/aigw-project/aigw/pkg/aigateway/clustermanager/types" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not a good idea to reuse the static discovery, introduce a new implementation should be better.
For the code reusing we may introduce a common xds server to avoding copying code, it could be another PR as the second step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @Wangdai-0800
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. We abstract the xds server into independent modules and implement two class "StaticClusterProvider" & "DynamicClusterProvider" based on the "BaseClusterInfoProvider".
05c0f1e to
09c33f1
Compare
|
@Wangdai-0800 it's not a good idea to use force push, it makes review changes harder. |
|
|
||
| // updata the snapshot form istio | ||
| func (p *StaticClusterProvider) AutoUpdateFromPilot(nodeID string, interval time.Duration) { | ||
| go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AutoUpdateFromPilot function lacks an exit mechanism. It is recommended to use a for+select loop to ensure it can exit at least when context.Done() is triggered.
| @@ -100,6 +101,28 @@ func (s *cdsServerImpl) processAllClusters() { | |||
| s.responseChan <- resp | |||
| } | |||
|
|
|||
| func (s *cdsServerImpl) processSubscribedClusters(subscribeClusters []string) { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some INFO logs in the key steps to facilitate issue troubleshooting.
09c33f1 to
f62648c
Compare
| if err != nil { | ||
| api.LogErrorf("failed to pull from pilot: %v", err) | ||
| } | ||
| time.Sleep(interval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need sleep here, we should do it by realtime subscribing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Remove the sleep. Could there be an asynchronous implementation?just like epoll_wait() works.
| start-aigw-k8s-pod: | ||
| kind load docker-image aigw-image/aigw:v1 --name aigw-llm-service | ||
| ISTIO_K8S_HOST := "istiod.istio-system.svc.cluster.local" | ||
| cat etc/envoy-istio.yaml \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use "sed -i.bak -n xx etc/envoy-istio.yaml" directly rather than "cat etc/envoy-istio.yaml | sed xx", "-i.bak" will create one backup file.
| GetAllClusters() []*ClusterInfo | ||
| } | ||
|
|
||
| type BaseClusterInfoProvider struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best practice is using doc comments for all exported functions, types, and packages to provider clear documentation. Please reference https://go.dev/doc/comment.
| defaultNodeId = "router~127.0.0.1~aigw.default~cluster.local" | ||
| ) | ||
|
|
||
| type StaticEndpoint struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move Static* to static_provider.go?
| } | ||
|
|
||
| // updata the snapshot form istio | ||
| func (p *DynamicClusterProvider) AutoUpdateFromPilot(nodeID string, interval time.Duration) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interval is not used. Do you mean to try every internal until subscribe successfully?
| p.WarmupReady.Store(false) | ||
| p.AutoUpdateFromPilot(defaultNodeId, 10*time.Second) | ||
| // wait for the first pull of xds to complete | ||
| for p.WarmupReady.Load() == false { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commonly, sync.WaitGroup is used to wait go routine to stop.
| grpc.WithTransportCredentials(insecure.NewCredentials()), | ||
| ) | ||
| if err != nil { | ||
| return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should log in every failure case?
No description provided.