Skip to content

Conversation

@Wangdai-0800
Copy link
Member

No description provided.

@Wangdai-0800 Wangdai-0800 force-pushed the service_discover branch 3 times, most recently from f307f7d to fe28e1c Compare November 12, 2025 13:34
@Wangdai-0800 Wangdai-0800 force-pushed the service_discover branch 6 times, most recently from bdb78a8 to 5f533ea Compare November 13, 2025 07:09
@Wangdai-0800 Wangdai-0800 force-pushed the service_discover branch 5 times, most recently from b974eb4 to 05c0f1e Compare November 14, 2025 08:34
@@ -25,6 +25,7 @@ import (
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"

managertypes "github.com/aigw-project/aigw/pkg/aigateway/clustermanager/types"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not a good idea to reuse the static discovery, introduce a new implementation should be better.
For the code reusing we may introduce a common xds server to avoding copying code, it could be another PR as the second step.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. We abstract the xds server into independent modules and implement two class "StaticClusterProvider" & "DynamicClusterProvider" based on the "BaseClusterInfoProvider".

@doujiang24
Copy link
Member

@Wangdai-0800 it's not a good idea to use force push, it makes review changes harder.


// updata the snapshot form istio
func (p *StaticClusterProvider) AutoUpdateFromPilot(nodeID string, interval time.Duration) {
go func() {
Copy link
Contributor

@amazingapple amazingapple Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AutoUpdateFromPilot function lacks an exit mechanism. It is recommended to use a for+select loop to ensure it can exit at least when context.Done() is triggered.

@@ -100,6 +101,28 @@ func (s *cdsServerImpl) processAllClusters() {
s.responseChan <- resp
}

func (s *cdsServerImpl) processSubscribedClusters(subscribeClusters []string) {
Copy link
Contributor

@amazingapple amazingapple Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some INFO logs in the key steps to facilitate issue troubleshooting.

if err != nil {
api.LogErrorf("failed to pull from pilot: %v", err)
}
time.Sleep(interval)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why need sleep here, we should do it by realtime subscribing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Remove the sleep. Could there be an asynchronous implementation?just like epoll_wait() works.

@Wangdai-0800 Wangdai-0800 reopened this Dec 1, 2025
start-aigw-k8s-pod:
kind load docker-image aigw-image/aigw:v1 --name aigw-llm-service
ISTIO_K8S_HOST := "istiod.istio-system.svc.cluster.local"
cat etc/envoy-istio.yaml \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use "sed -i.bak -n xx etc/envoy-istio.yaml" directly rather than "cat etc/envoy-istio.yaml | sed xx", "-i.bak" will create one backup file.

GetAllClusters() []*ClusterInfo
}

type BaseClusterInfoProvider struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best practice is using doc comments for all exported functions, types, and packages to provider clear documentation. Please reference https://go.dev/doc/comment.

defaultNodeId = "router~127.0.0.1~aigw.default~cluster.local"
)

type StaticEndpoint struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move Static* to static_provider.go?

}

// updata the snapshot form istio
func (p *DynamicClusterProvider) AutoUpdateFromPilot(nodeID string, interval time.Duration) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interval is not used. Do you mean to try every internal until subscribe successfully?

p.WarmupReady.Store(false)
p.AutoUpdateFromPilot(defaultNodeId, 10*time.Second)
// wait for the first pull of xds to complete
for p.WarmupReady.Load() == false {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commonly, sync.WaitGroup is used to wait go routine to stop.

grpc.WithTransportCredentials(insecure.NewCredentials()),
)
if err != nil {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should log in every failure case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants