forked from kubeedge/sedna
GM HA Design #6
GM HA Design
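1. GM functions
- Manager for the Operators of each edge-cloud collaboration resource
- GM-LC communication module:
  - downstream: GM syncs edge-cloud collaboration resource objects to the designated LC
  - upstream: LC syncs update operations on edge-cloud collaboration resource objects to GM, and GM then writes them to the k8s API
The GM HA design mainly considers HA for the two modules above.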
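2. HA patterns in the k8s community
k8s HA falls into two categories:
- HA for stateless modules: put a load balancer in front, e.g. k8s-api-server
- HA for stateful modules:
  - Leader election via the raft algorithm, e.g. etcd
  - Using client-go's leader-election module, e.g. kube-scheduler, kube-controller-manager
    - The application integrates client-go's leader-election module directly
    - Sidecar mode (unmaintained since 2018): run client-go's leader-election module as a sidecar, so the application only needs to query a designated port on the sidecar to check whether it is the leader, doc, code
The following is a survey of some community solutions.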
2.1. HA of kube-controller-manager: active-standby
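It integrates the leader-election module directly to elect a leader. The instance that obtains the lease lock runs the business logic: it list/watches the k8s api server and then processes resources such as deployment/job/daemonset.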
For example, in a real environment, its lease object (kubectl get leases -n kube-system kube-controller-manager -o yaml):
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  acquireTime: "2021-05-29T17:07:58.743009Z"
  holderIdentity: kind-control-plane2_c2181a4f-abaf-473c-b799-46e87ca46af6
  leaseDurationSeconds: 15
  leaseTransitions: 1
  renewTime: "2021-05-31T03:53:29.519058Z"

2.2. HA support of CloudCore:
kubeedge/kubeedge#1560
kubeedge/kubeedge#1569
kubeedge/kubeedge#1600
2.2.1 active-standby
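- Leader election: cloudcore integrates the leader-election module directly
- Ensuring that edgecore connects to the leader cloudcore:
  - keepalived: selects the VIP by checking the /readyz REST endpoint on cloudcore port 10002, which returns OK if the instance is the leader
  - k8s native load balancing: set podReadiness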
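A leader-aware readiness endpoint of the kind described above could look like the following sketch. It is not cloudcore's actual handler: the `leaderGate` type and the `probe` helper are hypothetical, and the choice of 503 for a standby instance is an assumption. The flag would be flipped by the leader-election callbacks.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// leaderGate (hypothetical) records whether this instance currently
// holds the leader lease.
type leaderGate struct{ leading atomic.Bool }

// readyz answers 200 "ok" only on the leader. A standby answers 503, so a
// health checker or readiness probe keeps traffic away from it.
func (g *leaderGate) readyz(w http.ResponseWriter, _ *http.Request) {
	if g.leading.Load() {
		w.WriteHeader(http.StatusOK)
		fmt.Fprint(w, "ok")
		return
	}
	http.Error(w, "not leader", http.StatusServiceUnavailable)
}

// probe calls the handler in-process and returns the HTTP status code.
func probe(g *leaderGate) int {
	rec := httptest.NewRecorder()
	g.readyz(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	g := &leaderGate{}
	fmt.Println("standby:", probe(g))

	g.leading.Store(true) // would be set when leadership is acquired
	fmt.Println("leader:", probe(g))
}
```

The same handler can serve both options in the list: keepalived polls it to decide where the VIP lives, and a pod readiness probe pointing at it removes standby pods from the Service endpoints.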
2.2.2 active-active
kubeedge/kubeedge#1560 (comment)
My understanding is that edgecore can connect to any cloudcore instance, and each cloudcore instance only handles messages for the edge nodes connected to it.
3. GM HA
3.1 HA of the collaboration resource Operators
active-standby: since the Operators list-watch CRDs and update them, this can follow kube-controller-manager
active-active: not achievable??
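3.2 HA of the GM-LC communication module
3.2.1 active-standby
Follow cloudcore's active-standby mode: LC only connects to the leader GM
3.2.2 active-active
Follow cloudcore's active-active mode: LC can connect to any GM instance, and each GM only handles messages for the LCs connected to it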
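3.2.3 Another approach: relay GM/LC communication through the api-server
This relies on the list-watch capability that KubeEdge provides on the edge side (see Autonomic Kube-API Endpoint (AKE)):
- GM -> LC: GM writes CR data to the api-server, and LC watches the changes through AKE
- LC -> GM: LC updates the resource status through AKE, and GM watches the changes through the k8s-api-server
Pros:
- No need to maintain a long-lived LC -> GM connection
Cons:
- Since AKE listens on localhost, LC must run on the host network
- Every LC would list-watch all edge-cloud collaboration resource objects, which causes a broadcast storm and an edgecore performance bottleneck (could a field-selector mitigate this?)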
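The fan-out concern in the last point can be made concrete with a small count. This sketch only models the arithmetic: the `Event` type, its `Node` field, and the `deliver` function are hypothetical, and whether a field-selector (or another selector) can actually express this filter for these resources remains the open question raised above.

```go
package main

import "fmt"

// Event is a simplified watch event on a collaboration resource. The Node
// field is a hypothetical attribute naming the edge node the object targets.
type Event struct {
	Name string
	Node string
}

// deliver returns how many events each LC receives. With filtered == false
// every LC sees every event (the broadcast case). With filtered == true an
// LC sees only events for its own node, as a server-side selector would do.
func deliver(events []Event, lcNodes []string, filtered bool) map[string]int {
	got := make(map[string]int, len(lcNodes))
	for _, lc := range lcNodes {
		got[lc] = 0
		for _, ev := range events {
			if !filtered || ev.Node == lc {
				got[lc]++
			}
		}
	}
	return got
}

func main() {
	events := []Event{
		{Name: "job-1", Node: "edge1"},
		{Name: "job-2", Node: "edge2"},
		{Name: "job-3", Node: "edge1"},
	}
	lcs := []string{"edge1", "edge2"}

	fmt.Println("broadcast:", deliver(events, lcs, false))
	fmt.Println("filtered: ", deliver(events, lcs, true))
}
```

Without filtering, total deliveries grow as (number of LCs) x (number of events); with per-node filtering they are bounded by the number of events.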

