This repository was archived by the owner on Jan 26, 2023. It is now read-only.

Commit 9a9d44b

Update allocations no longer require that healthy nodes be destroyed (#16)
Currently, adding a node to an existing Redis Cluster changes the flags passed to the `attache-control` sidecar task (`-redis-primary-count` and `-redis-replica-count`) for every existing allocation. The Nomad scheduler then (correctly) triggers a destructive update (e.g. reallocate, stop, migrate, and restart) of each existing Redis Cluster node, even though those nodes are already healthy. This happens because the Nomad scheduler can only update an allocation in place when none of the attributes (environment variables, file templates, etc.) relevant to that job's tasks have changed.

This PR updates `attache-control` to fetch the primary and replica counts from a Consul KV path instead. To keep the scaling configuration stored in Consul consistent with the total number of nodes in the Nomad job specification, I've added a Terraform file that sets both. For more information, see the updated README.

Lastly, we're approaching the point where more folks may begin to touch this code, so I've also taken the time to comment my exported structs, fields, and methods.
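The core change, reading the scaling counts at runtime from Consul KV instead of from the sidecar's CLI flags, can be sketched as follows. This is a minimal, self-contained sketch under stated assumptions, not the code from this commit: the `kvReader` interface, the in-memory `mapKV` store, and the key names `<prefix>/primary-count` and `<prefix>/replica-count` are all hypothetical. In a real deployment the interface could be backed by a Consul client, for example by wrapping `(*api.KV).Get` from `github.com/hashicorp/consul/api`.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// kvReader is a hypothetical, minimal view of a Consul KV client.
type kvReader interface {
	// Get returns the raw value stored at key and whether the key exists.
	Get(key string) ([]byte, bool, error)
}

// mapKV is an in-memory stand-in for Consul KV, used only for illustration.
type mapKV map[string]string

func (m mapKV) Get(key string) ([]byte, bool, error) {
	v, ok := m[key]
	return []byte(v), ok, nil
}

// scalingOpts holds the counts that used to arrive as the
// -redis-primary-count and -redis-replica-count flags.
type scalingOpts struct {
	PrimaryCount int
	ReplicaCount int
}

// readInt fetches key from kv and parses it as a base-10 integer.
func readInt(kv kvReader, key string) (int, error) {
	raw, ok, err := kv.Get(key)
	if err != nil {
		return 0, fmt.Errorf("reading %q: %w", key, err)
	}
	if !ok {
		return 0, fmt.Errorf("missing required key %q", key)
	}
	n, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil {
		return 0, fmt.Errorf("parsing %q: %w", key, err)
	}
	return n, nil
}

// fetchScalingOpts reads the hypothetical keys <prefix>/primary-count and
// <prefix>/replica-count and sanity-checks the values.
func fetchScalingOpts(kv kvReader, prefix string) (scalingOpts, error) {
	primary, err := readInt(kv, prefix+"/primary-count")
	if err != nil {
		return scalingOpts{}, err
	}
	replica, err := readInt(kv, prefix+"/replica-count")
	if err != nil {
		return scalingOpts{}, err
	}
	if primary <= 0 {
		return scalingOpts{}, errors.New("primary count must be greater than zero")
	}
	if replica < 0 {
		return scalingOpts{}, errors.New("replica count must not be negative")
	}
	return scalingOpts{PrimaryCount: primary, ReplicaCount: replica}, nil
}

func main() {
	// Hypothetical KV layout, as Terraform might write it.
	kv := mapKV{
		"service/attache/redis/primary-count": "3",
		"service/attache/redis/replica-count": "3",
	}
	opts, err := fetchScalingOpts(kv, "service/attache/redis")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("primaries=%d replicas=%d\n", opts.PrimaryCount, opts.ReplicaCount)
}
```

Because the counts are read at runtime rather than baked into the task's arguments, changing them does not alter the task definition, which is what is meant to let Nomad keep existing allocations in place.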
1 parent 41ffb2a commit 9a9d44b

13 files changed: +576 -360 lines changed

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -48,3 +48,7 @@ attache-control
 
 # Mac
 *.DS_Store
+
+# Terraform
+*.terraform*
+*.tfstate*

README.md

Lines changed: 34 additions & 15 deletions
@@ -25,7 +25,7 @@ once they've joined a cluster.
 #### Usage
 ```shell
 $ attache-check -help
-Usage of ./attache-check:
+Usage of attache-check:
   -check-serv-addr string
         address this utility should listen on (e.g. 127.0.0.1:8080)
   -redis-auth-password-file string
@@ -49,11 +49,11 @@ An ephemeral sidecar that acts as an agent for each Redis node when it's
 started. If a node's `node info` reflects that of a new node, this agent will
 attempt to introduce it to an existing Redis Cluster, if it exists, else it will
 attempt to orchestrate the creation of a new Redis Cluster if there are enough new
-Redis Nodes (in the Await Consul Service) to do so.
+Redis nodes (in the Await Consul Service) to do so.
 
 #### Usage
 ```shell
-$ attache-control -help
+$ ./attache-control -help
 Usage of ./attache-control:
   -attempt-interval duration
         Duration to wait between attempts to join or create a cluster (default 3s)
@@ -87,10 +87,6 @@ Usage of ./attache-control:
         Redis username, (required)
   -redis-node-addr string
         redis-server listening address, (required)
-  -redis-primary-count int
-        Total number of expected Redis shard primary nodes, (required)
-  -redis-replica-count int
-        Total number of expected Redis shard replica nodes
   -redis-tls-ca-cert string
         Redis client CA certificate file, (required)
   -redis-tls-cert-file string
@@ -100,29 +96,52 @@ Usage of ./attache-control:
 ```
 
 ### Running the Example Nomad Job
-Note: these steps assume that you have the `nomad` and `consul` binaries installed
-on your machine and that they exist in your `PATH`.
+Note: these steps assume that you have the `nomad`, `consul`, and `terraform`
+binaries installed on your machine and that they exist in your `PATH`.
 
 Build the attache-control and attache-check binaries:
 ```shell
 $ go build -o attache-check ./cmd/attache-check/main.go && go build -o attache-control ./cmd/attache-control/main.go ./cmd/attache-control/config.go
 ```
 
-Start the Consul server in `dev` mode:
+In another shell, start the Consul server in `dev` mode:
 ```shell
 $ consul agent -dev -datacenter dev-general -log-level ERROR
 ```
 
-Start the Nomad server in `dev` mode:
+In another shell, start the Nomad server in `dev` mode:
 ```shell
 $ sudo nomad agent -dev -bind 0.0.0.0 -log-level ERROR -dc dev-general
 ```
 
-Start a Nomad job deployment:
+Start a Nomad job deployment using Terraform:
 ```shell
-$ nomad job run -verbose -var-file=./example/vars-file.hcl ./example/job-specification.hcl
+$ cd example
+$ terraform init
+$ terraform plan
+$ terraform apply
 ```
 
-Open the Nomad UI: http://localhost:4646/ui
+Open the Nomad UI at http://localhost:4646/ui to view information about the
+Redis Cluster deployment.
 
-Open the Consul UI: http://localhost:8500/ui
+Open the Consul UI at http://localhost:8500/ui to view health check information
+for the Redis Cluster.
+
+### Useful Commands
+
+#### Purge Nomad Job
+This is useful for stopping and garbage-collecting a job in Nomad immediately.
+```shell
+$ nomad job stop -purge "<jobname>"
+```
+
+#### Count Primary Nodes
+```shell
+$ redis-cli -p <tls-port> --tls --cert ./example/tls/redis/cert.pem --key ./example/tls/redis/key.pem --cacert ./example/tls/ca-cert.pem --user replication-user --pass <redis-password> cluster nodes | grep master | wc -l
+```
+
+#### Count Replica Nodes
+```shell
+$ redis-cli -p <tls-port> --tls --cert ./example/tls/redis/cert.pem --key ./example/tls/redis/key.pem --cacert ./example/tls/ca-cert.pem --user replication-user --pass <redis-password> cluster nodes | grep slave | wc -l
+```
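The `grep master | wc -l` and `grep slave | wc -l` pipelines in the README count any line of `CLUSTER NODES` output that contains the word. A slightly stricter alternative is to inspect only the comma-separated flags field (the third field on each line), so the word appearing elsewhere, for example in a hostname that newer Redis versions can append to the address field, does not skew the count. The following is a hypothetical Go sketch of that idea; `countRoles` is not part of Attaché, and the sample output is illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// countRoles counts primary ("master") and replica ("slave") nodes in the
// text output of CLUSTER NODES. Each line starts with the node ID, the
// address, and a comma-separated flags field; only the flags field is checked.
func countRoles(clusterNodes string) (primaries, replicas int) {
	for _, line := range strings.Split(clusterNodes, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 3 {
			continue // skip blank or truncated lines
		}
		for _, flag := range strings.Split(fields[2], ",") {
			switch flag {
			case "master":
				primaries++
			case "slave":
				replicas++
			}
		}
	}
	return primaries, replicas
}

func main() {
	// Illustrative CLUSTER NODES output; node IDs are shortened for brevity.
	out := `e7d1 127.0.0.1:30001@31001 myself,master - 0 0 1 connected 0-5460
67ed 127.0.0.1:30002@31002 master - 0 1426238316232 2 connected 5461-10922
292f 127.0.0.1:30003@31003 master - 0 1426238318243 3 connected 10923-16383
07c3 127.0.0.1:30004@31004 slave e7d1 0 1426238317239 4 connected
`
	primaries, replicas := countRoles(out)
	fmt.Printf("primaries=%d replicas=%d\n", primaries, replicas) // primaries=3 replicas=1
}
```

Like the shell pipelines, this sketch counts nodes regardless of health flags such as `fail`.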

cmd/attache-check/main.go

Lines changed: 28 additions & 5 deletions
@@ -15,12 +15,17 @@ import (
 	logger "github.com/sirupsen/logrus"
 )
 
-// CheckHandler is a wraps an inner redis client with some methods for handling
-// health check requests.
+// CheckHandler wraps an inner redis client and provides a method for handling a
+// health check request from Consul. It's exported for use with a request
+// router.
 type CheckHandler struct {
 	redis.Client
 }
 
+// StateOk handles health checks from Consul. A 200 response from this handler
+// means that, from this Redis Cluster node's perspective, the Redis Cluster
+// state is OK and Consul can begin advertising this node as part of the Redis
+// Cluster in the Service Catalog.
 func (h *CheckHandler) StateOk(w http.ResponseWriter, r *http.Request) {
 	clusterInfo, err := h.GetClusterInfo()
 	if err != nil {
@@ -52,11 +57,29 @@ func main() {
 		logger.Fatal("Missing required opt 'check-serv-addr'")
 	}
 
-	err := redisOpts.Validate()
-	if err != nil {
-		logger.Fatal(err)
+	if redisOpts.NodeAddr == "" {
+		logger.Fatal("missing required opt: 'redis-node-addr'")
+	}
+
+	if redisOpts.Username == "" {
+		logger.Fatal("missing required opt: 'redis-auth-username'")
+	}
+
+	if redisOpts.PasswordFile == "" {
+		logger.Fatal("missing required opt: 'redis-auth-password-file'")
 	}
 
+	if redisOpts.CACertFile == "" {
+		logger.Fatal("missing required opt: 'redis-tls-ca-cert'")
+	}
+
+	if redisOpts.CertFile == "" {
+		logger.Fatal("missing required opt: 'redis-tls-cert-file'")
+	}
+
+	if redisOpts.KeyFile == "" {
+		logger.Fatal("missing required opt: 'redis-tls-key-file'")
+	}
+
 	logger.Infof("starting %s", os.Args[0])
 
 	router := mux.NewRouter()

cmd/attache-control/config.go

Lines changed: 91 additions & 39 deletions
@@ -9,57 +9,109 @@ import (
 	r "github.com/letsencrypt/attache/src/redis/config"
 )
 
-// CLIOpts is exported for use with flag.Parse().
-type CLIOpts struct {
-	RedisOpts         r.RedisOpts
-	RedisPrimaryCount int
-	RedisReplicaCount int
-	LockPath          string
-	AttemptInterval   time.Duration
-	AttemptLimit      int
-	AwaitServiceName  string
-	DestServiceName   string
-	LogLevel          string
-	ConsulOpts        c.ConsulOpts
-}
+// cliOpts contains all of the configuration used to orchestrate the Redis
+// Cluster under management by Attaché.
+type cliOpts struct {
+	// lockPath is the Consul KV path to use as a leader lock for Redis Cluster
+	// operations.
+	lockPath string
 
-func (c CLIOpts) Validate() error {
-	if c.RedisPrimaryCount == 0 {
-		return errors.New("missing required opt: 'redis-primary-count'")
-	}
+	// attemptInterval is the duration to wait between attempts to join or
+	// create a cluster.
+	attemptInterval time.Duration
+
+	// attemptLimit is the number of times to attempt joining or creating a
+	// cluster before Attaché should exit as failed.
+	attemptLimit int
+
+	// awaitServiceName is the name of the Consul Service that newly created
+	// Redis Cluster nodes will join when they're first started but have yet to
+	// form or join a cluster. This field is required.
+	awaitServiceName string
+
+	// destServiceName is the name of the Consul Service that Redis Cluster
+	// nodes will join once they are part of a cluster. This field is required.
+	destServiceName string
+
+	// logLevel is the level that Attaché should log at.
+	logLevel string
 
-	if c.DestServiceName == "" {
+	// RedisOpts contains the configuration for interacting with the node this
+	// serves as a sidecar to and, if one exists, the Redis Cluster. This field
+	// is required.
+	RedisOpts r.RedisOpts
+
+	// ConsulOpts contains the configuration for interacting with the Consul
+	// cluster that Attaché uses for its leader lock and to retrieve the scaling
+	// options from the Consul KV store. This field is required.
+	ConsulOpts c.ConsulOpts
+}
+
+// Validate checks that the required opts for `attache-control` were passed via
+// the CLI. User-friendly errors are returned when this is not the case.
+func (c cliOpts) Validate() error {
+	if c.destServiceName == "" {
 		return errors.New("missing required opt: 'dest-service-name'")
 	}
 
-	if c.AwaitServiceName == "" {
+	if c.awaitServiceName == "" {
 		return errors.New("missing required opt: 'await-service-name'")
 	}
 
-	err := c.ConsulOpts.Validate()
-	if err != nil {
-		return err
+	if c.ConsulOpts.EnableTLS {
+		if c.ConsulOpts.TLSCACertFile == "" {
+			return errors.New("missing required opt: 'consul-tls-ca-cert'")
+		}
+
+		if c.ConsulOpts.TLSCertFile == "" {
+			return errors.New("missing required opt: 'consul-tls-cert'")
+		}
+
+		if c.ConsulOpts.TLSKeyFile == "" {
+			return errors.New("missing required opt: 'consul-tls-key'")
+		}
+	}
+
+	if !c.ConsulOpts.EnableTLS && (c.ConsulOpts.TLSCACertFile != "" || c.ConsulOpts.TLSCertFile != "" || c.ConsulOpts.TLSKeyFile != "") {
+		return errors.New("missing required opt: 'consul-tls-enable'")
+	}
+
+	if c.RedisOpts.NodeAddr == "" {
+		return errors.New("missing required opt: 'redis-node-addr'")
+	}
+
+	if c.RedisOpts.Username == "" {
+		return errors.New("missing required opt: 'redis-auth-username'")
+	}
+
+	if c.RedisOpts.PasswordFile == "" {
+		return errors.New("missing required opt: 'redis-auth-password-file'")
+	}
+
+	if c.RedisOpts.CACertFile == "" {
+		return errors.New("missing required opt: 'redis-tls-ca-cert'")
+	}
+
+	if c.RedisOpts.CertFile == "" {
+		return errors.New("missing required opt: 'redis-tls-cert-file'")
 	}
 
-	err = c.RedisOpts.Validate()
-	if err != nil {
-		return err
+	if c.RedisOpts.KeyFile == "" {
+		return errors.New("missing required opt: 'redis-tls-key-file'")
 	}
 	return nil
 }
 
-func ParseFlags() CLIOpts {
-	var conf CLIOpts
+func ParseFlags() cliOpts {
+	var conf cliOpts
 
 	// CLI
-	flag.IntVar(&conf.RedisPrimaryCount, "redis-primary-count", 0, "Total number of expected Redis shard primary nodes, (required)")
-	flag.IntVar(&conf.RedisReplicaCount, "redis-replica-count", 0, "Total number of expected Redis shard replica nodes")
-	flag.StringVar(&conf.LockPath, "lock-kv-path", "service/attache/leader", "Consul KV path used as a distributed lock for operations")
-	flag.DurationVar(&conf.AttemptInterval, "attempt-interval", 3*time.Second, "Duration to wait between attempts to join or create a cluster")
-	flag.IntVar(&conf.AttemptLimit, "attempt-limit", 20, "Number of times to attempt joining or creating a cluster before exiting")
-	flag.StringVar(&conf.AwaitServiceName, "await-service-name", "", "Consul Service for newly created Redis Cluster Nodes, (required)")
-	flag.StringVar(&conf.DestServiceName, "dest-service-name", "", "Consul Service for healthy Redis Cluster Nodes, (required)")
-	flag.StringVar(&conf.LogLevel, "log-level", "info", "Set the log level")
+	flag.StringVar(&conf.lockPath, "lock-kv-path", "service/attache/leader", "Consul KV path to use as a leader lock for Redis Cluster operations")
+	flag.DurationVar(&conf.attemptInterval, "attempt-interval", 3*time.Second, "Duration to wait between attempts to join or create a cluster (e.g. '1s')")
+	flag.IntVar(&conf.attemptLimit, "attempt-limit", 20, "Number of times to attempt to create or join a cluster before exiting")
+	flag.StringVar(&conf.awaitServiceName, "await-service-name", "", "Consul Service for newly created Redis Cluster Nodes, (required)")
+	flag.StringVar(&conf.destServiceName, "dest-service-name", "", "Consul Service for healthy Redis Cluster Nodes, (required)")
+	flag.StringVar(&conf.logLevel, "log-level", "info", "Set the log level")
 
 	// Redis
 	flag.StringVar(&conf.RedisOpts.NodeAddr, "redis-node-addr", "", "redis-server listening address, (required)")
@@ -73,10 +125,10 @@ func ParseFlags() CLIOpts {
 	flag.StringVar(&conf.ConsulOpts.DC, "consul-dc", "dev-general", "Consul client datacenter")
 	flag.StringVar(&conf.ConsulOpts.Address, "consul-addr", "127.0.0.1:8500", "Consul client address")
 	flag.StringVar(&conf.ConsulOpts.ACLToken, "consul-acl-token", "", "Consul client ACL token")
-	flag.BoolVar(&conf.ConsulOpts.EnableTLS, "consul-tls-enable", false, "Enable mTLS for the Consul client")
-	flag.StringVar(&conf.ConsulOpts.TLSCACert, "consul-tls-ca-cert", "", "Consul client CA certificate file")
-	flag.StringVar(&conf.ConsulOpts.TLSCert, "consul-tls-cert", "", "Consul client certificate file")
-	flag.StringVar(&conf.ConsulOpts.TLSKey, "consul-tls-key", "", "Consul client key file")
+	flag.BoolVar(&conf.ConsulOpts.EnableTLS, "consul-tls-enable", false, "Enable mTLS for the Consul client (requires 'consul-tls-ca-cert', 'consul-tls-cert', 'consul-tls-key')")
+	flag.StringVar(&conf.ConsulOpts.TLSCACertFile, "consul-tls-ca-cert", "", "Consul client CA certificate file")
+	flag.StringVar(&conf.ConsulOpts.TLSCertFile, "consul-tls-cert", "", "Consul client certificate file")
+	flag.StringVar(&conf.ConsulOpts.TLSKeyFile, "consul-tls-key", "", "Consul client key file")
 
 	flag.Parse()
 	return conf
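The new `Validate` spells out each required opt as its own `if` block. A table-driven variant can keep the same error messages and check order in less code. The sketch below is hypothetical: it uses a flattened `opts` struct in place of `cliOpts` and its nested `RedisOpts`/`ConsulOpts` (those types live in other packages), and `requiredOpt`/`firstMissing` are illustrative helpers. It mirrors the all-or-nothing Consul TLS rule: with `consul-tls-enable` set, all three TLS files are required, and without it, supplying any of them is an error.

```go
package main

import (
	"errors"
	"fmt"
)

// opts is a hypothetical, flattened stand-in for cliOpts and its nested
// RedisOpts and ConsulOpts; it holds only the fields Validate inspects.
type opts struct {
	awaitServiceName, destServiceName string

	redisNodeAddr, redisUsername, redisPasswordFile string
	redisCACertFile, redisCertFile, redisKeyFile    string

	consulEnableTLS                                          bool
	consulTLSCACertFile, consulTLSCertFile, consulTLSKeyFile string
}

// requiredOpt pairs a CLI flag name with the value parsed for it.
type requiredOpt struct {
	flag, value string
}

// firstMissing returns an error naming the first flag whose value is empty.
func firstMissing(required ...requiredOpt) error {
	for _, r := range required {
		if r.value == "" {
			return fmt.Errorf("missing required opt: '%s'", r.flag)
		}
	}
	return nil
}

// validate applies the same checks, in the same order, as cliOpts.Validate.
func (o opts) validate() error {
	if err := firstMissing(
		requiredOpt{"dest-service-name", o.destServiceName},
		requiredOpt{"await-service-name", o.awaitServiceName},
	); err != nil {
		return err
	}

	// Consul mTLS is all-or-nothing.
	if o.consulEnableTLS {
		if err := firstMissing(
			requiredOpt{"consul-tls-ca-cert", o.consulTLSCACertFile},
			requiredOpt{"consul-tls-cert", o.consulTLSCertFile},
			requiredOpt{"consul-tls-key", o.consulTLSKeyFile},
		); err != nil {
			return err
		}
	} else if o.consulTLSCACertFile != "" || o.consulTLSCertFile != "" || o.consulTLSKeyFile != "" {
		return errors.New("missing required opt: 'consul-tls-enable'")
	}

	return firstMissing(
		requiredOpt{"redis-node-addr", o.redisNodeAddr},
		requiredOpt{"redis-auth-username", o.redisUsername},
		requiredOpt{"redis-auth-password-file", o.redisPasswordFile},
		requiredOpt{"redis-tls-ca-cert", o.redisCACertFile},
		requiredOpt{"redis-tls-cert-file", o.redisCertFile},
		requiredOpt{"redis-tls-key-file", o.redisKeyFile},
	)
}

func main() {
	// A Consul TLS file supplied without -consul-tls-enable.
	o := opts{
		awaitServiceName:  "redis-await",
		destServiceName:   "redis",
		consulTLSCertFile: "cert.pem",
	}
	fmt.Println(o.validate()) // missing required opt: 'consul-tls-enable'
}
```

The trade-off is one small layer of indirection in exchange for a single place to add the next required flag.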
