Cluster Node Automation in Docker Enterprise


Jun 21

At Docker, Inc., the infrastructure team works hard to ensure every piece of our work is automated. We don’t start with manual work and upgrade it to automation later; we automate from day one. This enables a small team like ours to accomplish much more than would otherwise be possible. Automation does carry a hefty initial cost, with the benefits arriving over time. To help you understand why we automate, and to encourage you to begin (or advance) your own automation efforts, we would like to share what we have done with our EC2 instance/node infrastructure.

Amazon Machine Images and Resource Provisioning

Amazon EC2 is where we have spent a lot of our automation effort, and for us that starts with Packer. We keep a git repository that holds all of the configuration for our Packer builds. The build applies all OS-level package updates and configures the services that run on every one of our EC2 instances, no matter what type of workload they carry: Saltstack, Consul, Unbound, and Node Exporter. After building our AMIs, we tag them so that we can roll them out selectively.
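
As a minimal sketch of that tagging step (the AMI ID and tag values here are hypothetical, not our real rollout scheme), a post-build script could stamp the new image with boto3:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# hypothetical post-build step: tag the AMI Packer just produced so
# downstream tooling can select it by environment
ec2.create_tags(
    Resources=['ami-0123456789abcdef0'],  # AMI ID emitted by the Packer build
    Tags=[
        {'Key': 'environment', 'Value': 'us-east-1'},
        {'Key': 'role', 'Value': 'base'},
    ],
)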

Once Packer is finished, the AMIs are consumed by AWS CloudFormation, which looks them up by specific tags based on the environment. We use CloudFormation to provision new resources and to update existing ones. If a resource is native to AWS (an EC2 instance, a Redis ElastiCache cluster, a Postgres RDS instance, etc.), we create it with CloudFormation, with a few exceptions. When creating EC2 instances, we follow two main rules: never create an EC2 instance outside of an EC2 Auto Scaling Group, and always tag resources to denote their purpose.
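
To make the lookup concrete, here is a hedged sketch of finding the newest AMI by tag and feeding it to a stack update (the stack name and parameter key are placeholders, not our actual templates):

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')
cfn = boto3.client('cloudformation', region_name='us-east-1')

# find the newest AMI carrying the environment tag we want to roll out
images = ec2.describe_images(
    Owners=['self'],
    Filters=[{'Name': 'tag:environment', 'Values': ['us-east-1']}],
)['Images']
latest = max(images, key=lambda image: image['CreationDate'])

# feed that AMI into the stack that owns the Auto Scaling Group
cfn.update_stack(
    StackName='swarm-infra',
    UsePreviousTemplate=True,
    Parameters=[{'ParameterKey': 'AmiId', 'ParameterValue': latest['ImageId']}],
)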

EC2 instances are ephemeral, and instances created outside of an Auto Scaling Group are hard to track. If someone accidentally deleted the node, or it crashed due to host hardware failure, how would you bring it back up? You would need to re-apply the CloudFormation stack or manually recreate the EC2 instance. That is error-prone and time-consuming, even when it’s automated via CloudFormation. With Auto Scaling Groups, we don’t need to worry about that, because Amazon monitors all EC2 instances created by the group. If an instance goes down and the Auto Scaling Group no longer meets its expected number of running instances, AWS creates another instance on your behalf with your predefined configuration. Perfect!

Our second rule of CloudFormation is that the EC2 tags describe what each node does. Tags can describe the environment a node is part of, like “us-east-1” (our production environment) or “<user>-us-east-1” (a personal playground environment). We also make heavy use of tags called “role” and “secondary-role”. A role of “swarm” and a secondary-role of “manager” means the EC2 instance is a manager of the Docker Enterprise cluster. A role/secondary-role combination of “swarm/infra” means the node is part of the cluster and belongs to the Swarm Collection or Kubernetes namespace named “infra”, depending on the orchestrator. You can see an example of some EC2 instances and their tagging in the screenshot below.

AWS EC2 Instance Tagging
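
Because every node carries these tags, any of our tooling can enumerate nodes by purpose. As a hedged sketch using the tag scheme described above, finding all Swarm managers in production could look like this:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# enumerate the Docker Enterprise managers in this environment by tag
reservations = ec2.describe_instances(
    Filters=[
        {'Name': 'tag:role', 'Values': ['swarm']},
        {'Name': 'tag:secondary-role', 'Values': ['manager']},
        {'Name': 'instance-state-name', 'Values': ['running']},
    ]
)['Reservations']

for reservation in reservations:
    for instance in reservation['Instances']:
        print(instance['InstanceId'], instance['PrivateIpAddress'])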

EC2 Instance Configuration

Once a node boots, it has many things to do before it is ready to interact with the rest of the cluster and run workloads. We use EC2 UserData to bootstrap Saltstack itself and get it talking to the Saltstack master. From there, Saltstack configuration takes over as the final step of bootstrapping an EC2 instance.
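
A minimal sketch of that bootstrap, assuming the community salt-bootstrap script and a placeholder master address (our real UserData lives in our CloudFormation templates):

import base64
import boto3

# hypothetical UserData: install salt-minion and point it at the Salt master
user_data = '''#!/bin/bash
curl -L https://bootstrap.saltstack.com | sh
echo 'master: salt.internal.example.com' > /etc/salt/minion.d/master.conf
systemctl restart salt-minion
'''

ec2 = boto3.client('ec2', region_name='us-east-1')
ec2.create_launch_template(
    LaunchTemplateName='swarm-infra',  # placeholder name
    LaunchTemplateData={
        'ImageId': 'ami-0123456789abcdef0',  # the Packer-built AMI
        'InstanceType': 'm5.large',
        'UserData': base64.b64encode(user_data.encode()).decode(),
    },
)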

Let us imagine we have two nodes with role/secondary-role set to “swarm/infra” and “swarm/manager”. For the first node, Saltstack would recognize that it is part of the Docker Enterprise cluster because its role is set to “swarm”. It would therefore install Docker Engine and configure the daemon.json file with all of the EC2 instance’s tags as Docker Engine labels. Lastly, it would see that the secondary-role is set to “infra”, so it would join the cluster as a worker node.
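
To illustrate the tag-to-label step, here is a hedged sketch of generating daemon.json from an instance’s EC2 tags (Saltstack does this for us declaratively; the metadata call assumes IMDSv1 for brevity):

import json
import boto3
import requests

# look up this instance's ID via the EC2 instance metadata service
instance_id = requests.get(
    'http://169.254.169.254/latest/meta-data/instance-id', timeout=2
).text

# fetch the instance's tags and expose each one as an engine label, e.g. 'role=swarm'
ec2 = boto3.client('ec2', region_name='us-east-1')
tags = ec2.describe_tags(
    Filters=[{'Name': 'resource-id', 'Values': [instance_id]}]
)['Tags']

daemon_config = {'labels': ['%s=%s' % (tag['Key'], tag['Value']) for tag in tags]}
with open('/etc/docker/daemon.json', 'w') as f:
    json.dump(daemon_config, f, indent=2)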

The second node would start similarly, with a Docker Engine install and daemon.json configuration. After that, Saltstack would see its secondary-role of “manager” and treat it a bit differently: it would join the node to the cluster as a manager while also configuring a cron job for backups, setting up certificates, initializing the Docker Enterprise cluster if it isn’t set up already, and ensuring the Swarm join tokens are shared with the rest of the cluster.
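
As a sketch of that idempotent manager-side logic, using the Docker SDK for Python and Consul’s KV store as an example of a shared location (our real implementation lives in Saltstack states):

import docker
import requests

client = docker.from_env()

# initialize the swarm only if this engine is not already part of one,
# which makes the bootstrap safe to re-run on every configuration pass
if client.info()['Swarm']['LocalNodeState'] != 'active':
    client.swarm.init(advertise_addr='eth0')

# share the join tokens with the rest of the cluster, e.g. via Consul's KV store
tokens = client.swarm.attrs['JoinTokens']
for role, token in tokens.items():
    requests.put(
        'http://localhost:8500/v1/kv/swarm/join-token/%s' % role.lower(),
        data=token,
    )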

All of this is automated, so we don’t have to think about it every time. We have the luxury of assuming it is done and being alerted only when it is not. It works so well that we could go into the AWS Console, start killing off machines, and walk away (it’s fun, you should try it!). New nodes launch automatically and configure themselves to replace the missing old ones.

Infrastructure Services

As you can see, Saltstack does a good job of letting us customize single nodes or groups of nodes, but we can also customize every node across our infrastructure the same way. We use this to run containers for logging (Logstash, Logrotate), monitoring (cAdvisor, Consul Exporter), and routing (Registrator) on every single one of our EC2 instances. This means that no matter where a node is located or what its purpose is, when it launches, the node and the containers running on it are registered in Consul, our Prometheus server automatically scrapes it for metrics, and all logs from its containers are shipped through our logging pipeline to a central, searchable location. All of this happens with no human intervention.
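
As one concrete example of this pattern, registering a node-level exporter with the local Consul agent is a single API call. A hedged sketch (the service name and port are illustrative):

import requests

# register this node's Node Exporter with the local Consul agent so that
# Prometheus discovers it via Consul and scrapes it automatically
service = {
    'Name': 'node-exporter',
    'Port': 9100,
    'Check': {'HTTP': 'http://localhost:9100/metrics', 'Interval': '30s'},
}
requests.put('http://localhost:8500/v1/agent/service/register', json=service)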

Air Boss

Finally, our node automation would not be complete if we didn’t mention Air Boss. In fact, since Docker Enterprise 2.0, we could not run our infrastructure in a highly automated, highly available way without it! Part of Docker Enterprise is Universal Control Plane (UCP), a management plane for administering a cluster’s applications and services in one place. With the addition of Kubernetes orchestrator support to UCP, we needed the ability to switch a node between Docker Swarm and Kubernetes based on a configurable Docker Engine label.

### filter nodes into buckets of Kubernetes or Swarm
def _organize_nodes(self):
    self.global_nodes = self.ucp_api.get_nodes()
    self.global_nodes = ucp_utils.filter_out_blacklisted_nodes(self.global_nodes, self.blacklist)

    for node in self.global_nodes:
        hostname = node['Description']['Hostname']
        engine_labels = node['Description']['Engine']['Labels']
        is_kubernetes_node = ucp_utils.is_kubernetes_node(engine_labels)
        is_swarm_node = ucp_utils.is_swarm_node(engine_labels)

        logger.debug('hostname: %r, engine_labels: %r, is_kubernetes_node: %r, is_swarm_node: %r',
            hostname, engine_labels, is_kubernetes_node, is_swarm_node)

        if is_kubernetes_node:
            self.kube_nodes.append(node)
        elif is_swarm_node:
            self.swarm_worker_nodes.append(node)

Air Boss also uses the “secondary-role” label to ensure that nodes are added to the correct Kubernetes Namespace, optionally creating it if it doesn’t exist. It also makes sure labels are set correctly so that UCP sees the node as a Kubernetes node and schedules the namespace’s workloads onto nodes carrying those labels.

def _kube_worker(self):
    # set up collections, orchestrator, and kube labels for kube nodes
    for node in self.kube_nodes:
        hostname = node['Description']['Hostname']

        try:
            logger.debug("Kube node: %r", hostname)

            # get node spec
            spec = node['Spec']
            version = str(node['Version']['Index'])
            secondary_role = node['Description']['Engine']['Labels']['secondary-role']
            collection_name = Airboss.get_collection_name(self.collections, secondary_role)

            # sanitize the collection name and, if the collection
            # for this secondary-role does not exist, create it
            self._ensure_collection_accuracy(collection_name, secondary_role)

            # check labels for kube orchestration and collection
            is_kube = ("com.docker.ucp.orchestrator.kubernetes" in spec['Labels']) \
                and (spec['Labels']['com.docker.ucp.orchestrator.kubernetes'] == 'true')
            is_swarm = ("com.docker.ucp.orchestrator.swarm" in spec['Labels']) \
                and (spec['Labels']['com.docker.ucp.orchestrator.swarm'] == 'true')
            in_collection = ("com.docker.ucp.access.label" in spec['Labels']) \
                and (spec['Labels']['com.docker.ucp.access.label'] == "/"+collection_name)
            # if kube not enabled, swarm is enabled, or collection not correct - update the node
            if (not is_kube) or is_swarm or (not in_collection):
                logger.info("Swarm node labels not set correctly for Kube node - setting now. Node: %r", hostname)
                spec['Labels']['com.docker.ucp.orchestrator.kubernetes'] = 'true'
                spec['Labels']['com.docker.ucp.orchestrator.swarm'] = 'false'
                spec['Labels']['com.docker.ucp.access.label'] = "/"+collection_name
                self.ucp_api.patch_node(node['ID'], version, spec)

            # confirm kube nodes labels for collection
            kube_labels = self.ucp_api.get_kube_node(hostname)['metadata']['labels']
            collection_label = "com.docker.ucp.collection."+collection_name
            patch_node_payload = {}
            if (collection_label not in kube_labels) or (kube_labels[collection_label] != "true"):
                logger.info("Kube node labels not set correctly - setting now. node: %r", hostname)
                patch_node_payload = {
                    "metadata": {
                        "labels": {
                            collection_label: "true"
                        }
                    }
                }

            if self.sync_ec2_metadata:
                self._sync_ec2_metadata_to_patch_node_payload(node, patch_node_payload, self.aws_ec2_instances)

            if patch_node_payload:
                self.ucp_api.patch_kube_node(hostname, patch_node_payload)

            # confirm that kube namespace matching secondary-role exists
            self._ensure_namespace_accuracy(collection_name)

            # confirm there is a scheduler annotation for that namespace to use the collection label
            namespace = self.ucp_api.get_kube_namespace(collection_name)
            if (not "annotations" in namespace['metadata']) \
                    or (not "scheduler.alpha.kubernetes.io/node-selector" in namespace['metadata']['annotations']) \
                    or (namespace['metadata']['annotations']['scheduler.alpha.kubernetes.io/node-selector'] != "com.docker.ucp.collection."+collection_name+"=true"):
                logger.info("Adding annotations for namespace %r. Node: %r", collection_name, hostname)
                payload = {"metadata": {"annotations": {"scheduler.alpha.kubernetes.io/node-selector": "com.docker.ucp.collection."+collection_name+"=true"}}}
                self.ucp_api.patch_kube_namespace(collection_name, payload)
        except Exception as error:
            logger.error("Error updating the kube node, going to keep moving. "
                "hostname: %r, error: %r, traceback: %r", hostname, error, traceback.format_exc())

It does the same thing for Swarm-enabled nodes, but instead of namespaces it uses UCP Collections.

def _swarm_worker(self):
    for node in self.swarm_worker_nodes:
        hostname = node['Description']['Hostname']

        try:
            logger.debug("Swarm Worker node: %r", hostname)

            # refresh our collection list in case one was created previously
            self.collections = self.ucp_api.get_collections()

            spec = node['Spec']
            version = str(node['Version']['Index'])
            secondary_role = node['Description']['Engine']['Labels']['secondary-role']
            collection_name = Airboss.get_collection_name(self.collections, secondary_role)

            self._ensure_collection_accuracy(collection_name, secondary_role)

            # check labels for kube/swarm orchestration and collection
            is_kube = ("com.docker.ucp.orchestrator.kubernetes" in spec['Labels']) and (spec['Labels']['com.docker.ucp.orchestrator.kubernetes'] == 'true')
            is_swarm = ("com.docker.ucp.orchestrator.swarm" in spec['Labels']) and (spec['Labels']['com.docker.ucp.orchestrator.swarm'] == 'true')
            in_collection = ("com.docker.ucp.access.label" in spec['Labels']) and (spec['Labels']['com.docker.ucp.access.label'] == "/"+collection_name)
            # if kube enabled, swarm is not enabled, or collection not correct - update the node
            if is_kube or (not is_swarm) or (not in_collection):
                logger.info("Swarm node labels not set correctly for Swarm node - setting now")
                spec['Labels']['com.docker.ucp.orchestrator.kubernetes'] = 'false'
                spec['Labels']['com.docker.ucp.orchestrator.swarm'] = 'true'
                spec['Labels']['com.docker.ucp.access.label'] = "/"+collection_name
                self.ucp_api.patch_node(node['ID'], version, spec)
        except Exception as error:
            logger.error("Error updating the swarm node, going to keep moving. "
                          "hostname: %r, error: %r, traceback: %r", hostname, error, traceback.format_exc())

Lastly, it will automatically reap nodes from the cluster if they are no longer running in AWS.

def _remove_dead_nodes(self):
    for node in self.global_nodes:
        if node['Status']['State'] == 'down':
            hostname = node['Description']['Hostname']
            node_id = node['ID']
            private_ip_address = node['Status']['Addr']

            if not self.aws.does_node_exist(private_ip_address, hostname):
                logger.info('Node found to be down in UCP and does not exist in AWS '
                    'EC2. Removing node from UCP. hostname: %r, node_id: %r', hostname, node_id)
                self.ucp_api.remove_node(node_id)
            else:
                logger.info('Found node to still be in EC2, leaving in UCP. '
                    'hostname: %r, node_id: %r', hostname, node_id)

All of these features together provide a fully automated platform for managing our clusters. Look, ma, no hands!

Conclusion

At Docker, we use a lot of technologies, but above all we strive to automate everything. Our small infrastructure team doesn’t have time to do every piece of work manually, so we put every machine in our infrastructure to work helping us get our jobs done efficiently. Without this automation, we would be buried under customer requests and technical debt, with no time left to innovate on new projects.
