API Server for Docker Infrastructure

Microservice architectures can be difficult to implement. How do you reach a given service? How do you ensure that traffic is spread across all instances of that service? What happens in a cloud environment, where losing and gaining service instances is part of normal daily operations? How do you configure something to route consistently to your service when you don’t even know where that service is running? One possible answer to these questions is an API server. At Docker, we developed our own highly available, automated API server on top of HAProxy, deeply integrated with Consul. Our API server acts as a service discovery and load balancing service to ensure availability in a highly dynamic environment. Before we dive into what we have created, let’s review what service discovery and load balancing mean to us.

Background

Service discovery ensures that somewhere there is an accurate view of the list of services that are available for handling traffic. Anything can then ask the service discovery mechanism about other workloads in the environment. We use Consul for our service discovery. To ensure Consul is always up to date with which containers are running, we run Registrator on every node in our infrastructure.

Load balancing is a mechanism that provides a consistent and reliable way of accessing the services registered in a service discovery system. It spreads incoming requests across similar instances while ensuring that a request is always sent to a healthy service instance. To accomplish this, we use HAProxy. It allows us to set up blacklists, whitelists, peering between instances, and routing based on the Host header or the URL path. As an additional bonus, it provides us with a wealth of metrics for monitoring the system. With the help of Consul Template, we can merge static configuration with the dynamic list of service containers supplied by Consul.
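
To make this concrete, here is a minimal sketch of what such a template can look like. This is an illustration rather than our actual configuration; the service name "users-api" and the hostname are hypothetical placeholders:

    # haproxy.cfg.ctmpl -- hypothetical Consul Template template.
    # The static HAProxy configuration is written as-is; the {{ ... }}
    # blocks are filled in with live service data from Consul.

    frontend http_in
        bind *:80
        # Route on the Host header (URL-path ACLs work the same way)
        acl host_users hdr(host) -i users-api.example.com
        use_backend users_api if host_users

    backend users_api
        balance roundrobin
        # One server line per healthy "users-api" instance in Consul;
        # Registrator keeps this list in sync with running containers.
        {{- range service "users-api" }}
        server {{ .Node }}_{{ .Port }} {{ .Address }}:{{ .Port }} check
        {{- end }}

Whenever instances of the service come or go, Consul Template re-renders the file with a fresh server list and HAProxy is reloaded.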

Together, these services allow us to send all of our requests to an HAProxy cluster and let it figure out exactly where each request should go to ensure it gets fulfilled. The service sending the request doesn’t need to worry about where the target service is located, whether it can be reached, or how many instances are up. All the requestor needs to know is that HAProxy will fulfill the request on its behalf.

At Docker, when we think of what an API server should do for our users, we think of the following user stories.

  • As a consuming service, I want to be able to easily contact other services, without knowing where they are located
  • As a consuming service, any request I send must not route to down containers
  • As a consuming service, any request I send must not route to down HAProxy instances
  • As a consuming service, any request I send must continue working even if an invalid HAProxy configuration is submitted
  • As a consuming service, I should notice minimal service interruption/downtime if the API server needs to apply a new configuration
  • As an engineer, I must be able to configure HAProxy so that it can route to my service
  • As an engineer, I don’t want to learn a new, in-house configuration language
  • As an engineer, I want to be able to make updates to my routing configuration with minimal friction

Top Level Architecture

To fulfill each of these user stories, we have created the following top-level architecture to ensure maximum uptime of the API server.

There are a few things I want to point out about this diagram. Firstly, we use Amazon Classic Elastic Load Balancers to front all of our traffic. There is a specific reason for this: we deal with both HTTP and TCP traffic. We want the same API server to be able to handle web requests from one service to another, from a web UI, or from one service to a database. The newer generations of Amazon load balancers, the Network Load Balancer and the Application Load Balancer, do not offer this combination. The other reason we use Amazon ELBs is that they provide an easy-to-use, common access point for services to point at. Since our HAProxy cluster dynamically scales based on load, we cannot point a DNS record directly at the IP addresses of those nodes.

Secondly, note that our API server solution will route to services running almost anywhere! With Docker Enterprise Edition, we are able to run services inside Kubernetes and Docker Swarm, but we also actively run containers without an orchestrator. Our API server solution will route to Kubernetes pods, Docker Swarm service instances, or Docker containers running directly on an EC2 instance: as long as Registrator is running on the node, the container will get picked up and registered in Consul.

Core Components

Configuration Automation

To begin the workflow, an engineer makes a configuration change and opens a pull request against our git repository, the canonical source of configuration (we happen to use GitHub for our git repository hosting). Any time a new commit is pushed to the master branch, we trigger a Jenkins build, which runs a script to push the latest configuration files to Amazon S3. S3 acts as an intermediary storage system from which each HAProxy instance pulls.

Then, our S3 Reload component pulls the HAProxy configuration templates from that intermediary storage system on a timed schedule. Its final task is to place the configuration file(s) in the specific directory where Consul Template looks for template changes.

Consul Template checks for new template changes on a timed schedule, while also watching for Consul key/value changes. It renders the HAProxy templates along with some other templates that HAProxy needs to run (GeoIP blocking, IP whitelists/blacklists, Logstash configuration, etc.). Finally, Consul Template puts the rendered configuration files in known directories for Logstash Engine and HAProxy to use.
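
As an illustration, the Consul Template side of this stage might be configured roughly like the following. The paths and the reload command are hypothetical stand-ins, not our production values:

    # consul-template configuration (HCL) -- hypothetical paths/command
    template {
      # Template dropped off by the S3 Reload component
      source      = "/etc/consul-template/templates/haproxy.cfg.ctmpl"
      # Rendered into the directory HAProxy reads its configuration from
      destination = "/etc/haproxy/haproxy.cfg"
      # Runs after each render: validate the new file, then reload HAProxy
      command     = "/usr/local/bin/check-and-reload-haproxy.sh"
    }

    template {
      source      = "/etc/consul-template/templates/ip-whitelist.lst.ctmpl"
      destination = "/etc/haproxy/ip-whitelist.lst"
    }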

The star of the show, HAProxy, validates the final rendered configuration files and restarts the main processes that serve TCP/HTTP traffic. On v1.8 we do a seamless reload, but on all lower versions of HAProxy we can only do a regular reload, which can result in a minor spike in retried requests. Lastly, HAProxy also emits metrics about configuration validation and configuration checks for our metrics pipeline to consume.
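
The validation step is HAProxy’s built-in configuration check (haproxy -c -f <file>), and the v1.8 seamless reload works by handing the listening sockets to the new process over the stats socket. A sketch of the relevant configuration, with an illustrative socket path:

    # In the HAProxy global section (HAProxy 1.8+): expose listening
    # sockets over the stats socket so a reloading process can adopt
    # them without dropping connections
    global
        stats socket /var/run/haproxy.sock mode 600 level admin expose-fd listeners

    # The reload wrapper then starts the new process with something like:
    #   haproxy -f /etc/haproxy/haproxy.cfg -x /var/run/haproxy.sock -sf <old pids>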

Since HAProxy only supports Syslog log output, we run Logstash with a Syslog input and ship the logs through the rest of our logging pipeline. Another important purpose of Logstash is to copy all logs to the host for parsing by Mtail, which produces a /metrics endpoint on port 3803 for Prometheus to scrape.
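
On the HAProxy side, pointing the logs at the local Syslog listener is a small piece of configuration; the address, port, and facility here are illustrative:

    # haproxy.cfg: ship logs to the local Syslog input (Logstash)
    global
        log 127.0.0.1:514 local0

    defaults
        log global
        option httplog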

HAProxy Exporter also produces a /metrics endpoint, on port 9101, based on metrics gathered by scraping the HAProxy statistics page. These metrics are less specific than the ones produced by Mtail, and the two sets do not overlap.
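
The statistics page that HAProxy Exporter scrapes is itself just a few lines of HAProxy configuration; the bind address and URI below are hypothetical:

    # Expose the HAProxy statistics page for HAProxy Exporter to scrape
    listen stats
        bind 127.0.0.1:8404
        mode http
        stats enable
        stats uri /stats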

Logrotate is part of our host-specific log management; it makes sure we don’t run out of disk space while also ensuring we always have the last day of logs compressed on the host, in case anything happens to the logging pipeline.

Lastly, although not a core piece of the HAProxy setup, Node Exporter includes the textfile collector, which lets us drop files into a directory and have their contents exported on a /metrics endpoint for Prometheus to scrape. This allows us to expose metrics (created by the HAProxy component) that record when we last updated the HAProxy configuration and whether the HAProxy configuration checks are still passing.
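
For example, the HAProxy component can drop a small file like the one below into the textfile collector’s directory, and Node Exporter will serve its contents alongside its own metrics. The metric names and values are hypothetical:

    # haproxy_config.prom -- hypothetical metrics written by the HAProxy
    # component: a Unix timestamp of the last applied configuration and a
    # flag for the result of the last configuration check (1 = passing)
    haproxy_config_last_update_timestamp_seconds 1518912000
    haproxy_config_check_success 1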

Migration to Containers

Our combination of HAProxy and Consul has worked well for us over the years, but like many other organizations with numerous ongoing projects, we have fallen behind on HAProxy releases and accumulated tech debt. We determined that we needed to move from an unorganized collection of shell scripts to a container-centered solution focused on maintainability, reliability, and scalability. The goals of our migration included using our own software, Docker Enterprise Edition, to run one of the most important pieces of our infrastructure, while also gaining the agility to change any component in the configuration automation pipeline.

Most of the components work largely the same way they did before, but a few major changes make this setup more resilient to change.

Firstly, this unit of computing, labeled “POD” in the above diagram, is a Kubernetes Pod deployed as a Kubernetes DaemonSet. Kubernetes provides controllers to ensure the service as a whole is always up while also keeping a nice log of what happens during deploys.

Secondly, each component only cares about an input and an output, whereas before, a process might restart another process to kick off the next step in the configuration automation pipeline. We could have accomplished this in a few different ways, but we decided to move to a pull-based model of execution. Most of the inputs and outputs are files located in shared volumes. In a few scenarios, an HTTP endpoint is the contract between two containers in the pod. These changes make the system easier to debug, because an operator can look at the state of the files to figure out where a problem occurred. Also, making changes to the system is easier because the contract between components is explicit.

Lastly, HAProxy is upgraded to 1.8, which now supports a truly seamless reload!

Conclusion

Registrator, Consul, Consul Template, HAProxy, Mtail, HAProxy Exporter, Node Exporter, Logstash, Prometheus… At Docker, all of these technologies have been woven together to provide a cohesive, automated, dynamic, and highly available API server. This solution uses a common configuration language (the HAProxy configuration language) along with Go templates (Consul Template) to give engineers at Docker a self-service routing and load balancing platform. It also ensures minimal downtime in the case of failure or poor configuration of any piece of the pipeline. HAProxy was also one of the last pieces of our infrastructure that needed to be converted to run in Docker containers. We can now make changes to our load balancing and service discovery system more quickly than before, which saves time and energy that we can spend innovating on other parts of our infrastructure.