Wednesday, January 18, 2017

InfraKit Under the Hood: High Availability [feedly]

InfraKit Under the Hood: High Availability

-- via my feedly newsfeed

Back in October, Docker released Infrakit, an open source toolkit for creating and managing declarative, self-healing infrastructure. This is the first in a two part series that dives more deeply into the internals of InfraKit.


At Docker,  our mission to build tools of mass innovation constantly challenges to look at ways to improve the way developers and operators work. Docker Engine with integrated orchestration via Swarm mode have greatly simplified and improved efficiency in application deployment and management of microservices. Going a level deeper, we asked ourselves if we could improve the lives of operators by making tools to simplify and automate orchestration of infrastructure resources. This led us to open source InfraKit, as set of building blocks for creating self-healing and self-managing systems.

There are articles and tutorials (such as this, and this) to help you get acquainted with InfraKit. InfraKit is made up of a set of components which actively manage infrastructure resources based on a declarative specification. These active agents continuously monitor and reconcile differences between your specification and actual infrastructure state. So far, we have implemented the functionality of scaling groups to support the creation of a compute cluster or application cluster that can self-heal and dynamically scale in size. To make this functionality available for different infrastructure platforms (e.g. AWS or bare-metal) and extensible for different applications (e.g. Zookeeper or Docker orchestration), we support customization and adaptation through the instance and flavor plugins. The group controller exposes operations for scaling in and out and for rolling update and communicates with the plugins using JSON-RPC 2.0 over HTTP. While the project provides packages implemented in Go for building platform-specific plugins (like this one for AWS), it is possible to use other language and tooling to create interesting and compatible plugins.

High Availability

Because InfraKit is used to ensure the availability and scaling of a cluster, it needs to be highly available and perform its duties without interruption.  To support this requirement, we consider the following:

  1. Redundancy & Failover — for active management without interruption.
  2. Infrastructure State — for an accurate view of the cluster and its resources.
  3. User specification — keeping it available even in case of failure.

Redundancy & Failover

Running multiple sets of the InfraKit daemons on separate physical nodes is an obvious approach to achieving redundancy.  However, while multiple replicas are running, only one of the replica sets can be active at a time. Having at most one leader (or master) at any time ensures no multiple controllers are independently making decisions and thus end up conflicting with one another while attempting to correct infrastructure state. However, with only one active instance at any given time, the role of the active leader must transition smoothly and quickly to another replica in the event of failure. When a node running as the leader crashes, another set of InfraKit daemons will assume leadership and attempt to correct the infrastructure state. This corrective measure will then restore the lost instance in the cluster, bringing the cluster back to the desired state before outage.

There are many options in implementing this leadership election mechanism. Popular coordinators for this include Zookeeper and Etcd which are consensus-based systems in which multiple nodes form a quorum. Similar to these is the Docker engine (1.12+) running in Swarm Mode, which is based on SwarmKit, a native clustering technology based on the same Raft consensus algorithm as Etcd. In keeping with the goal of creating a toolkit for building systems, we made these design choices:

  1. InfraKit only needs to observe leadership in a cluster: when the node becomes the leader, the InfraKit daemons on that node become active. When leadership is lost, the daemons on the old leader are deactivated, while control is transferred over to the InfraKit daemons running on the new leader.
  2. Create a simple API for sending leadership information to InfraKit. This makes it possible to connect InfraKit to a variety of inputs from Docker Engines in Swarm Mode (post 1.12) to polling a file in a shared file system (e.g. AWS EFS).
  3. InfraKit does not itself implement leader election. This allows InfraKit to be readily integrated into systems that already have its own manager quorum and leader election such as Docker Swarm. Of course, it's possible to add leader election using a coordinator such as Etcd and feed that to InfraKit via the leadership observation API.

With this design, coupled with a coordinator, we can run InfraKit daemons in replicas on multiple nodes in a cluster while ensuring only one leader is active at any given time. When leadership changes, InfraKit daemons running on the new leader must be able to assess infrastructure state and determine the delta from user specification.

Infrastructure State

Rather than relying on an internal, central datastore to manage the state of the infrastructure, such as an inventory of all vm instances, InfraKit aggregates and computes the infrastructure state based on what it can observe from querying the infrastructure provider. This means that:

  1. The instance plugin needs to transform the query from the group controller to appropriate calls to the provider's API.
  2. The infrastructure provider should support labeling or tagging of provisioned resources such as vm instances.
  3. In cases where the provider does not support labeling and querying resources by labels, the instance plugin has the responsibility to maintain that state. Approaches for this vary with plugin implementation but they often involve using services such as S3 for persistence.

Not having to store and manage infrastructure state greatly simplified the system. Since the infrastructure state is always aggregated and computed on-demand, it is always up to date. However, other factors such as availability and performance of the platform API itself can impact observability. For example, high latencies and even API throttling must be handled carefully in determining the cluster state and consequently deriving a plan to push toward convergence with the user's specifications.

User Specification

InfraKit daemons continuously observe the infrastructure state and compares that with the user's specification. The user's specification for the cluster is expressed in JSON format and is used to determine the necessary steps to drive towards convergence. InfraKit requires this information to be highly available so that in the event of failover, the user specification can be accessed by the new leader.

There are options for implementing replication of the user specification. These range from using file systems backed by persistent object stores such as S3 to EFS to using distributed key-value store such as Zookeeper or Etcd. Like other parts of the toolkit, we opted to define an interface with different implementations of this configuration store. In the repo, there are stores implemented using file system and Docker Swarm. More implementations are possible and we welcome contributions!


In this article, we have examined some of the considerations in designing InfraKit. As a systems meant to be incorporated as a toolkit into larger systems, we aimed for modularity and composability. To achieve these goals, the project specifies interfaces which define interactions of different subsystems. As a rule, we try to provide different implementations to test and demonstrate these ideas. One such implementation of high availability with InfraKit leverages Docker Engine in Swarm Mode — the native clustering and orchestration technology of the Docker Platform — to give the swarm self-healing properties. In the next installment, we will investigate this in greater detail.

Check out the InfraKit repository README for more info, a quick tutorial and to start experimenting — from plain files to Terraform integration to building a Zookeeper ensemble. Have a look, explore, and send us a PR or open an issue with your ideas!

More Resources:

#InfraKit Under the Hood: High Availability #docker
Click To Tweet


The post InfraKit Under the Hood: High Availability appeared first on Docker Blog.