How to hot deploy Docker containers thanks to Swarm and Overlay Networks

The goal of this blog is to show how, thanks to Docker 1.9, Swarm and Overlay Networks, we can dynamically deploy containers without cascading restarts and service disruptions.

Context

We were looking for a solution to do seamless continuous delivery of the Timetrack application. The latter runs behind a NodeJS proxy server, in a Docker environment hosted on two internet-facing servers. Two other applications (website and blog) run the same way:

existing architecture

Problem: with the running version of Docker (1.7), each time we deploy a new version of a proxied application, we have to restart the proxy container to allow it to discover new IP addresses.

The proxy container is launched like this:

$ docker run -d --link website:website --link ghost:ghost --link timetrack:timetrack --name proxy -p 80:80 -v "/opt/proxy":/opt -w /opt node sh run

Cascading restarts are certainly the main concern in Docker's current ecosystem: timetrack uses MongoDB (not shown in the diagram), so the mongo container is linked to timetrack and timetrack is linked to proxy; if we restart mongo, we have to restart timetrack, then proxy!
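For the record, redeploying mongo under plain links looked roughly like this (a sketch: the mongo image and the timetrack link options are simplified, only the proxy line matches our real command):

$ docker rm -f mongo && docker run -d --name mongo mongo
$ docker rm -f timetrack && docker run -d --link mongo:mongo --name timetrack redpelicans/timetrack
$ docker rm -f proxy && docker run -d --link website:website --link ghost:ghost --link timetrack:timetrack --name proxy -p 80:80 -v "/opt/proxy":/opt -w /opt node sh run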

The general solution to avoid cascading restarts is usually called Service Registration and Discovery. I dream of it as being fully packaged with Docker and transparent for the application layer.

Many solutions already exist outside of Docker, essentially to handle container and/or service registration, while "discovery" is usually done via DNS.

Good news: Docker 1.9 coupled with Swarm 1.0 offers an overlay network feature that satisfies our need. It does not yet let us discover services, but at least we can register and discover container names and IP addresses. We cannot yet express that the website service is backed by srv1/website:8080 and srv2/website:8080 (that is certainly the next step, keep reading our blog ...), but using an overlay network instead of the traditional Docker bridge gives containers unique IP addresses and names across the cluster's members, and keeps the /etc/hosts file up to date in each container connected to the overlay network.

In our context, the proxy and timetrack containers will now be launched like this:

$ docker run -d --net overlay_network --name proxy -p 80:80 -v "/opt/proxy":/opt -w /opt node sh run
$ docker run -d --net overlay_network --name timetrack redpelicans/timetrack

We can now restart timetrack: the proxy's hosts file will be automatically updated by Docker with timetrack's new IP address, without any service downtime.
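To see what Docker is doing for us, you can peek at the hosts file it maintains inside the proxy container (the addresses below are purely illustrative, localhost entries omitted):

$ docker exec proxy cat /etc/hosts
10.0.0.2    proxy
10.0.0.3    timetrack
10.0.0.3    timetrack.overlay_network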

Our target is to enable hot deployments, so we need to provide an overlay network to our containers on our two hosts (this is not the best architecture, but it is the one we have). To deploy an overlay network we need a key/value store and a Swarm cluster. Because the two servers are exposed to the internet, we have to build a private tunnel between them (this step is out of scope; google "ipip tunnel" for documentation, a minimal sketch follows). The goal here is not to explain how to use a Docker cluster; we will just use it.
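For reference, a minimal ipip tunnel between the two hosts can be sketched like this, where PUB1 and PUB2 stand for the public addresses of srv1 and srv2 (commands for srv1; mirror them on srv2 with remote $PUB1, local $PUB2 and 10.100.100.2/24):

# ip tunnel add tun0 mode ipip remote $PUB2 local $PUB1
# ip addr add 10.100.100.1/24 dev tun0
# ip link set tun0 up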

Here comes the dark side of this blog entry: installing the solution. We fell in love with Docker because of its simplicity: install the Docker engine on the hosts (apt-get install), then just pull and run, pull and run, and that's all. Forget that this time: the target architecture implies a more complex installation process, but we are entering the magical world of clusters, where solutions have to be tailored to needs and are therefore rarely bundled.

Here is the target solution:

target architecture

Steps for installation:

  • Install the key/value cluster
  • Launch Docker in cluster mode
  • Set up a Swarm cluster
  • Create an overlay network
  • Run our containers on the new setup

Install the Key/value store

A Docker cluster needs to share information between its nodes, which is why we have to use a KV store. Docker currently supports three solutions: Consul, etcd and ZooKeeper.

We will use etcd (you can google the many studies comparing the three). etcd seems to be the simplest and fits our needs exactly (I tried Consul first, but was unable to instruct Docker to connect to all the cluster's members at once rather than just one at a time).

etcd will be used by the Docker Engine, the Swarm agents and the Swarm managers. We have 2 nodes; etcd uses the Raft consensus algorithm to manage a highly-available cluster, so we need 3 instances (see the etcd documentation). We will deploy 2 of them directly on the hosts (we need 2 running instances to start a Docker engine) and one as a container.

We use systemd to launch the etcd daemon on both hosts (example for srv1):

# cat /etc/systemd/system/multi-user.target.wants/etcd.service
[Unit]
Description=ETCD
After=network.target

[Service]
Type=notify
ExecStart=/usr/etcd/etcd -name etcd1 \
    -data-dir /var/lib/etcd \
    -initial-advertise-peer-urls http://10.100.100.1:2380 \
    -listen-peer-urls http://10.100.100.1:2380 \
    -listen-client-urls http://10.100.100.1:2379,http://127.0.0.1:2379 \
    -advertise-client-urls http://10.100.100.1:2379 \
    -initial-cluster-token etcd-cluster \
    -initial-cluster etcd1=http://10.100.100.1:2380,etcd2=http://10.100.100.2:2380,etcd11=http://10.100.100.1:12380 \
    -initial-cluster-state new
Restart=on-failure

Same for srv2: use -name etcd2 and advertise the right IP address (10.100.100.2).
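Once the unit file is in place on both hosts, reload systemd and start the daemons:

# systemctl daemon-reload
# systemctl start etcd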

Check your cluster's health:

$ curl http://10.100.100.1:2379/health
{"health": "true"}
$ curl http://10.100.100.2:2379/health
{"health": "true"}
Run Docker Engine in cluster mode

The Docker daemons must now listen on the network (the private network, of course!) and connect to the KV store nodes.

srv1 setup:

$ cat /etc/systemd/system/multi-user.target.wants/docker.service 
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket
Requires=network.target docker.socket

[Service]
Type=notify
ExecStart=/usr/bin/docker daemon \
    -H tcp://10.100.100.1:2375 \
    -H unix:///var/run/docker.sock \
    --cluster-store=etcd://10.100.100.1:2379,10.100.100.2:2379,10.100.100.1:12379 \
    --cluster-advertise=10.100.100.1:2375
TimeoutStartSec=300

Restart the docker service on both hosts and jump to the next step.
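In practice, on srv1 (same on srv2 with its own address), reload systemd, restart the engine, then check that it answers on its private TCP socket:

# systemctl daemon-reload
# systemctl restart docker
$ docker -H tcp://10.100.100.1:2375 info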

Setup a Swarm cluster

On each host we have to install an agent and a manager (this is not a general rule; it is just because we only have 2 hosts and still want some high availability in the Docker Swarm cluster).

First we need to run a Swarm agent on each host. This time we are back to plain Docker: we just have to run a container with a few options. On srv1:

$ docker run --name swarm_agent --restart=always -d swarm join --addr=10.100.100.1:2375 etcd://10.100.100.1:2379/nodes
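On srv2 the command is the same, with that host's addresses:

$ docker run --name swarm_agent --restart=always -d swarm join --addr=10.100.100.2:2375 etcd://10.100.100.2:2379/nodes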

Second, launch one manager per host (srv1 shown; use srv2's address on the other host):

$ docker run --name swarm_manager --restart=always -p 10.100.100.1:4000:4000 -d swarm manage -H :4000 --replication --advertise 10.100.100.1:4000 etcd://10.100.100.1:2379/nodes

Here we are: the cluster is up. Check it:

$ docker -H tcp://10.100.100.1:4000 info
Containers: 24  
Images: 41  
Role: replica  
Primary: 10.100.100.2:4000  
Strategy: spread  
Filters: health, port, dependency, affinity, constraint  
Nodes: 2  
 rp1: 10.100.100.1:2375
  └ Status: Healthy
  └ Containers: 16
  └ Reserved CPUs: 0 / 8
  └ Reserved Memory: 0 B / 16.44 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.2.0-22-generic, operatingsystem=Ubuntu 15.10, storagedriver=btrfs
 rp2: 10.100.100.2:2375
  └ Status: Healthy
  └ Containers: 8
  └ Reserved CPUs: 0 / 8
  └ Reserved Memory: 0 B / 16.44 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.2.0-22-generic, operatingsystem=Ubuntu 15.10, storagedriver=btrfs
CPUs: 16  
Total Memory: 32.89 GiB  
Name: bddfaeb57ef0  

On the other host:

$ docker -H tcp://10.100.100.2:4000 info
Containers: 24  
Images: 41  
Role: primary  
Strategy: spread  
Filters: health, port, dependency, affinity, constraint  
Nodes: 2  
...

Create an overlay network

Let's create an overlay network called swarm, spread over our 2 nodes:

$ DOCKER_HOST=tcp://10.100.100.2:4000 docker network create -d overlay swarm

Check it:

$ DOCKER_HOST=tcp://10.100.100.2:4000 docker network ls
NETWORK ID          NAME                  DRIVER  
9d04c7b04a1c        srv2/bridge            bridge  
8a869a678dc2        srv1/bridge            bridge  
051844332bde        swarm                 overlay  
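Later on, once containers are attached to it, docker network inspect will show who is connected to this overlay and with which IP address:

$ DOCKER_HOST=tcp://10.100.100.2:4000 docker network inspect swarm
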
Run our containers

Remember, we have to deploy something like 3 dependent containers, let's say A -> B -> C.
In the previous version of Docker the solution was to do:

$ docker run -d C
$ docker run -d --link C:C B
$ docker run -d --link B:B A

And we all know the tragic end of this story when we have to restart C!

In our new architecture we will launch our containers like this (choose your preferred order):

$ docker run -d --net=swarm C
$ docker run -d --net=swarm A
$ docker run -d --net=swarm B

And yes, here it is: they can discover each other, we can even restart C without restarting A and B, and A can still resolve C's address! This is true for a single-host as well as a multi-host environment.
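A quick way to convince yourself, assuming A's image ships with ping:

$ docker exec A ping -c 1 C
$ docker exec A cat /etc/hosts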

We now have to start a few more containers to finalize our journey:

etcd11: remember, we need 3 etcd instances to stay fault tolerant with at least 2 running nodes. etcd11 is not bound to the swarm network. We use a Swarm constraint filter to pin etcd11 to srv1.

$ docker run -d --restart=always \
    --name=etcd11 \
    --env="constraint:node==srv1" \
    -v /opt/etcd:/data \
    -p 10.100.100.1:12380:12380 \
    -p 10.100.100.1:12379:12379 \
    quay.io/coreos/etcd \
        -name etcd11 \
        -data-dir /data \
        -initial-advertise-peer-urls http://10.100.100.1:12380 \
        -listen-peer-urls http://0.0.0.0:12380 \
        -listen-client-urls http://0.0.0.0:12379 \
        -advertise-client-urls http://10.100.100.1:12379 \
        -initial-cluster-token etcd-cluster \
        -initial-cluster etcd1=http://10.100.100.1:2380,etcd11=http://10.100.100.1:12380,etcd2=http://10.100.100.2:2380 \
        -initial-cluster-state new
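Once etcd11 is up, the third member should answer its health check just like the two others:

$ curl http://10.100.100.1:12379/health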

proxy, timetrack, ghost, website: container names are now unique across the cluster, and we are not using Docker Compose, so we have to manually run two series of containers, one per host. Each proxy gets the names of its peer containers (timetrack, ghost, website) hard coded (as parameters, not code!). Each container is run like this:

on srv2:

$ docker run -d --restart=always \
    --name=proxy2 \
    --net=swarm \
    --env="constraint:node==rp2" \
    -p 80:80 \
    -v "/opt/proxy":/opt \
    -w /opt \
    node sh run
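The other applications follow the same pattern; for instance, timetrack on srv2 could be run like this (the name timetrack2 is just our convention):

$ docker run -d --restart=always \
    --name=timetrack2 \
    --net=swarm \
    --env="constraint:node==rp2" \
    redpelicans/timetrack
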
Conclusion

We faced many problems at installation time to stabilize the whole architecture. Many times we had to remove /var/run/docker/netns and /var/lib/docker/network, reset the etcd data, and restart the Docker daemons. It seems that the etcd data was not always up to date, and starting containers or using the overlay network was then not possible. We have now been in production for a few days without any trouble, and will start to deploy containers automatically each time a new GitHub version is available in the master/hotfix branch. Stay tuned ...
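For the record, the reset we resorted to when the network state got corrupted looked like this (on the affected host, with due caution):

# systemctl stop docker
# rm -rf /var/run/docker/netns /var/lib/docker/network
# systemctl start docker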

We hope this blog entry helps.

That's all folks.