Michael Blum

Developer from Chicago

Ansible In A Box


DevOps @ LogicGate

As a backend engineer at LogicGate I’ve been tasked with creating a DevOps workflow for provisioning our applications on AWS in an automated way. Previously, we used a combination of CloudFormation templates, shell scripts, and a bit of Node. I wanted a process that makes it easy for developers to test on a laptop but is also easy to deploy to EC2 instances.

A foray into DevOps

Before getting into DevOps formally, I used a combination of Chef scripts, shell scripts, and Vagrant. This approach had a few drawbacks:

  1. Needed to run an entire VM
  2. Deployment was hard - needed Chef Zero or some other deployment cocktail
  3. Took a long time to stand up and destroy an image
  4. Couldn’t run multiple copies of an image
  5. Very hand-rolled

I liked the simplicity of Ansible - SSH into a machine and you’re in business. At first I tried using a Vagrant box to deploy and test a playbook. This had many of the same drawbacks as my Chef setup.

DevOps in a Box

Docker gives us a contained environment (network, ports, software versions, etc.) along with the ability to stand up multiple copies of an instance for clustering and testing closer to what’s in production. One may argue that I could just write a Dockerfile and be done with it. That would be true if I were running actual containers in production, but so far Amazon’s Elastic Container Service is still in a fledgling state and I prefer installing against raw EC2 instances. I would also argue that Docker’s storage and volumes are less proven than a regular EBS volume.

One caveat is Elastic Beanstalk - its single-container Docker implementation is pretty well-developed: Single Docker containers on Elastic Beanstalk. This is how we run our Java apps on Elastic Beanstalk, and it allows a developer to run the stack with a simple docker-compose.
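As a rough sketch, running such a single-container stack locally might look like the following docker-compose file. The service name, image, and port here are hypothetical placeholders, not our actual stack:

```yaml
# Hypothetical sketch of a single-container local stack.
# Service name, image, and port are placeholders.
version: "2"
services:
  app:
    image: example/java-app:latest
    ports:
      - "8080:8080"
```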

Combining Docker with a real DevOps / provisioning tool (like Ansible) allows an operator to use Ansible’s extensive DSL to streamline operations in a Linux environment: user creation, permissions, config file edits, etc.
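As a sketch of what that DSL buys you, here are tasks of the kind mentioned above. The module names (user, file, lineinfile) are real Ansible modules, but the user name, paths, and line content are hypothetical values:

```yaml
# Sketch: the kind of housekeeping tasks Ansible's DSL streamlines.
# User name, file path, and line content are hypothetical.
- name: Create a deploy user
  user:
    name: deploy
    groups: sudo
    shell: /bin/bash

- name: Lock down the app config's permissions
  file:
    path: /etc/myapp/app.conf
    owner: deploy
    mode: "0640"

- name: Edit a config file in place
  lineinfile:
    dest: /etc/myapp/app.conf
    line: "max_connections=100"
```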

To run Ansible in a Docker container I needed to do a few things that are generally considered a bad idea:

  1. Install an SSH service
  2. Run multiple processes
  3. Bake files / configs into the image

Best practices as they pertain to containers seem to be:

  1. No SSH - use Docker networking / bridges
  2. One process per container
  3. Dynamic files should be mounted as volumes

I built the Docker image this way to have the container look as similar as possible to a real VPS instance - SSH access, Python, etc.

ssh_config

I found the trickiest part was configuring SSH in a Docker container. I accomplished this with a custom ssh_config:

# This is the ssh client system-wide configuration file.  See
# ssh_config(5) for more information.  This file provides defaults for
# users, and the values can be changed in per-user configuration files
# or on the command line.

# Configuration data is parsed as follows:
#  1. command line options
#  2. user-specific file
#  3. system-wide file
# Any configuration value is only changed the first time it is set.
# Thus, host-specific definitions should be at the beginning of the
# configuration file, and defaults at the end.

# Site-wide defaults for some commonly used options.  For a comprehensive
# list of available options, their meanings and defaults, please see the
# ssh_config(5) man page.

Host *
    StrictHostKeyChecking no
    ServerAliveInterval 30
    ServerAliveCountMax 5
    TCPKeepAlive        yes
    ControlMaster       auto
    ControlPath         /root/.ssh/mux-%r@%h:%p
    ControlPersist      15m
    ConnectTimeout      60

ansible.cfg

I also took a slightly unorthodox approach to the Ansible file structure and needed to add some extra configuration so Ansible would know where to look:

# Set any ansible.cfg overrides in this file.
# See: https://docs.ansible.com/ansible/intro_configuration.html#explanation-of-values-by-section

[defaults]
host_key_checking = False
hash_behaviour = merge
roles_path = /etc/ansible/roles
inventory = /etc/ansible/hosts

I want to point out a subtle detail here:

hash_behaviour = merge

By default, Ansible overwrites (rather than merges) a map of values. So if a YAML file in a role looks like this:

env:
  name: test

and the playbook has:

env:
  name: test 
  vpc: local

This led to a perplexing loss of variables in that only env.name would remain. The merge setting allows both env.name and env.vpc to co-exist.
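With merge set, the effective value combines both maps:

```yaml
env:
  name: test
  vpc: local
```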

Here is my generic Dockerfile for running Ansible in a container:

Dockerfile

#
# Ansible Dockerfile
#
# Create an Ansible runtime and execute the playbook.yml
#

# Pull base image.
FROM ubuntu:16.04

# Install Ansible dependencies
RUN apt-get update \
    && apt-get upgrade -y \
    && apt-get install -y \
    build-essential \
    ca-certificates \
    gcc \
    libssl-dev \
    libffi-dev \
    python-pip \
    python2.7 \
    python2.7-dev \
    python-netaddr \
    sudo \
    ssh \
    curl \
    cron \
    htop \
    aptitude \
    rsyslog

# clean up APT install
RUN apt-get autoremove \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Environment variables
ENV PLAYBOOK example
ENV ENVIRONMENT ""
ENV ANSIBLE_CONFIG /etc/ansible/ansible.cfg
# disable SSH host checks
ENV ANSIBLE_HOST_KEY_CHECKING False

# Install Ansible
RUN pip install ansible

# configure SSH
ADD .ssh/config /root/.ssh/config
ADD .ssh/keys /root/.ssh/keys
RUN cat /dev/zero | ssh-keygen -q -N "" \
    && cat ~/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
# override SSH config options to avoid:
# The authenticity of host '52.27.192.213 (52.27.192.213)' can't be established.
# ECDSA key fingerprint is SHA256:EQ+GsHf0oiHFvstSRVxuWIf+h5QRHEQRCoZBdbJba8I.
ADD ssh_config /etc/ssh/ssh_config

# configure Ansible
RUN mkdir /etc/ansible
ADD .vault_pass.txt /etc/ansible/.vault_pass.txt
ADD secrets.yml /etc/ansible/secrets.yml
ADD ansible.cfg /etc/ansible/ansible.cfg
ADD hosts /etc/ansible/hosts
ADD defaults /etc/ansible/defaults
ADD group_vars /etc/ansible/group_vars
ADD roles /etc/ansible/roles
ADD playbooks /etc/ansible/playbooks

# Install AWS CLI and boto
RUN pip install awscli
RUN pip install boto

WORKDIR /etc/ansible

# start SSH and run playbook
ENTRYPOINT service ssh restart \
           && ansible-playbook "/etc/ansible/playbooks/$PLAYBOOK/playbook.yml" \
           --extra-vars "env=$ENVIRONMENT" \
           --vault-password-file /etc/ansible/.vault_pass.txt \
           -vvvv \
           && bash

This bakes the following files and directories into the image:

  • .ssh
  • secrets.yml (encrypted with ansible-vault, which seems to be a simple way of storing secrets under version control)
  • .vault_pass.txt (used to decrypt secrets.yml)

All of the other directories follow Ansible’s recommended structure:

group_vars/
   group1                 # here we assign variables to particular groups
   group2                 

host_vars/
   hostname1              # if systems need specific variables, put them here
   hostname2

playbooks/                # this hierarchy represents a playbook that utilizes several "roles"
    playbook.yml         

roles/
    common/               # this hierarchy represents a "role"
        tasks/            #
            main.yml      #  <-- tasks file can include smaller files if warranted
        templates/        #  <-- files for use with the template resource
            ntp.conf.j2   #  <------- templates end in .j2
        files/            #
            bar.txt       #  <-- files for use with the copy resource
            foo.sh        #  <-- script files for use with the script resource
        vars/             #
            main.yml      #  <-- variables associated with this role
        defaults/         #
            main.yml      #  <-- default lower priority variables for this role

The above file tree is my own take on having multiple playbooks share a collection of roles. The only issue I’ve found with this setup is that the files are ingested into the image at build time. This means every time I want to update a playbook, role, or variable, I have to rebuild the container. A simple fix to this problem would be to use volumes instead, but that will be an enhancement for another day.

Commands

For my Docker-based workflows I create shell scripts to handle the tedious task of creating an image, running it, and destroying it between runs.

build.sh

#!/bin/bash
DOCKER_IMAGE_TAG="ansible"
echo "building image: $DOCKER_IMAGE_TAG"
docker build -t "$DOCKER_IMAGE_TAG" "$PWD"

run.sh

#!/bin/bash
PLAYBOOK="${1:?ansible playbook must be specified}"
echo "deploying ansible playbook: $PLAYBOOK"
ENVIRONMENT="${2:?ENVIRONMENT must be specified}"
echo "configuring environment: $ENVIRONMENT"
DOCKER_OPTIONS=$3
echo "docker options: $DOCKER_OPTIONS"
DOCKER_IMAGE_TAG="logicgate/ansible"
DOCKER_IMAGE_NAME="$PLAYBOOK-$ENVIRONMENT"
echo "clearing playbook $DOCKER_IMAGE_NAME"
echo "tag: $DOCKER_IMAGE_TAG"
echo "name: $DOCKER_IMAGE_NAME"
docker rm -f $DOCKER_IMAGE_NAME
docker run -itd \
   -e PLAYBOOK=$PLAYBOOK \
   -e ENVIRONMENT=$ENVIRONMENT \
   -e SKIP_TAGS=$SKIP_TAGS \
   $DOCKER_OPTIONS \
   --name $DOCKER_IMAGE_NAME \
   --hostname $DOCKER_IMAGE_NAME \
   $DOCKER_IMAGE_TAG

For example, ./run.sh {{ playbook name }} localhost "-p 8080:8080 -v $PWD/roles/some-role/data:/mnt/data" will create a Docker instance with Ansible and then run the playbook against itself.
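A side note on the ${1:?...} guards in run.sh: bash aborts with the given message when the parameter is unset or empty, which is what makes the script fail fast on missing arguments. A minimal sketch (the check function is hypothetical):

```shell
#!/bin/bash
# Demonstrates the ${var:?message} guard used in run.sh:
# expansion fails with the message when the parameter is unset or empty,
# and a non-interactive shell exits immediately.
check() {
  local playbook="${1:?ansible playbook must be specified}"
  echo "deploying ansible playbook: $playbook"
}

check example                                     # prints "deploying ansible playbook: example"
( check ) 2>/dev/null || echo "missing playbook rejected"
```

Calling the failing case in a subshell contains the exit, so the script can report the rejection instead of dying itself.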

Alternatively, I added playbooks for creating an EC2 instance and applying the playbook to it all in one go, but I don’t really like that meta-host creation. If anything goes wrong during the deployment I have to manually go in and destroy the EC2 instance and start anew.

Next Steps

In the next phase I’m looking into tools such as Terraform to decouple infrastructure provisioning from instance provisioning. While I’m currently using Ansible to create EC2 instances, update Route53, etc., I’d like to have Ansible just provision an instance.