DC/OS Agent AMI using Packer and Ansible

Jun 12, 2018

Weston Bassler

D2iQ

We have chosen to sunset DC/OS, with an end-of-life date of October 31, 2021. With D2iQ Kubernetes Platform (DKP), our customers get the same benefits provided by DC/OS and more.

I love using Ansible. I also love using Packer to build my machine images across different cloud providers. One JSON to conquer them all! I recently decided to move the majority of my Packer provisioners away from shell scripts to Ansible playbooks. I am not sure how long Packer has provided this ability, but I only discovered the Ansible provisioner a couple of months ago through a blog post I was reading. Moving from shell scripts to Ansible has real benefits and fits in with much more of the Infrastructure-as-Code work that I am currently doing, since I already use Ansible extensively elsewhere. Ansible is much simpler to write, in my opinion, and hides much of the complexity of shell scripts. Not to mention it's great for secrets management and for provisioning across different operating systems and clouds.
 
In this post I will show how I am using Packer and Ansible to build AMIs for my DC/OS Agents. My hope is that you will not only see the value that using Ansible alongside Packer brings to the table, but also come away with enough information to start using this combo to ease building and managing your own machine images.
 
In my current role I am fortunate enough to work on some of the most bleeding-edge technologies. I am currently tasked with leading the charge around using microservices and microservice orchestration. Our complex and diverse workload was a great fit for Apache Mesos, and we have since moved to DC/OS (more posts on this to come). Although a microservice orchestration tool and distributed system such as DC/OS removes a ton (I mean a ton) of complexity from running microservices, it is still very complex to manage under the hood. It is especially complex when you are in charge of the entire infrastructure and operations stack (OS, framework, monitoring, logging, patching, provisioning, etc.) with so little manpower. Automating your infrastructure and operations is not only good practice, it is an absolute must to survive.
 
One of the most fundamental pieces to ease the pain of these infrastructure complexities is building machine images. A machine image is essentially a template of what you want your machine to look like when you initially launch it and it starts running. The idea here is to "pre-bake" as much into your machine image(s) as possible so that when you launch a new instance from the image you get a machine that is "ready to serve". One way to figure out how to build your machine image is to look at a machine that is currently running perfectly healthy in production and mentally take a snapshot of that state. What is currently running there and how did you get to that state? Can this state be achieved through automation and code?
 
Something else to point out quickly: machine images should also be built to be immutable. Yes, it is a buzzword that we have all heard and hate at this point, but it really will make your life easier and your systems more dynamic. Machine images let us think in terms of replacing machines rather than changing or modifying the ones that are running. However, this post is not about infrastructure best practices and their benefits (perhaps another post), so I'm not going to go into further detail.
 
The Code
Before takeoff, let's take a look at the GitHub repo and see what we are trying to accomplish.
 
 
 
In short, we are going to create an AMI for DC/OS Agents using Packer, with Ansible as the main workhorse, built from the following Ansible roles: common, docker, beats, telegraf, netdata, and check_mk.
 
 
The first two roles (common and docker) are necessary to run our services on DC/OS, and the rest of the roles (beats, telegraf, netdata, check_mk) are how we monitor and log our infrastructure and the services running on DC/OS. On launch, the finished AMI will meet ALL necessary DC/OS prerequisites, begin shipping metrics and logs to Elastic, begin shipping metrics to Influx, have the Netdata UI running, and have the prerequisites in place to be added to a check_mk server. We will have "pre-baked" all of the above into a machine image, and new instances will be "ready to serve" as DC/OS Agents on launch with no manual intervention.
 
Packer Provisioners
Although Packer supports creating machine images for multiple cloud providers, this post and the above repo focus on creating an Amazon AMI. To work with Amazon you will need your AWS access key and secret key in order to interact with the Amazon API (see Amazon Builders). You will also need to know your VPC ID, subnet ID, region, AZ and source AMI (I use the latest CentOS 7 AMI from the Marketplace as my source AMI). You can either export this info as environment variables or hard-code it directly into the Packer JSON, which for the purposes of this post is the dcos_agent_centos7.4.json file.
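If you go the environment-variable route, a small pre-flight check can catch a missing value before Packer ever starts. The following is a minimal Python sketch and is not part of the repo: the Packer variable names mirror the -var flags used later in this post, while the PACKER_* environment variable names and the collect_packer_vars helper are hypothetical.

```python
import os

# Packer variable name -> environment variable that supplies it.
# The PACKER_* names are hypothetical; use whatever your shell exports.
ENV_FOR_VAR = {
    "ami": "PACKER_AMI",
    "vpc_id": "PACKER_VPC_ID",
    "subnet_id": "PACKER_SUBNET_ID",
    "region": "PACKER_REGION",
    "az": "PACKER_AZ",
}


def collect_packer_vars(environ=os.environ):
    """Return {packer_var: value}, or raise if any value is unset or empty."""
    missing = [env for env in ENV_FOR_VAR.values() if not environ.get(env)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return {var: environ[env] for var, env in ENV_FOR_VAR.items()}


# A plain dict stands in for the real environment in this example.
example_env = {
    "PACKER_AMI": "ami-centos7",
    "PACKER_VPC_ID": "vpc-12345678",
    "PACKER_SUBNET_ID": "subnet-12345678",
    "PACKER_REGION": "us-east-1",
    "PACKER_AZ": "us-east-1a",
}
print(collect_packer_vars(example_env))
```

Failing early with the full list of missing names is usually friendlier than letting the build start and die on the first empty value.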
 
Let's take a look at our provisioners for Packer (lines 36–59 in dcos_agent_centos7.4.json). Packer provisioners are where we add custom code and configs to help prepare and create our machine images.
 
"provisioners": [
  {
    "type": "shell",
    "remote_path": "/home/centos/agent-setup.sh",
    "script": "agent-setup.sh"
  },
  {
    "type": "ansible-local",
    "playbook_file": "ansible/aws-packer.yml",
    "playbook_dir": "ansible",
    "staging_directory": "/home/centos/ansible",
    "inventory_file": "ansible/inventory/aws/hosts"
  },
  {
    "type": "shell",
    "execute_command": "{{ .Vars }} sudo -E /bin/sh -ex '{{ .Path }}'",
    "inline": [
      "sleep 5s",
      "sudo yum remove ansible -y",
      "sudo rm -rf /home/centos/ansible",
      "echo Complete..."
    ]
  }
]
 
From the above code you can see that we are using three provisioners: two "shell" provisioners and one "ansible-local" provisioner. Packer executes them in the order specified in the JSON. One important thing to note quickly: for AWS AMIs, Packer creates a temporary security group (SG) and key pair (pem file) for SSH access. This is what makes provisioning connections possible.
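Because the order matters, it can help to print it before a build. This is a minimal Python sketch, assuming a trimmed copy of the provisioners block above embedded as a string; the provisioner_order helper is hypothetical and only reads the JSON, it does not call Packer.

```python
import json

# Trimmed copy of the provisioners block from dcos_agent_centos7.4.json.
TEMPLATE = """
{
  "provisioners": [
    {"type": "shell", "script": "agent-setup.sh"},
    {"type": "ansible-local", "playbook_file": "ansible/aws-packer.yml"},
    {"type": "shell", "inline": ["sudo yum remove ansible -y"]}
  ]
}
"""


def provisioner_order(template_text):
    """Return the provisioner types in the order they appear in the template."""
    template = json.loads(template_text)
    return [provisioner["type"] for provisioner in template.get("provisioners", [])]


order = provisioner_order(TEMPLATE)
for step, kind in enumerate(order, start=1):
    print(step, kind)
```

To inspect the real template instead, read the file's contents and pass them to the same function.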
 
Since we are using the "ansible-local" provisioner, Ansible runs locally on the EC2 instance that will in turn become our AMI. And since Ansible is not installed by default on most operating systems, we need to install it before we can provision with it. We also most likely do not want to leave Ansible installed in the machine image, so we need to clean it up before Packer converts the instance into an AMI. The first shell provisioner handles the installation of Ansible (among a couple of other things), and the third handles the uninstall.
 
Let's now dive deeper into the Ansible provisioner to get a better understanding of how Packer uses it:
 
{
  "type": "ansible-local",
  "playbook_file": "ansible/aws-packer.yml",
  "playbook_dir": "ansible",
  "staging_directory": "/home/centos/ansible",
  "inventory_file": "ansible/inventory/aws/hosts"
},
 
Once Ansible is installed, thanks to the first shell provisioner, Packer stages our playbooks and roles in the defined "staging_directory" on the EC2 instance by copying them over from our local machine. The key values we define tell Packer how to run the roles and playbooks on the instance for provisioning. The "playbook_file" is the Ansible playbook to run; it defines information such as the roles and the hosts to apply them to. The other important key here is the "inventory_file", where you define the hosts for your roles. If this is a bit fuzzy, it may help to read more on Ansible Playbooks.
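Since "playbook_file", "playbook_dir" and "inventory_file" all point at paths on the machine running Packer, a typo in any of them only shows up once the build is under way. Here is a minimal Python sketch of a pre-flight check, assuming the paths are relative to the directory you run Packer from; the missing_local_paths helper is hypothetical.

```python
from pathlib import Path

# Keys of the "ansible-local" provisioner that refer to local paths.
# "staging_directory" is skipped on purpose: it lives on the EC2 instance.
LOCAL_PATH_KEYS = ("playbook_file", "playbook_dir", "inventory_file")

provisioner = {
    "type": "ansible-local",
    "playbook_file": "ansible/aws-packer.yml",
    "playbook_dir": "ansible",
    "staging_directory": "/home/centos/ansible",
    "inventory_file": "ansible/inventory/aws/hosts",
}


def missing_local_paths(provisioner, base_dir="."):
    """Return the keys whose local path does not exist under base_dir."""
    base = Path(base_dir)
    return [
        key
        for key in LOCAL_PATH_KEYS
        if key in provisioner and not (base / provisioner[key]).exists()
    ]


# An empty list means every referenced path was found.
print(missing_local_paths(provisioner))
```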
 
Essentially, Packer takes the key values we define in the "ansible-local" provisioner and executes something like the following against the EC2 instance:
 
# From the machine running Packer: copy the playbook directory to the instance
scp -r ${playbook_dir} sudo-user@ec2:${staging_directory}
# On the EC2 instance:
ansible-playbook -b -i ${inventory_file} ${playbook_file}
 
There is a bit more to it than that, but for simplicity's sake, the above commands are the most basic form in theory. Keep in mind that the Ansible provisioner accepts a ton more key values than what is shown above. Read up on the docs here.
 
Creating the AMI
Now that we have a better understanding of how to use Ansible with Packer, and of how Packer uses the Ansible provisioner, we can go ahead and create the AMI.
 
I usually check the version of Packer prior to the first run to make sure that I am using the latest and greatest version:
 
packer version
Packer v1.2.3
 
If you are not, the Packer CLI will let you know.
 
Once I have the latest version, I always validate my JSON file, as a best practice, to ensure that it has no syntax errors. You will either need to pass your variables on the command line or hard-code them in the JSON.
 
packer validate \
    -var ami=ami-centos7 \
    -var vpc_id=vpc-12345678 \
    -var subnet_id=subnet-12345678 \
    -var region=us-east-1 \
    -var az=us-east-1a \
    dcos_agent_centos7.4.json
Template validated successfully.
 
On an error, you would see output like:
 
Template validation failed. Errors are shown below.
Errors validating build 'amazon-ebs'. 1 error(s) occurred:
 
Once validation succeeds, it is time to run the build.
 
packer build \
    -var ami=ami-centos7 \
    -var vpc_id=vpc-12345678 \
    -var subnet_id=subnet-12345678 \
    -var region=us-east-1 \
    -var az=us-east-1a \
    dcos_agent_centos7.4.json
 
The output of the build is far too long to include here, but once Packer is able to connect to the EC2 instance it will begin running through your provisioners. If the build fails at any point, the instance and the temporary SG and keys are automatically cleaned up, and you will then need to troubleshoot. The main reason for using shell provisioners here instead of user-data is that you can see the output as it runs, since the connection is over SSH. This makes troubleshooting easier if necessary.
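The validate-then-build sequence is easy to script so that a failed validation never reaches the build step. Below is a minimal Python sketch, assuming the packer binary is on your PATH; the packer_command and validate_then_build helpers are hypothetical, and the variable values are the placeholders used in this post.

```python
import subprocess

TEMPLATE = "dcos_agent_centos7.4.json"


def packer_command(action, variables, template=TEMPLATE):
    """Build the argument list for `packer validate` or `packer build`."""
    command = ["packer", action]
    for name, value in variables.items():
        command += ["-var", f"{name}={value}"]
    command.append(template)
    return command


def validate_then_build(variables, run=subprocess.run):
    """Run validate, then build; stop at the first command that fails."""
    for action in ("validate", "build"):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the build is skipped when validation fails.
        run(packer_command(action, variables), check=True)


variables = {
    "ami": "ami-centos7",
    "vpc_id": "vpc-12345678",
    "subnet_id": "subnet-12345678",
    "region": "us-east-1",
    "az": "us-east-1a",
}
# Only print the command here; call validate_then_build(variables) to run it.
print(" ".join(packer_command("validate", variables)))
```

Passing the arguments as a list rather than one shell string avoids quoting problems if a value ever contains spaces.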
 
Once the Ansible Provisioner runs you should start seeing output such as:
 
PLAY [localhost] **************************************************************
GATHERING FACTS ***************************************************************
ok: [localhost]
TASK: [update all packages] **********************************************************
 
If everything completes as expected, Packer will create a new AMI from the EC2 instance and output the new AMI ID for you. Boom! You have just provisioned your AMI with Ansible and Packer!
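If you want to hand the new AMI ID to another tool, you can pull it out of the captured build output. This is a rough Python sketch, assuming the output ends with one "region: ami-id" line per created AMI; the sample text is illustrative rather than captured from a real build, and the created_amis helper is hypothetical.

```python
import re

# Matches artifact lines of the assumed form "us-east-1: ami-0123456789abcdef0".
ARTIFACT_LINE = re.compile(
    r"^[ \t]*([a-z]{2}(?:-[a-z]+)+-\d+):[ \t]*(ami-[0-9a-f]+)[ \t]*$",
    re.MULTILINE,
)


def created_amis(build_output):
    """Return {region: ami_id} for every artifact line found in the output."""
    return dict(ARTIFACT_LINE.findall(build_output))


# Illustrative tail of a build log (an assumption, not real output).
sample_output = """\
==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:
us-east-1: ami-0123456789abcdef0
"""
print(created_amis(sample_output))
```

Matching whole "region: ami-id" lines, rather than any ami- string, keeps the source AMI from being picked up by mistake if it is mentioned earlier in the log.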
 
Next Steps
A couple of things to consider as next steps: if you are a Jenkins user and are lazy like me, how about a Jenkins Pipeline? Nothing complex, just something that will execute everything for you automatically. See the Jenkinsfile and the README in the repo for a super simple Pipeline that validates and runs these builds.
 
#!groovy
node {
    def err = null
    def environment = "Development"
    currentBuild.result = "SUCCESS"
    try {
        stage ('Checkout') {
            checkout scm
        }
        /* Show Version so we know if we need to update or not */
        stage ('Version') {
            sh """
                set +x
                packer version
            """
        }
        /* If Validation fails, Pipeline fails. See Console for errors */
        stage ('Validate') {
            sh """
                set +x
                packer validate dcos_agent_centos7.4.json
            """
        }
        /* Build if validation passes */
        stage ('Build') {
            withCredentials([string(credentialsId: 'aws-access-key', variable: 'AWS_ACCESS_KEY_ID'),
                             string(credentialsId: 'aws-secret-key', variable: 'AWS_SECRET_ACCESS_KEY')]) {
                /* Single quotes so the shell, not Groovy, expands the secrets */
                sh '''
                    set +x
                    packer build -var aws_access_key="$AWS_ACCESS_KEY_ID" -var aws_secret_key="$AWS_SECRET_ACCESS_KEY" dcos_agent_centos7.4.json
                '''
            }
        }
        stage ('Notify') {
            mail from: "email@email.com",
                 to: "email@email.com",
                 subject: "Packer Build for ${environment} Complete.",
                 body: "Jenkins Job ${env.JOB_NAME} - build ${env.BUILD_NUMBER} for ${environment}. Please investigate."
        }
    }
    catch (caughtError) {
        err = caughtError
        currentBuild.result = "FAILURE"
    }
    finally {
        /* Must re-throw exception to propagate error */
        if (err) {
            throw err
        }
    }
}
 
Incorporate it with your other Infrastructure-as-Code and automation for fully automatic builds and infrastructure management. Remember earlier when I mentioned immutable infrastructure? This handles a major piece of getting to fully automated and replaceable infrastructure. I currently incorporate these Packer builds within my Terraform code to manage and provision all aspects of my infrastructure on AWS for DC/OS.
 
Hopefully after reading this you are able to see just how easy it is to start using your Ansible playbooks to build your machine images with Packer. Both of these tools have played an integral part in helping simplify my IaC. I hope I have also made life simpler for someone who is in charge of managing a DC/OS cluster.
