Managing large clusters of machines is non-trivial. Even in the cloud, where the provider takes care of a lot of the complicated stuff, such as networking, isolation and redundancy, you still need to get your system up and running. Then you need to update it frequently, elastically add and remove machines, and ensure that nothing breaks and your users have a pleasant and streamlined experience.
In this article, I'll focus on the initial preparation of your machines. You may be all dockerized and use your virtual machines as dumb hosts for your fancy containers, or use an orchestration layer like Kubernetes or Docker Swarm. But you'll soon realize that even in such a setup you need to perform at least some basic operations on all your nodes. For example, you need to install Docker itself (I wouldn't rely on pre-packaged VM images with some obscure version of Docker), you may need to allocate some extra resources (e.g. additional disks), you may need to create some directory structures that you mount as volumes into your containers, and you may need to install some monitoring agents.
Invoking Cloud API
At scale, the nice web console is not a suitable tool for managing your fleet of nodes. You need proper automation, and you often need to invoke the cloud API dynamically to scale your cluster and to provision, configure and upgrade machines.
Once your hardware (virtual or not) is set up, you need to take care of the software. You need to update system packages, install dependencies, tweak settings and finally install your application. Ansible is a fantastic tool that lets you operate on multiple machines in a very convenient manner. It has a lot of modules that can perform a lot of tasks independently, and it can scale to fairly complex systems using concepts such as playbooks, roles and inventory files. It is truly a well-designed product that makes it simple to start and gets only as complicated as needed when you scale. My favorite feature is that Ansible doesn't require any agent to run on the managed nodes. All you need is SSH access.
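As a rough illustration of this kind of node preparation, a playbook could look like the minimal sketch below. It assumes Debian or Ubuntu hosts (hence the apt module and the docker.io package name), and /data/app is a hypothetical directory chosen only for the example.

```yaml
# Minimal sketch: basic preparation of every node in the inventory.
# Assumptions (not from the article): Debian/Ubuntu hosts, the distribution's
# docker.io package, and a hypothetical /data/app directory for volumes.
- hosts: all
  become: true
  tasks:
    - name: Refresh the package cache and upgrade installed packages
      ansible.builtin.apt:
        update_cache: true
        upgrade: dist

    - name: Install Docker from the distribution repositories
      ansible.builtin.apt:
        name: docker.io
        state: present

    - name: Create a directory to mount into containers as a volume
      ansible.builtin.file:
        path: /data/app
        state: directory
        owner: root
        group: root
        mode: "0755"
```

Because each task describes a desired state rather than a command, running the playbook a second time should leave an already prepared node unchanged.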
Security vs. Productivity
Security is a necessary evil. Depending on how critical your system is, the burden can be overwhelming. But you often work a lot in development and staging environments, where quick turnaround is key. I'll show you a couple of tricks using Ansible for creating a friction-free DevOps nirvana.
The first thing you want to do is use a key for SSH authentication instead of a username and password. This is actually good practice from a security point of view as well. To do that, you need to add your public key to the authorized_keys file on each remote node. I assume that you start with SSH access via the same username and password to all the nodes (standard practice in cloud environments).
You'll need an Ansible inventory file that lists all your nodes. For example:
    [cluster]
    node-1
    node-2
    node-3
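The following ansible command adds your public key to the authorized_keys file of the given user on every node in the host group. The authorized_key module needs to know which user to modify and which key to add, so both are passed as module arguments with -a:
    ansible <host group> -i <inventory file> -u <user name> --ask-pass -m authorized_key -a "user=<user name> key='<public key>'"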
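The same step can also be written as a playbook, which is easier to keep under version control. This is a minimal sketch under a few assumptions that do not come from the original setup: the ansible.posix collection is installed, the remote user is called "joe", and the public key lives at ~/.ssh/id_rsa.pub on the control machine.

```yaml
# Minimal sketch: install a local public key on every node in the group.
# Assumptions: the ansible.posix collection is installed, the remote user is
# "joe", and the key is at $HOME/.ssh/id_rsa.pub on the control machine.
- hosts: cluster
  gather_facts: false
  tasks:
    - name: Add the control machine's public key to authorized_keys
      ansible.posix.authorized_key:
        user: joe
        # Read the key file locally and pass its contents to the module.
        key: "{{ lookup('file', lookup('env', 'HOME') + '/.ssh/id_rsa.pub') }}"
        state: present
```

You would still run it once with password authentication, for example with ansible-playbook -i <inventory file> -u joe --ask-pass <playbook file>.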
From now on you can SSH, and by extension run ansible commands and playbooks, on all the nodes without providing a password. Your private key must be available to Ansible in one way or another (ansible.cfg, --private-key, extra SSH options).
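Another place to record the key is the inventory itself. The sketch below is a YAML inventory equivalent to the [cluster] group shown earlier, with connection settings attached to the group. The user name "joe" and the key path are assumptions for illustration.

```yaml
# Minimal sketch: YAML inventory with per-group connection settings.
# Assumptions: remote user "joe" and a private key at ~/.ssh/id_rsa.
all:
  children:
    cluster:
      hosts:
        node-1:
        node-2:
        node-3:
      vars:
        ansible_user: joe
        ansible_ssh_private_key_file: "~/.ssh/id_rsa"
```

With the settings stored in the inventory, ad-hoc commands no longer need the -u or --private-key options.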
OK, you can run commands on all those remote nodes, but every time you want to run a command with sudo you have a problem. Ansible lets you provide a sudo password using the -K or --ask-become-pass option, but you still have to enter it every time. If you use a lot of ad-hoc Ansible commands, that gets annoying really quickly. To avoid the sudo password, you can add a special incantation to the /etc/sudoers file on each target machine. For example, if the user name is "joe", then adding this line will do the trick:
    joe ALL = (ALL) NOPASSWD: ALL
But there is a better way. Instead of messing with /etc/sudoers itself, which is a sensitive file (you're supposed to edit it only with visudo), you can add a new file to /etc/sudoers.d. This directory is included at the end of the /etc/sudoers file:
    # See sudoers(5) for more information on "#include" directives:
    #includedir /etc/sudoers.d
So, adding a file to this directory achieves the same result in a much safer way, and it makes custom additions easier to spot. The following playbook does exactly that and also makes the file read-only for its owner (root), since sudo requires restrictive permissions on these files.
    - hosts: all
      gather_facts: true
      become: true
      tasks:
        - name: Copy file that allows no password to /etc/sudoers.d
          copy:
            # The trailing newline keeps the sudoers parser happy.
            content: "joe ALL = (ALL) NOPASSWD: ALL\n"
            dest: /etc/sudoers.d/passwordless_sudo
            owner: root
            group: root
            # Quoted so the mode is always read as octal.
            mode: "0400"
You can of course use a variable or environment variable for the user name.
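One way to parameterize the user name is sketched below. SUDO_USER_NAME is a hypothetical environment variable read on the control machine, and "joe" is only the fallback when it is unset or empty. The validate option is an addition of mine rather than part of the original playbook: it asks visudo to check the file before it is put in place.

```yaml
# Minimal sketch: the same sudoers drop-in with the user name as a variable.
# Assumptions: SUDO_USER_NAME is a hypothetical environment variable on the
# control machine; "joe" is used when it is unset or empty.
- hosts: all
  become: true
  vars:
    sudo_user: "{{ lookup('env', 'SUDO_USER_NAME') | default('joe', true) }}"
  tasks:
    - name: Allow passwordless sudo for the chosen user
      ansible.builtin.copy:
        content: "{{ sudo_user }} ALL = (ALL) NOPASSWD: ALL\n"
        dest: /etc/sudoers.d/passwordless_sudo
        owner: root
        group: root
        mode: "0400"
        # Let visudo check the syntax before the file is installed.
        validate: /usr/sbin/visudo -cf %s
```

The variable can also be overridden on the command line with -e sudo_user=<user name>. The first run still needs -K, because passwordless sudo is not in place yet.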
Bootstrapping your cloud environment for streamlined operation is crucial for productivity. Make sure you balance security with productivity according to your organization's policies, and take advantage of passwordless SSH and passwordless sudo where appropriate.