With so many servers now virtualized, and many of them pushed to the cloud, automated instance configuration and customization are must-haves. The cloud-init package is one of my favorite tools for tweaking the settings within an instance, especially when more than a small handful of systems need to be configured the same way.
If you’ve ever fired up a CentOS instance on Amazon Web Services (AWS), cloud-init configured that instance at its initial launch using the Amazon EC2 datasource. Other datasources exist for other cloud providers and infrastructures, such as CloudSigma, CloudStack, DigitalOcean, and OpenStack.
NoCloud datasource – cloud-init on non-cloud-based systems
What many folks don’t realize, even those who are familiar with cloud-init, is that you can use cloud-init to change the settings on non-cloud-based systems, too! This is done with the NoCloud datasource, which uses a vfat or iso9660 filesystem (usually an ISO image attached to a virtual CD-ROM drive, or a floppy image attached to a virtual floppy drive) to deliver the configuration data to cloud-init running on the system. Since the data can be provided by a local filesystem, no network is necessary, which makes configuration of network-isolated machines consistent and repeatable. The cloud-init documentation for the NoCloud datasource has detailed information and examples.
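As a minimal sketch of what a NoCloud seed looks like: the datasource expects two files, meta-data and user-data, on a filesystem whose volume label is cidata. The instance ID, hostname and package below are made-up values for illustration, and the genisoimage command is only built, not executed.

```python
import os
import tempfile

# Hypothetical values, for illustration only.
META_DATA = "instance-id: iid-local01\nlocal-hostname: node01\n"
USER_DATA = "#cloud-config\npackages:\n  - vim-enhanced\n"


def write_seed(directory, meta_data=META_DATA, user_data=USER_DATA):
    """Write the two files NoCloud looks for and return their paths."""
    paths = {}
    for name, content in (("meta-data", meta_data), ("user-data", user_data)):
        path = os.path.join(directory, name)
        with open(path, "w") as handle:
            handle.write(content)
        paths[name] = path
    return paths


def iso_command(paths, output="seed.iso"):
    """Build (but do not run) a genisoimage command for the seed ISO.

    NoCloud identifies the seed filesystem by its volume label, "cidata".
    """
    return ["genisoimage", "-output", output, "-volid", "cidata",
            "-joliet", "-rock", paths["user-data"], paths["meta-data"]]


if __name__ == "__main__":
    seed_dir = tempfile.mkdtemp(prefix="nocloud-seed-")
    print(" ".join(iso_command(write_seed(seed_dir))))
```

Running the printed command on a machine with genisoimage installed should produce an ISO that can be attached to the instance’s virtual CD-ROM drive.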
This is not to say that you can’t store the configuration files on your network if one is available. For instance, you could specify the NoCloud datasource on the kernel command line and indicate whether the data is stored locally or is accessible over the network via HTTP(S) or FTP.
We can now have centralized configuration management, using cloud-init modules to configure networking, accounts, SSL, disks/partitions, and packages, and to run custom scripts, or handing off to more powerful system configuration tools such as Chef or Puppet. Obviously, this NoCloud approach probably won’t scale well beyond a few tens of systems, but if you have a small private or personal cloud, or you want to test your meta-data and user-data files, NoCloud can be very useful!
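To make the kernel command line option a little more concrete, here is a small sketch that builds the argument. It assumes the ds=nocloud and ds=nocloud-net forms with an s= seed location, as described in the NoCloud documentation for the 0.7.x series; the helper name and the URLs are hypothetical, and newer releases may treat the two datasource names differently.

```python
def nocloud_kernel_arg(seed_url):
    """Return a ds= kernel argument pointing cloud-init at a NoCloud seed.

    Assumption: local (file://) seeds use "nocloud" and network seeds use
    "nocloud-net", as in the 0.7.x NoCloud documentation.
    """
    if not seed_url.endswith("/"):
        # The seed location is a prefix; "meta-data" and "user-data"
        # are appended to it, so it needs a trailing slash.
        seed_url += "/"
    scheme = seed_url.split(":", 1)[0]
    datasource = "nocloud" if scheme == "file" else "nocloud-net"
    return "ds={};s={}".format(datasource, seed_url)


if __name__ == "__main__":
    print(nocloud_kernel_arg("http://10.0.0.5/seed"))
    print(nocloud_kernel_arg("file:///var/lib/seed/"))
```

Note that the semicolon usually has to be quoted or escaped when the argument is placed in a boot loader configuration.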
Issues with RHEL/CentOS 7.4 and NoCloud
We’ve been using cloud-init internally for over a year and have run into a couple of “gotchas” that you may, or may not, encounter if you use cloud-init.
Most recently, version 0.7.9 of cloud-init, which shipped with RHEL/CentOS 7.4, does not play well with NoCloud. If you try to set the system hostname, you will see the following errors in the cloud-init debug output:
The issue is that cloud-init attempts to talk to dbus before dbus has been started. A work-around provided in the tickets may be a solution for some folks, but I didn’t want to modify files provided by the package, which would not be replaced when a new version of the package is made available.
Since the version which shipped with CentOS 7.3, cloud-init 0.7.5, was working just fine in my environment, I opted to install that version of the package and lock it with yum’s versionlock plugin when I build the “golden image” that my future instances will be based on:
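The exact commands depend on how the image is built. As a minimal sketch, assuming the yum-plugin-versionlock package and its "yum versionlock add" subcommand, the pinning step could look something like this. The helper names are hypothetical, and the commands are only printed unless dry_run is turned off.

```python
import subprocess


def pin_commands(package="cloud-init", version="0.7.5"):
    """Return the yum commands that install and lock a package version."""
    return [
        ["yum", "-y", "install", "yum-plugin-versionlock"],
        ["yum", "-y", "install", "{}-{}".format(package, version)],
        ["yum", "versionlock", "add", package],
    ]


def run(commands, dry_run=True):
    """Print the commands, or execute them (as root) when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.check_call(command)


if __name__ == "__main__":
    run(pin_commands())
```

If a newer cloud-init is already installed in the image, the install step would need to be a "yum downgrade" instead.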
Permission changes on /etc/ssh/sshd_config
Another issue that I’ve encountered lately is that cloud-init changes the permissions on /etc/ssh/sshd_config to 0644 instead of honoring the existing permissions. This is a known issue with the code in cc_set_passwords.py or, more precisely, with the write_file() function in util.py. It has been addressed in cloud-init 17.1, but officially, cloud-init 0.7.9 is the newest release available for RHEL/CentOS 7.4. By the way, “cloud-init 17.1” is not a typo; the version number was bumped from 0.7.9 to 17.1.
I encountered this issue while performing PCI-DSS hardening on images that I was creating for a customer. One PCI-DSS test, “Verify and Correct File Permissions” (xccdf_org.ssgproject.content_rule_rpm_verify_permissions), requires that the permissions on all files match the permissions that they were installed with from the RPM. Since the initial file permissions were 0600, cloud-init’s alteration to 0644 triggers a failure for that particular PCI-DSS test.
We can verify the original permissions by querying the RPM database:
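One way to script that comparison, as a sketch rather than a definitive check: ask rpm for the packaged file modes using its FILEMODES:perms query format, and compare the result with the mode currently on disk. The function names are hypothetical, and the package that owns sshd_config (assumed here to be openssh-server) has to be passed in.

```python
import os
import stat
import subprocess

# rpm query that prints one "<perms> <path>" line per packaged file.
QUERY = ["rpm", "-q", "--queryformat", "[%{FILEMODES:perms} %{FILENAMES}\\n]"]


def packaged_modes(rpm_output):
    """Parse "<perms> <path>" lines into a {path: perms} dictionary."""
    modes = {}
    for line in rpm_output.splitlines():
        perms, _, path = line.partition(" ")
        if path:
            modes[path] = perms
    return modes


def current_mode(path):
    """Return the on-disk mode in ls style, e.g. "-rw-------"."""
    return stat.filemode(os.stat(path).st_mode)


def mode_drift(path, package):
    """Return (packaged, current) modes; requires rpm on the system."""
    output = subprocess.check_output(QUERY + [package],
                                     universal_newlines=True)
    return packaged_modes(output).get(path), current_mode(path)
```

Calling mode_drift("/etc/ssh/sshd_config", "openssh-server") on an affected instance would return two different strings if cloud-init has altered the file’s permissions.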
We’re now faced with selecting a solution that our mandate and/or business case must support, or at least accept:
A) Update cloud-init to 17.1 or newer, which is currently newer than the version published by the distribution, and accept any changes (positive or negative) that come with that version
B) Modify the existing cloud-init 0.7.x package files to maintain or force the 0600 permissions, knowing that these changes may impact future cloud-init package updates
C) Backport the changes made to cloud-init 17.1 which provide the copy_mode option to write_file(), still knowing that these changes may impact future cloud-init package updates
D) Note an exception to the PCI-DSS hardening
As with most things, the solution that you decide on will depend on your project requirements, environment, and management.
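To illustrate the idea behind option C, here is a simplified sketch of a write_file() helper with a copy_mode flag. This is not cloud-init’s actual implementation, only an approximation of the behavior: when copy_mode is set and the file already exists, its current permissions are kept instead of being replaced with the default.

```python
import os
import stat
import tempfile


def write_file(filename, content, mode=0o644, copy_mode=False):
    """Write content, optionally preserving the mode of an existing file.

    Simplified illustration only; not the code from cloud-init's util.py.
    """
    if copy_mode and os.path.exists(filename):
        # Reuse whatever permissions the file already has (e.g. 0600).
        mode = stat.S_IMODE(os.stat(filename).st_mode)
    with open(filename, "w") as handle:
        handle.write(content)
    os.chmod(filename, mode)
    return mode


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    os.chmod(path, 0o600)
    print(oct(write_file(path, "PasswordAuthentication no\n", copy_mode=True)))
    os.remove(path)
```

Without copy_mode, the same call resets the file to the default mode, which is the behavior that trips the permissions check.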
Support for all cloud-init modules
One final topic that I’d like to cover is that RHEL/CentOS may not support all of the modules that cloud-init’s upstream maintainers provide.
When crafting your meta-data and user-data, it may be beneficial to verify that the modules that you intend to use are supported and, when not supported, determine if you’re willing to use the bootcmd, runcmd or scripts-* (vendor/per-once/per-boot/per-instance/user) modules to implement the missing functionality.
Comparing the SOURCES/cloud-init-centos.cfg file from the cloud-init 0.7.5 source RPM to the config/cloud.cfg file within the cloud-init-0.7.5.tar.gz archive (included with the source RPM), we can see the following modules are not called:
We also see that an additional module is called:
Some of the disabled modules make sense, such as emit_upstart, grub-dpkg, apt-* and landscape, which are tailored towards other operating systems and distributions (such as FreeBSD, SUSE, Ubuntu, Debian), while others, like ssh-import-id, are disabled when you’d otherwise think that they’d be active.
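The comparison described above can be reproduced with a short script. This is a rough sketch that avoids a YAML dependency by scanning only the three module list sections of each file, so it assumes the simple layout those sections normally have. The sample texts at the bottom are made up for illustration and are not the real contents of either file.

```python
SECTIONS = ("cloud_init_modules", "cloud_config_modules",
            "cloud_final_modules")


def module_names(cfg_text):
    """Collect module names listed under the three module sections."""
    names = set()
    in_section = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not line[0].isspace() and not stripped.startswith("-"):
            # A new top-level key: are we entering a module list?
            in_section = stripped.rstrip(":") in SECTIONS
            continue
        if in_section and stripped.startswith("-"):
            entry = stripped[1:].split("#")[0].strip()
            if entry.startswith("["):
                # "[name, frequency]" form: keep only the name.
                entry = entry.strip("[]").split(",")[0].strip()
            # Normalize so "ssh_import_id" and "ssh-import-id" compare equal.
            names.add(entry.replace("_", "-"))
    return names


def compare(upstream_text, distro_text):
    """Return (modules only upstream, modules only in the distro file)."""
    upstream = module_names(upstream_text)
    distro = module_names(distro_text)
    return sorted(upstream - distro), sorted(distro - upstream)


if __name__ == "__main__":
    # Made-up sample input, not the real file contents.
    upstream = "cloud_config_modules:\n - ssh-import-id\n - landscape\n - runcmd\n"
    distro = "cloud_config_modules:\n - runcmd\n - example_extra\n"
    print(compare(upstream, distro))
```

To use it on real files, read config/cloud.cfg from the tarball and SOURCES/cloud-init-centos.cfg from the source RPM and pass their contents to compare().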
You will be forgiven for being further confused by the online documentation. If you pull up the cloud-init online documentation, you will be redirected to the latest version (17.1, currently). There are modules supported in 17.1 which simply don’t exist in the RHEL/CentOS published versions. Unless you change the version using the green “v: ” dropdown at the bottom of the left frame, you will likely be reading docs that don’t apply to you.
I stumbled upon the disabled modules and documentation dissonance when trying to get cloud-init 0.7.4 (or perhaps 0.7.5) to register an instance to our Spacewalk server. Everything looked fine with my metadata and there were no errors in the cloud-init output, but the system refused to register. It turned out that the Spacewalk module was not added until cloud-init 0.7.8, so I used one of the scripting modules to add the Spacewalk registration functionality into the earlier cloud-init version that I had to use. Lesson learned!
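As an illustration of that kind of fallback, here is a sketch that renders user-data calling rhnreg_ks from the runcmd module. The server URL and activation key are placeholders, the helper name is hypothetical, and this is not the exact script I used. The command is written as a JSON list, which is also valid YAML, so no extra YAML library is needed.

```python
import json


def spacewalk_user_data(server_url, activation_key):
    """Render #cloud-config user-data that registers via rhnreg_ks."""
    command = [
        "rhnreg_ks",
        "--serverUrl={}".format(server_url),
        "--activationkey={}".format(activation_key),
        "--force",
    ]
    # A JSON array is a valid YAML flow sequence, so runcmd accepts it.
    return "#cloud-config\nruncmd:\n  - {}\n".format(json.dumps(command))


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(spacewalk_user_data("https://spacewalk.example.com/XMLRPC",
                              "1-example-key"))
```

Because runcmd only runs late in the boot process, anything that depends on the registration (such as installing packages from Spacewalk channels) would need to happen in the same script or afterwards.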