Infrastructure as Code (IaC) means that you can deploy, configure and regenerate your (application) servers and other infrastructure components in an automated way. The number of IaC tools that help accomplish this keeps growing. Which one an organization, team or engineer uses depends on their infrastructure environment and their previous experience.
While working on IaC as an engineer and consultant, I came across several pitfalls and good practices that I would like to share. They are presented as decisions that you and your team can discuss and agree on. Making these decisions explicitly improves the maturity of your way of working.
1. Decide where to store your code
Do you have your code/scripts only on your own workstation? Or are you running scheduled tasks or a cronjob with the actual script existing only on the server itself? And how about the configuration files of your applications? Having a central place where you store (the latest version of) these files will benefit your team and organization.
When using IaC, storing your files in a version control system is an obvious choice. You could theoretically use SharePoint for this, but the current de facto version control system for IaC is Git. Git is aimed at developers storing code and can function as the main source from which your code is deployed. There are various solutions built around Git (GitHub, [GitLab](https://about.gitlab.com/) and Bitbucket, to name a few). Most of them offer a free SaaS if your repository is public, but there are also free community editions you can host on your own infrastructure. In my opinion, being able to use Git needs to be part of the basic skill set of any developer or DevOps engineer.
2. Decide how to store your code
Now you have some choices to make, depending on the complexity of your organization and IT environment. You can choose a mono-repo for all your IaC code, a separate repository for each tool/language, or a repository per application server/infrastructure type. Also think about which branching strategy works well for you. Make sure to discuss and agree on this with your team.
You can start simple and evolve your strategy over time when it starts making sense. Or you can put more thought into this beforehand and make a more informed decision to prevent possible rework later on. Either way, empathize with the new colleague starting in your team. How easy is it for them to understand the way your code is organized? If it is not self-explanatory, how about adding a readme in the root of the repository with some instructions and other useful documentation?
3. Decide how to run your code
Having one tool/system from which all your IaC runs helps you stay in control of what happens in your infrastructure. For this I recommend using a CI/CD tool such as Jenkins, GitLab CI or Azure DevOps. You can trigger jobs manually, by webhook or on a schedule, and there will be an audit trail for every job that has run. Your jobs will also run on an agent. These agents can be pre-configured with the tools needed to run a certain job. The advantage is that you are less likely to run into problems because a colleague is using a different version of a tool.
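To make the point about tool versions on agents concrete, here is a minimal sketch in Python. It assumes you keep a list of pinned versions somewhere in your repository and that the agent can report what it has installed. The function name, the tool names and the version numbers are made up for illustration.

```python
def find_version_mismatches(required, installed):
    """Return {tool: (required_version, installed_version_or_None)} for every deviation."""
    mismatches = {}
    for tool, wanted in required.items():
        found = installed.get(tool)  # None when the tool is missing on the agent
        if found != wanted:
            mismatches[tool] = (wanted, found)
    return mismatches


# Illustrative values only: pin whatever your team has agreed on.
REQUIRED = {"tool-a": "1.4.0", "tool-b": "2.1.3"}
INSTALLED_ON_AGENT = {"tool-a": "1.4.0", "tool-b": "2.0.9"}

if __name__ == "__main__":
    for tool, (wanted, found) in find_version_mismatches(REQUIRED, INSTALLED_ON_AGENT).items():
        print(f"{tool}: expected {wanted}, found {found}")
```

A check like this could run as the first step of a job, so that a wrongly configured agent fails fast instead of halfway through a deployment.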
4. Decide how to handle your secrets
You probably don’t want to store the password for the customer database in plain text in your repository, even if the repository is only accessible from within your own network and protected with multi-factor authentication. Remember, with Git you make a clone of a repository, so these credentials are now on your machine and on the machines of your fellow engineers. A more acceptable solution is to keep your secrets encrypted in a vault system that supports injecting them as environment variables while your pipeline runs. Ideally, security should be enabled on multiple layers, so that even if one part is breached there is a second line of defense.
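As a rough sketch of the consuming side, the Python snippet below reads a secret from an environment variable and fails loudly when it is missing. It assumes your vault or CI/CD tool has already injected the value. The variable name `DB_PASSWORD` and the helper names are hypothetical.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when an expected secret was not injected into the environment."""


def read_secret(name, environ=None):
    """Return the secret stored under `name`, or raise if it is absent or empty."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        # Only the name is mentioned in the error, never the value.
        raise MissingSecretError(f"secret {name!r} is not set in the environment")
    return value


if __name__ == "__main__":
    try:
        password = read_secret("DB_PASSWORD")  # hypothetical variable name
        print("DB_PASSWORD is available (value not shown)")
    except MissingSecretError as error:
        print(f"Cannot continue: {error}")
```

The point of this design is that the script itself never contains the secret, so nothing sensitive ends up in the repository or in a clone on someone's laptop.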
5. Decide on a common set of tools
Just like a carpenter and a barber have their favourite tools, so do (Dev/Sec/Ops) engineers. There are many ways to accomplish the same thing, and it is good to explore now and then whether there is an easier/faster/cheaper method. However, if everyone uses the same toolset, it is easier to share and reuse building blocks. Your organization should find some middle ground between giving engineers the freedom to experiment with new tools and standardizing on a common set of tools. Some tools work very well together and some don’t, and paying double license fees is usually a waste.
6. Decide on the level of granularity in your pipelines
When using pipelines for running your IaC, there are again many ways to get the same end result. Use a naming convention and clear descriptions so others also know what a pipeline is used for. Consider splitting a pipeline into multiple stages, so you can re-run or skip a stage based on the type of deployment. Then decide whether going live requires a mandatory review, approval from a manager, or whether developers are free to deploy themselves.
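The idea of re-running or skipping stages can be sketched in a few lines of Python. The stage names below are only an example, and real CI/CD tools have their own syntax for this. The sketch only shows the selection logic.

```python
# Example stage names, in the order they would normally run.
STAGES = ["validate", "plan", "deploy", "verify"]


def select_stages(stages, skip=(), only=()):
    """Return the stages to run, in pipeline order.

    `skip` removes stages; `only` restricts the run to the listed stages.
    """
    unknown = (set(skip) | set(only)) - set(stages)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    return [s for s in stages if s not in skip and (not only or s in only)]


if __name__ == "__main__":
    print(select_stages(STAGES))                   # full run
    print(select_stages(STAGES, skip=["plan"]))    # e.g. a configuration-only change
    print(select_stages(STAGES, only=["verify"]))  # re-run a single stage
```

Rejecting unknown stage names avoids the situation where a typo silently results in a full deployment.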
7. Decide on the lifecycle of your infrastructure
There is a big difference in creating a script to deploy something once for Proof-of-Concept purposes and developing robust code that can be used for the full lifecycle of your infrastructure.
Too often I found that little or no thought had been given to how a system would be maintained. The world around us changes continuously, so the system must be able to adapt as well. For infrastructure this means, for example, applying security updates or being able to scale your resources when needed. Using SaaS/PaaS services can reduce the amount of maintenance you have to do yourself, but you pay for them more directly. Even then, the services you use will evolve and might require engineering work to adapt. There are various strategies and practices you can employ to make this part easier, each with their own pros and cons. Find out what works best for your situation.
Best practices
Here are a couple of best practices to discuss.
Using immutable infrastructure
The idea here is that you never perform updates on a live system and preferably even remove the ability to log in to the machine completely. To update, create a completely new deployment of your stack. Then, if it looks good, destroy the old one. This ensures that the configuration of the live system remains untouched while it is running and that you only deploy what the code generates.
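The replace-instead-of-update flow can be sketched as follows. This is a simplified illustration in Python, not a real deployment tool: `create`, `healthy` and `destroy` are hypothetical callables that you would back with your own tooling, and switching traffic to the new stack is left out.

```python
def replace_stack(current, new_version, create, healthy, destroy):
    """Deploy a fresh stack and only then remove the old one.

    Returns the stack that should be live afterwards.
    """
    candidate = create(new_version)  # always build new, never patch in place
    if not healthy(candidate):
        destroy(candidate)           # discard the failed attempt
        return current               # the old stack was never touched
    if current is not None:
        destroy(current)             # new stack looks good: retire the old one
    return candidate


if __name__ == "__main__":
    destroyed = []
    live = replace_stack(
        current="stack-v1",
        new_version="v2",
        create=lambda version: f"stack-{version}",  # stand-in for a real deployment
        healthy=lambda stack: True,                 # stand-in for real health checks
        destroy=destroyed.append,
    )
    print(live, destroyed)
```

Because the old stack is only destroyed after the new one passes its checks, a failed deployment leaves the live system exactly as it was.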
Aiming for idempotency
“In #devops is no shame to be idempotent” – DevOps Borat
The idempotency principle means that no matter how many times you run the IaC code on a system, the end result is the same. You define the desired end state of your infrastructure in the language of the tool, and the tool gets you there. Configuration automation tools such as Ansible, Puppet and Chef employ this in their modules and try to bring the system to, and keep it in, the state defined in the code. When building custom scripts or pipelines, run them a second time to test whether they are idempotent.
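For custom scripts, idempotency usually comes down to checking the current state before changing anything. Below is a minimal Python sketch that makes sure a line is present in a configuration file. The file name and the setting are made up for the example.

```python
from pathlib import Path


def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True if the file was changed and False if it was already correct.
    """
    path = Path(path)
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # desired state already reached: do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as workdir:
        config = Path(workdir) / "app.conf"  # hypothetical config file
        print(ensure_line(config, "max_connections=100"))  # first run changes the file
        print(ensure_line(config, "max_connections=100"))  # second run changes nothing
```

A naive script that simply appends the line would add a duplicate on every run. Returning whether something changed also gives you a cheap way to test for idempotency: the second run should report no changes.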
Validating and testing
“To make error is human. To propagate error to all server in automatic way is #devops” – DevOps Borat
You will and should fail sometimes. Learn from this and improve. Add syntax and linting checks to your IDE as well as to your pipeline. Add security tests and stages that validate your code before actually executing it. Perform a dry run or test deployment and verify that the result is correct. All this reduces the chance of propagating an error to the production environment.
Don’t stop there. Use a tool like InSpec to test if the state of deployed infrastructure is as described. And make sure you don’t forget to add code for your monitoring and alerting systems as well. Or for any other place where things otherwise need to be done manually!
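The dry run mentioned above boils down to comparing the desired state with the actual state and showing the difference before changing anything. Here is a small Python sketch of that idea, with both states modelled as plain dictionaries. The resource names and function names are illustrative and do not belong to any particular tool.

```python
def plan_changes(desired, actual):
    """Return a sorted list of (action, resource) tuples needed to reach `desired`."""
    plan = []
    for key in sorted(desired.keys() | actual.keys()):
        if key not in actual:
            plan.append(("create", key))
        elif key not in desired:
            plan.append(("delete", key))
        elif desired[key] != actual[key]:
            plan.append(("update", key))
    return plan


def apply(desired, actual, dry_run=True):
    """Return (plan, resulting_state); with dry_run=True the state is left unchanged."""
    plan = plan_changes(desired, actual)
    if dry_run:
        return plan, actual
    return plan, dict(desired)


if __name__ == "__main__":
    desired = {"web-server": {"size": "small"}, "database": {"size": "large"}}
    actual = {"database": {"size": "medium"}, "old-cache": {"size": "small"}}
    plan, _ = apply(desired, actual, dry_run=True)
    for action, resource in plan:
        print(action, resource)
```

The same comparison also works after a deployment: an empty plan means the deployed state matches what the code describes.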
Continuously improve
By making these decisions, you will be able to keep your infrastructure code organized, increase security and gain confidence that you have deployed the correct infrastructure. These decisions are naturally influenced by the culture of your organization and the amount of trust, freedom and responsibility an individual gets from their peers and management. Therefore, regularly review what can be improved further, and don’t limit this to just the technical needs. The whole chain of people and processes that eventually adds value to your company must be considered.
Learn how to work together, how to give and receive feedback and improve step by step!
Who am I
My name is Menno van der Bijl and I am the team lead for DevOps at Techspire. I love to help people and organisations improve their lives with technology, but also with the processes around the technology. I believe that with the right mindset with your engineers and a culture where trust, freedom and responsibility are key you can achieve great things.