In my experience as a DevOps engineer and SRE, I’ve enjoyed quite a lot of what this profession has to offer. From the satisfaction of fulfilled curiosity to the anxiety of unforeseen mishaps, it has delivered day after day: the nervousness in the face of new challenges, the happiness of receiving appreciation, the thrill of troubleshooting, the pride after a successful implementation, and a lot more. But the one thing I found myself seeking was boredom. Yes, plain old silence, where there are no surprises, everything runs exactly as it should, and you can hear a pin drop. This is especially true when the system in question is your own design. There’s no greater feeling. It is like watching a bird that you’ve freed soar.
For the same reason, in this article we’ll talk about what to consider while designing a secrets management system for large-scale infrastructure. Having a grip on how our secrets flow is paramount to the functioning of our distributed system as well as the safety of our intellectual property. Therefore, keeping it stable is a huge part of preserving that silence at scale. Needless to say, an automated system, no matter how big or small, must be designed with scale in mind. We’ll talk about laying the foundation for a robust and malleable setup, which should be a useful read for everyone. This article assumes that the reader is aware of secrets and their importance in a microservices infrastructure. If not, a quick Google search should yield ample information.
A secrets management system consists of a lot more than just a secret store. It also includes processes to map secrets to the appropriate CI pipelines, maintain different versions, and incorporate secrets during deployments to various services regardless of changes in infrastructure. We’ll talk more about that later, but none of this downplays the importance of a hardened, highly available storage application. HashiCorp Vault, for example, is quite popular in this area. It is a distributed system with the following desirable features:
- RESTful API
- Secure storage
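To make the RESTful API point concrete, here is a minimal sketch of reading a secret from Vault over HTTP. It assumes the KV version 2 secrets engine mounted at `secret`, and the address, token, and secret path are placeholders I made up for illustration. Treat it as a starting point, not a production client.

```python
import json
import urllib.request


def build_read_request(vault_addr, token, path, mount="secret"):
    """Build (but do not send) a GET request for a KV version 2 secret."""
    # KV v2 read endpoint: /v1/<mount>/data/<path>
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/data/{path.lstrip('/')}"
    # Vault authenticates API calls through the X-Vault-Token header.
    return urllib.request.Request(url, headers={"X-Vault-Token": token}, method="GET")


def parse_kv2_response(body):
    """Extract the key/value pairs from a KV v2 read response body."""
    payload = json.loads(body)
    # KV v2 nests the secret under data.data; data.metadata carries the version.
    return payload["data"]["data"]


def read_secret(vault_addr, token, path, mount="secret", timeout=5):
    """Send the request and return the secret as a dict."""
    request = build_read_request(vault_addr, token, path, mount)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_kv2_response(response.read())


# Hypothetical usage (address, token and path are placeholders):
# creds = read_secret("https://vault.example.internal:8200", "<token>", "payments/db")
```

Splitting request building and response parsing from the network call keeps both pieces easy to test without a running Vault.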
A secret storage system should be, preferably, central. One place where all clients can request the secrets they need. This type of setup has its undeniable advantages:
- It is simple to manage secrets
- Chances of spill or leak are minimized
- Facilitates coordination among teams
- Keeps code-base secure
HashiCorp Vault is one good example, but AWS Secrets Manager is also highly recommended. It may not have as many features as Vault; however, if your infrastructure is mostly on AWS, it is a compelling choice with seamless service integration and IAM authentication.
All official documents need to be attested. That is how we validate the authenticity of their source. Why should it be any different for our microservices? I am talking, of course, about certificates. It is recommended to use them for all communication, even within a private network. Our focus, however, is secrets management, so we’ll discuss certificates and their importance only in the context of adding and fetching secrets from the secret vault.
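For the AWS route, fetching a secret could look roughly like the sketch below. The helper name `fetch_json_secret` and the secret ID are hypothetical, and I am assuming the secret was stored as a JSON string. The client is passed in, so credentials are resolved by boto3 through IAM in the usual way.

```python
import json


def fetch_json_secret(client, secret_id):
    """Return a JSON-encoded secret as a dict.

    `client` is expected to behave like boto3.client("secretsmanager").
    """
    response = client.get_secret_value(SecretId=secret_id)
    # Text secrets arrive in SecretString; binary secrets use SecretBinary instead.
    if "SecretString" not in response:
        raise ValueError(f"secret {secret_id!r} is not a text secret")
    return json.loads(response["SecretString"])


# Hypothetical usage (requires boto3 and AWS credentials):
# import boto3
# creds = fetch_json_secret(boto3.client("secretsmanager"), "prod/payments/db")
```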
Ideally, we’ll read/write secrets using RESTful APIs. From a security point of view, these requests must go over SSL/TLS with certificates issued from a trusted CA. In this regard, there are a few things we must understand:
- Depending on the scale, a large number of clients will be requesting secrets from our secret vault
- We cannot keep issuing certificates for these clients and forget about them
- Certificates must come from a trusted CA or we’ll have major security loopholes in our system.
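On the client side, the points above boil down to two things: verify the vault's certificate against a CA we trust, and optionally present a client certificate of our own. Here is a small sketch using Python's standard `ssl` module. The function name and the file paths are assumptions for illustration.

```python
import ssl


def build_tls_context(ca_file=None, client_cert=None, client_key=None):
    """Create a TLS context for talking to the secret vault.

    ca_file:     PEM bundle of the trusted (for example private) CA;
                 None falls back to the system trust store.
    client_cert: optional client certificate for mutual TLS.
    client_key:  private key belonging to client_cert.
    """
    # The default context verifies the server certificate and its hostname.
    ctx = ssl.create_default_context(cafile=ca_file)
    # Refuse legacy protocol versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if client_cert:
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx


# Hypothetical usage with placeholder paths:
# ctx = build_tls_context("/etc/pki/internal-ca.pem", "client.pem", "client.key")
# urllib.request.urlopen(request, context=ctx)
```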
For these reasons, there are PKI best practices that we must follow. These practices ensure that we not only have visibility over all issued certificates but can also trace back and pinpoint the culprit during an unexpected security incident. Here are some recommendations:
- Integrate secrets management with policy-compliant certificate issuers (public or private CA)
- Track every certificate being issued for audit purposes
- Certificates must have a managed life-cycle. No certificate should stay valid longer than required.
If you have a private trusted CA setup, you’re golden. If not, it is definitely worth looking into.
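To illustrate the tracking and life-cycle recommendations, here is a rough sketch of an audit over an inventory of issued certificates. The record fields, the 90-day maximum lifetime, and the 14-day renewal window are all assumptions of mine; in a real setup the inventory would come from the CA or the secrets management tool itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class IssuedCertificate:
    serial: str
    subject: str          # client or service the certificate was issued to
    issuer: str           # CA that signed it
    not_before: datetime
    not_after: datetime


MAX_LIFETIME = timedelta(days=90)     # assumed policy, adjust to your own
RENEWAL_WINDOW = timedelta(days=14)   # assumed policy, adjust to your own


def audit(certs, now):
    """Group certificate serial numbers by audit finding."""
    findings = {"expired": [], "renew_soon": [], "lifetime_too_long": []}
    for cert in certs:
        if cert.not_after - cert.not_before > MAX_LIFETIME:
            findings["lifetime_too_long"].append(cert.serial)
        if cert.not_after <= now:
            findings["expired"].append(cert.serial)
        elif cert.not_after - now <= RENEWAL_WINDOW:
            findings["renew_soon"].append(cert.serial)
    return findings


# Hypothetical usage:
# report = audit(inventory, datetime.now(timezone.utc))
```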
Finally, here we are. Time to write some code. As I mentioned before, secrets management is not just about storing and fetching. We also need to govern transmission, CRUD, availability, visibility, scalability of the deployment infrastructure, etc. We can address these requirements by writing utilities and automation to streamline them as much as possible. Let’s start by reflecting on what we want from scalable automation around our secrets:
- Should support distributing secrets to multiple environments and sub-environments at runtime. This is the best approach, as it rules out storing or committing secrets anywhere except the secret vault. Secrets are pulled via APIs directly from the vault while the deployment is happening. Here’s a related interesting tool for Kubernetes.
- Provide an easy way to do CRUD operations on a secret vault as per the specific design of our system. The application we’re using as vault may provide easy ways to do this but, often, in my experience, writing a wrapper on top of it has been crucial. Particularly when the goal is to design a self-service system where the dependency on DevOps is minimal.
- Authenticate users via Central Identity Management. It may be Active Directory or IAM but automation must have checks in place to regulate access. This way we can have tight control over the system.
- Expose metrics for monitoring the automation. A tailored system requires custom metrics for insightful monitoring. The more detailed the metrics, the better the dashboards and alerting. And, it goes without saying, integrate it with your monitoring system.
- Write test cases for the automation. Whether it is Go, Python, Shell, or HCL, test cases can be written for all of them. As the system scales up, it might overflow with features and fixes. Having a test suite handy will make the whole thing more reliable by greatly reducing the chances of something breaking.
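The wish list above can be sketched as a thin wrapper around the vault. Everything here is hypothetical: the class names, the policy format, and the metric names are mine, the in-memory backend merely stands in for the real vault, and I am assuming the user's groups have already been resolved by the central identity system (Active Directory, IAM, or similar).

```python
from collections import Counter


class AccessDenied(Exception):
    """Raised when none of the caller's groups permits the action."""


class InMemoryBackend:
    """Stand-in for the real vault; keeps every version of each secret."""

    def __init__(self):
        self._versions = {}

    def write(self, path, value):
        self._versions.setdefault(path, []).append(dict(value))
        return len(self._versions[path])   # version number, starting at 1

    def read(self, path):
        return dict(self._versions[path][-1])


class SecretsService:
    """Self-service wrapper: access checks, CRUD helpers and counters."""

    def __init__(self, backend, policies):
        self.backend = backend
        self.policies = policies   # group -> list of (path prefix, allowed actions)
        self.metrics = Counter()   # export these to your monitoring system

    def _authorize(self, groups, action, path):
        for group in groups:
            for prefix, actions in self.policies.get(group, []):
                if path.startswith(prefix) and action in actions:
                    return
        self.metrics[f"{action}_denied"] += 1
        raise AccessDenied(f"{action} on {path} is not allowed")

    def put(self, groups, path, value):
        self._authorize(groups, "write", path)
        version = self.backend.write(path, value)
        self.metrics["write_ok"] += 1
        return version

    def get(self, groups, path):
        self._authorize(groups, "read", path)
        value = self.backend.read(path)
        self.metrics["read_ok"] += 1
        return value

    def deployment_env(self, groups, path):
        """Resolve a secret into environment variables at deploy time."""
        return {key.upper(): str(val) for key, val in self.get(groups, path).items()}
```

Because the backend is injected, the same wrapper can be exercised in a test suite against the in-memory stand-in and pointed at the real vault in production.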
That’s all for now. I know what would help you grasp these concepts better: an already implemented, large-scale, intricate secrets management system that we can take apart and study, like Rancho with the refrigerator in 3 Idiots. Well, I am planning something like that for my next article, which will be a continuation of this one. Thanks for reading. Stay tuned!
Blog Pundit: Sanjeev Pandey