-
Notifications
You must be signed in to change notification settings - Fork 988
Proposal: State Encryption #297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@StephanHCB Hey! We'd appreciate it a lot if you could resubmit this as an RFC on the OpenTF project, once the repo is open, as it seems you've thought this through a lot. I did a rudimentary working PoC of this during the last weekend as well with pbkdf2 and aead, to better understand the problem domain. Anyway, one idea we've had and discussed, was possibly using SOPS under the hood, as it already solves the use case of encrypting json/yaml files really well (and what is a state file if not that). The advantages would be
We could encrypt plan files, too, for safer storage if you're only applying at a later time. I'd love to hear your thoughts on that. |
Will do. I am happy this issue with terraform's security will finally get addressed.
There are indeed a number of lifecycle events to consider:
Then there are the manual operations that need access to the state, where
Generally, my solution supports pluggable encryption backends via a simple interface they have to implement, one implementation could very easily be one using SOPS. You select which encryption backend you want via a pair of environment variables, one for the current encryption configuration and one fallback used only for decryption (for key rotation, switching encryption backends, or altogether moving away from state encryption). SOPS was brought up on the original issue. You've summarized the upsides quite well, the biggest downside was in my reply back then (hashicorp/terraform#9556 (comment)): "It's certainly a lot more complex than what this PR does, at least if you want to do it in a secure fashion. TF providers keep adding fields in the state, and if you miss one that's sensitive, you are again leaking credentials. This would have a high probability of giving a false sense of security IMHO, which is often actually very bad for security." Where I work, this was one of the reasons we decided to encrypt the entire state.
Good point. That is especially relevant for shipping / storing / processing plan files "in the cloud". On premises, there is less point to it, because you have to have access to both the encryption and the decryption keys anyway for every tf run. |
This is only somewhat related but with plan files it's also worthwhile to look at solutions that prevent tampering as gaining the ability to modify a plan file is quite problematic as something executing OpenTF will generally have highly privileged access. I'm not a cryptographer, but using AEAD should satisfy both the requirement to prevent sensitive values and ensuring the integrity of plans before applying them? |
Alright, now that we have the repo in place I think we can continue the discussion on this. @StephanHCB would you be able to rewrite the issue in a way that doesn't link to the Terraform repo? We're trying to keep any issues here self-contained and generally detach from the Terraform issue tracker. Then I think we can discuss this a bit more here and possibly get an RFC going? |
Hey @StephanHCB thanks for submitting! To help us maintain a clear separation between opentf and hashicorp's offerings, we're asking that people describe issues that are in other repositories rather than linking those directly. We'd love to get an RFC going of course. It is also worth mentioning that as an author of a PR to legacy your are permitted under legacy Terraform CLA to resubmit your code elsewhere as well. Thanks! |
For additional reference, Pulumi uses https://gocloud.dev/howto/secrets/ under the hood to provide this functionality. That allows to easily cover all major clouds and even keep a CMK in a different cloud to the state file. |
Alright, let's get started. Let's talk motivation and requirements. The Case for State Encryption OpenTF state contains lots of sensitive information. The most obvious example are credentials such as primary access keys to storage, but even ignoring any credentials state often includes a full map of your network, including every VM, kubernetes cluster, database, etc. That is a treasure trove for an attacker who wishes to orient themselves in your private network. Unlike runtime information processed by OpenTF, which only lives in memory and is discarded when the tf run ends, state must by its very nature be persisted. In large installations, tf state is not (just) stored in local files because multiple users need access to it. Remote state backend options include simple storage (such as storage accounts, various databases, ...), meaning these storage options do not "understand" the state, but there are also extended backends, which do wish to gain information from state. The persistent nature and (often) cloud storage of state increases the risk of it falling into the wrong hands. Large corporations and financial institutions have compliance requirements for storage of sensitive information. One frequent requirement is encryption at rest using a customer managed key. From a security perspective, ideally such a key is not even made available to a cloud provider storing the state, for if the key is available and encryption is transparent, encryption at rest reduces to little more than a proxy for RBAC. Requirements for State Encryption Ability to encrypt
using a key supplied to terraform at runtime, either directly or through a key retrieval mechanism. Off by default: Default to no encryption, thus maintaining full compatibility with existing tf versions. The following lifecycle events need to be supported
The following manual operations need to be supported
Encryption methods should be pluggable to be able to support
Partial state encryption should retain the structure of the state json and just mask sensitive values, so that the extended backends can still work with the state but not see any credentials, while full state encryption helps prevent information disclosure from state. Encryption should be transparent Encrypted state needs to be json, because many of the current backend implementations expect it to be. How much the json resembles the structure of the unencrypted state will depend on the encryption method. Note: asymmetric encryption algorithms are of little use given the way tf needs to both read and write state in the same run for many operations. You will most likely always need a symmetric session key, but where that key comes from is a different matter. It is entirely feasible to have a state encryption method that obtains its key from some kind of key vault. Any comments are welcome. |
SOPS seems to fulfill a significant portion of the above requirements, so I think we should take a closer look at it. I haven't worked with SOPS before, but just looking at their github repo I gather:
Do we already have an idea how we're going to decide which field values need encrypting? That seems to me to be the biggest hurdle we'd have to overcome, to not impose on the community a need to maintain a huge list of fields to encrypt. |
This is also a very promising candidate, at least for the full state encryption use case.
Personally this will be the first library I'll look at for a prototype implementation. |
How should state encryption be configured? In my previous implementation, I introduced two environment variables:
You set each of these to a json document with two fields
I used environment variables because those do not show up in process lists, and also can be centrally provided for all users on a machine. Also, works with terraform wrappers such as terragrunt with no code changes, because environment variables are "just present". Turns out this is also a good choice for CI systems, works with Spacelift for example, you can build it so the key never leaves your local private worker deployment. I will create a draft PR with my old code for everyone to look at, hopefully I can get around to that on Monday, although with the additional requirements I doubt the end result will actually include much of that code other than the interface for pluggable state encryption methods. |
Yeah, I think SOPS would need to be used by shelling out to the CLI, not great. I do agree that the Go Cloud library looks very promising here, esp. since Pulumi is using them already.
The schema contains info about which fields are sensitive, it's currently also included in the plan file, so in other words the info should already be there.
That's a good question! I'm not sure, open to hearing more opinions here. Environment variables are the obvious solution, but we could also think about
Sounds good, and thanks a lot for this! Btw. no rush, I don't think we'll be able to merge this prior to the 1.6 stable release, which is now our main priority. A PoC would however be great grounds for further discussion. |
I skimmed the draft and it's looking good so far @StephanHCB, I'll do a more thorough review tomorrow. One thing I noticed is that the AES encryption I think can be simplified by using a couple of primitives from the Go stdlib (though I'm admittedly not an encryption-expert). I have done a PoC of this 2 weeks ago (apologies for not posting it earlier, I should've done that when the repo went public) and I think we don't need to do authentication manually + we can use pbkdf2 to make the key arbitrary (rather than requiring users to provide the long random key), here's a very brief diff of my PoC which includes these: main...kubas-state-encryption-experiment It's by no means a strong opinion, just something I noticed when taking a glance, I may be wrong about this. |
I like how your PoC keeps the implementation less complex by just extending Client. Will have to play around with that idea to see if this approach can be generalized to include local state files and plan files. That was part of the reason I pulled it out into its own interface, even though I never got around to implementing encryption for local state files. I also like being able to generate the encryption key from a passphrase. Perhaps we should look into separating the step that obtains the key from the step that actually performs the encryption. I do think we will need to ensure the encrypted state is encoded into json, because there are some backends that expect it to be. That was the reason I opted for hex-encoding into a very simple json structure. In practice it has also turned out helpful that state is text rather than binary when copying state around. |
I'm concerned that this proposal jumps directly to a specific solution (encryption), rather than having further discussion about the root problem, which is storing sensitive values in state, period. I don't have any knowledge about SOPS; it may be a fantastic tool. I do know, however, that as soon as encryption is brought into the mix, things become significantly more complicated. Key management with a single workstation is problematic enough. Moving that into CI/CD tooling and handling team access makes that all the more painful. Additionally, if someone has access to run plans against the state file, encrypted or not, they have access to anything stored in it. This means that they potentially have access to secrets which they have absolutely no business having, purely because they're standing up the infrastructure. Depending on what regulations a company falls under (or even partner agreements), this can be a very real problem. Certainly there are solutions that can be put in place to mitigate that concern, but they all add additional complexity, both with tooling and restrictions to workflows. We're talking about a class of tools which is supposed to enable us to do our jobs, not create additional hindrances. Anything that puts additional restrictions on how the tool can be used should be met with great skepticism. I still haven't seen a good argument as to why these values need to be stored at all. Certainly I've heard that existing API contracts with providers have this requirement, but nothing that I've seen or read says that this is a fundamental requirement of the problem domain, only that it's a difficult problem to solve. The OpenTF fork presents a unique opportunity to fix this original design mistake. It might be difficult, but I fear that adding an additional band-aid of encryption over the top of a flawed design is likely to solidify it for the foreseeable future. Eventually another tool will be available which doesn't have this problem, and it will be a real differentiator if OpenTF still does. |
What's the threat model - who are we defending against here? All of my state is valuable to external attackers, for example, as it gives a snapshot of my infrastructure without them having to breach it. Therefore it's all sensitive given that threat model. Full encryption is perfectly reasonable, but I'd suggest that it's not a complete solution because avoiding credential storage would provide further benefits and mitigate different threats in the model. |
I certainly agree that state should not contain sensitive values. I do not agree that sensitive values in state are the only problem state and planfile encryption addresses. The opentofu manifesto says (https://opentofu.org/manifesto): Your proposal is of a much larger scope, and has a much longer time scale for implementation, and frankly I do not see how it can be implemented without breaking the promised compatibility. If you ask me, your proposal should become a separate Rfc, with much larger consequences, resulting in changes to a lot of providers. Transparent, off-by-default, state encryption provides a short- to mid-term solution that can be implemented in a way such that it does not break compatibility.
That is why it's off-by-default. If you don't like the complexity, you can just not set the environment variables (or whatever configuration option it'll be in the end), and everything will work exactly as before. |
I think there are ways, the simplest being much more use of environment variables in planning and running plus a lookaside file that's OpenTofu-specific about which values for which providers should be substituted in from run's environment and not kept in the state. That's then a general censorship mechanism. But, IMO, it's not this change. |
I am now at the point where I was able to take a closer look at https://gocloud.dev/howto/secrets/. Reading the documentation, we learn that
Indeed, Pulumi does not encrypt the actual state using this library, it only uses it to derive a regular AES256 session key, obtained from various Vault products, then the actual encryption step is done with built-in AES-GCM primitives. My next step will be to add the key derivation functions provided by https://gocloud.dev/howto/secrets/. This will add support for GCP Key Mgmt, AWS Key Mgmt, Azure Key Vault, and Hashicorp Vault. I've already got pbkdf2 (based on @cube2222's work), aes256 and the stacking mechanism working. The code will need some polishing and cleanup, and test coverage is currently incomplete, but that's for the time after the PoC. |
Wanted to callout a few things regarding SOPS that could be really beneficial to OpenTofu.
|
Format of encrypted state (this comment has been moved to subsection "Format of Encrypted State" of the Technical Description section of the RFC draft, see below) |
For everyone's reference, here is the WIP of a RFC I am currently working on. I'll keep updating this comment as I find time to write more. I think I am at the point where I can write it all down now. @cube2222 should I just edit the initial post once I am done or how do we retrofit this issue to become an RFC? I'd like to not lose the excellent discussions. MOVED (this comment contained the draft version of the RFC. It was moved to #874 in its entirety and will be maintained over there) |
This is excellent @StephanHCB! I didn't yet have the chance to review it, but will do so as soon as I have some time. Ideally, please create a separate RFC that references this issue. The idea is that this issue is tracking the general "state encryption feature" enhancement request, while your RFC will be a concrete implementation proposal. Both issues will stay, so no discussion will be lost. |
I have implemented a roundtrip encryption example with SOPS and pushed it on a separate branch. In summary, you can tell that this is not currently an intended use case for SOPS.
This situation may improve somewhat in the future. See:
|
@StephanHCB Finally got some time to properly review. Great work overall! I've read the RFC and everything makes sense to me. It looks like you've covered it very extensively. I have no strong opinion regarding SOPS and just using gocloud secrets makes sense to me. Regarding the sensitive_fields array, @Yantrio knows those details very well, so should be able to provide feedback here. I was thinking whether the encryption configuration should indeed be a json object passed as an env var, rather than maybe an encryption.hcl file in the project root. However, it seems that using an encryption.hcl file would be less transparent and could result in edge cases with tools like terragrunt, which might be running tofu in different directories. One topic I think is worth discussing in the RFC is the |
I like this suggestion! Not sure yet about the precise details yet, the terraform_remote_state data source is not where one normally configures remote state storage, I'll have to try out where exactly to best place it, but there are obvious benefits:
I have moved the RFC to its own issue #874 as you suggested. |
Maybe will be useful for someone. I have terraform state pointing to local file.
I'm using https://github.com/FiloSottile/age to encrypt state file which is stored in source control too.
|
Suggest an existing issue in legacy Terraform to fix in OpenTF.
I already made a PR for full client side state encryption for all backends except the extended backends 2 years ago.
I would be more than happy to either submit it as a stale PR, or just make a PR. Whichever works better for you. I actually have a local patch based on 1.5.5 which I'd just need to commit.
Technical description / Proposal - see comments below - to be added here when at least some consensus is achieved and people have had a chance to comment.
Draft PR see in the comments below, but note that the implementation will likely differ from this PR. I have tried it extensively with state stored on Azure storage accounts, and it is known to work in this scenario. The PR includes documentation as an experimental feature and extensive unit tests.
I would love not to have to maintain an internal fork for this any more :)
The text was updated successfully, but these errors were encountered: