
Reproducible builds of statusgo #1185


Closed
divan opened this issue Sep 7, 2018 · 4 comments

@divan
Contributor

divan commented Sep 7, 2018

Problem

We want to ensure reproducible (deterministic) builds for statusd and the status-go libraries. This is an important security aspect of modern open-source software. It is also required by F-Droid, and non-reproducible builds seem to be a blocker. See the explanation here: status-im/status-mobile#5587

Any user must be able to build status-go themselves and get a binary identical to the one distributed through the release channels.

Implementation

Essentially, a reproducible build guarantees that anyone who builds the same git commit gets exactly the same binary, byte for byte.

For Go code, there are a few inputs that may make a build non-deterministic:

  • dependencies (if the dependency management tool bumps a minor version, the binary will differ)
  • GOPATH value
  • package absolute path
  • build flags (compile, asm, linker)
  • CGO enabled/disabled
  • Go version

Most of these values can be treated as build requirements (e.g. the Go version), but some require special flags or tricks to overcome. Note that Go 1.12 will probably have a build flag like go build -release that enables those flags. Or perhaps Go programs will be almost 100% reproducible by default. See the ongoing issue here.

The host GOOS and GOARCH seem to have no effect, so the good news is that Go builds are reproducible across platforms. (I.e. if all other requirements are met, cross-compiling and native compiling will yield identical binaries.)

I experimented with getting the same binary just by running go build, and the best result I got was with this command:

export GOROOT_FINAL="/tmp/go"   # use the same GOROOT value in all binaries
# -ldflags="-s -w": disable the symbol table and DWARF generation
# -asmflags / -gcflags -trimpath: remove the current path from paths in the binary
go build \
	-ldflags="-s -w" \
	-asmflags=-trimpath="$(pwd)" \
	-gcflags=-trimpath="$(pwd) -buildid test"

I wasn't able to get past the buildid identifier, which is generated from various inputs (including the absolute path) and written into the binary. So builds in different directories/GOPATHs result in slightly different binaries. Hopefully that will be fixed in Go 1.12.

So the current solution is to use a Docker container for reproducible builds, where we can guarantee the same GOPATH, the same absolute path and the same values of other variables. That also removes the need to strip debug information, which might be useful (for more verbose stack traces on panic, or for profiling info).

If the Docker approach is sufficient for the moment, the implementation should be fairly straightforward.

@divan
Contributor Author

divan commented Sep 15, 2018

Before implementing the Docker-based solution, I decided to evaluate other options. Essentially, we just have to find a way to spoof the current directory path, and using Docker for that seems a bit too heavy.

Faking current dir path

I naturally started by asking: "How can we fake the directory path?" I.e. the go build process would refer to /some/constant/path/, while the real path is whatever directory the user builds status-go in.

One obvious solution is a chroot jail. It comes with every POSIX system, and it is cross-platform, straightforward to use and familiar. One downside: it needs root access. We don't want to ask people for root access just to make a reproducible build.

There are also some lightweight container/cgroups options, but they all seem to require root access as well, and they are limited to Linux.

LD_PRELOAD and DYLD_INSERT_LIBRARIES

It's possible to write a small C library that reimplements getcwd (get current working directory) so that it always returns a constant value. This library can then be preloaded before running the Go compiler to "spoof" all calls to getcwd.

It doesn't require root access. The downsides are the following. It will probably require users to compile C code once, which brings a dependency on a C compiler toolchain (probably already installed, but still). Plus, it's really hacky, probably has a lot of corner cases (especially on MacOS X), and may break the Go build process logic. So I decided to explore other options.

Rewriting BuildID in a binary

The next approach could be rewriting the BuildID inside the binary itself, provided we can guarantee that the binaries are otherwise equal. Let me explain.

Go uses the concept of build identifiers (buildid) for a number of tasks, for example letting the go command know whether a particular package needs to be rebuilt. Each binary (library or program) is stamped with a BuildID section. In ELF files it's a separate section called .note.go.buildid.

In slightly simplified form, the buildID value consists of two hashes:

buildID = "actionID/contentID"

where

  • actionID is a unique identifier of inputs (sources, go version, etc)
  • contentID is a unique identifier of outputs (mainly the content of compiler/linker output).

More information here: https://github.com/golang/go/blob/master/src/cmd/go/internal/work/buildid.go#L24

As I mentioned in my previous comment, binaries built in different directories differ only in the value of this buildID stamp. If you dump the binaries with objdump (otool on MacOS X), the only difference between them is in the .note.go.buildid section.

We can check the buildID value with the go tool buildid statusd command. It looks like this:

xCo1KN9nJcy5odhayegrG/D5bQ0SUZErfVz8jj7VDs/j9JWi6H2-GhHIpF-t3gM/slZ4gdHzUldKcHs4fOQG

where / separates four hashes:

actionID(binary)/actionID(main.a)/contentID(main.a)/contentID(binary)
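To make the four-hash layout concrete, here is a small sketch that splits the output of go tool buildid into named parts. The type and function names are hypothetical. The only assumption is the actionID/actionID/contentID/contentID order stated above.

```go
package main

import (
	"fmt"
	"strings"
)

// BinaryBuildID (hypothetical type) holds the four hashes of a Go executable's
// build ID: actionID(binary)/actionID(main.a)/contentID(main.a)/contentID(binary).
type BinaryBuildID struct {
	BinaryAction  string
	MainAction    string
	MainContent   string
	BinaryContent string
}

// parseBinaryBuildID splits the output of `go tool buildid <binary>` on "/".
func parseBinaryBuildID(s string) (BinaryBuildID, error) {
	parts := strings.Split(strings.TrimSpace(s), "/")
	if len(parts) != 4 {
		return BinaryBuildID{}, fmt.Errorf("expected 4 hashes separated by '/', got %d", len(parts))
	}
	for _, p := range parts {
		if p == "" {
			return BinaryBuildID{}, fmt.Errorf("empty hash in build ID %q", s)
		}
	}
	return BinaryBuildID{
		BinaryAction:  parts[0],
		MainAction:    parts[1],
		MainContent:   parts[2],
		BinaryContent: parts[3],
	}, nil
}

func main() {
	// The statusd build ID quoted above.
	id, err := parseBinaryBuildID("xCo1KN9nJcy5odhayegrG/D5bQ0SUZErfVz8jj7VDs/j9JWi6H2-GhHIpF-t3gM/slZ4gdHzUldKcHs4fOQG")
	if err != nil {
		panic(err)
	}
	fmt.Println("actionID(binary):  ", id.BinaryAction)
	fmt.Println("actionID(main.a):  ", id.MainAction)
	fmt.Println("contentID(main.a): ", id.MainContent)
	fmt.Println("contentID(binary): ", id.BinaryContent)
}
```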

Sample objdump output (use gobjdump from binutils on MacOS X):

objdump -s -j.note.go.buildid statusgo

statusgo:     file format elf64-x86-64

Contents of section .note.go.buildid:
 400f9c 04000000 53000000 04000000 476f0000  ....S.......Go..
 400fac 59735133 7a4d6959 67444d74 49493338  YsQ3zMiYgDMtII38
 400fbc 4a44616d 2f463642 394c6248 51615f68  JDam/F6B9LbHQa_h
 400fcc 6e4a7536 51444672 512f6137 79354b30  nJu6QDFrQ/a7y5K0
 400fdc 5158336a 5a4f6b74 6d64364a 55422f66  QX3jZOktmd6JUB/f
 400fec 552d6f6b 54433374 346e766f 766f3663  U-okTC3t4nvovo6c
 400ffc 445f3500                             D_5.

Now, the interesting part.

Following the build instructions in my previous comment, we always get binaries that are identical except for the build ID, and the contentID values are the same. What differs is the actionID, because as of Go 1.11 its inputs include different absolute paths.

But we don't really care about it. The rest of the binary is the same at the byte level. We can simply overwrite this part manually, saying "we don't care about the inputs, as long as the outputs are correct".

I made a proof-of-concept solution for ELF files. Here are the steps:

First, we build the release version in a controlled environment (like CI), extract the buildID from the binary and store it somewhere (maybe in git itself, under _assets/release/status-go.buildid).

Then, in the Makefile, when a user runs make statusgo-release (or whatever target we choose), it does the following:

  • builds the status-go binary (go build -ldflags "-s -w" -asmflags=-trimpath="$(pwd)" -gcflags=-trimpath="$(pwd)")
  • extracts the build ID (go tool buildid statusgo)
  • compares it with the build ID in _assets/release/status-go.buildid (diff <(go tool buildid ./statusd) <(cat ../../_assets/release/statusgo.buildid))

If the contentID parts are equal but the actionID parts differ, we rewrite the binary with the "correct" buildid. That is enough to make the binaries identical at the byte level.
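The decision rule reduces to a comparison of the four hashes: equal outputs and different inputs. The sketch below states it explicitly. It assumes the actionID/actionID/contentID/contentID order described above, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// splitBuildID splits a binary build ID
// actionID(binary)/actionID(main.a)/contentID(main.a)/contentID(binary).
func splitBuildID(s string) ([4]string, error) {
	var out [4]string
	parts := strings.Split(strings.TrimSpace(s), "/")
	if len(parts) != 4 {
		return out, fmt.Errorf("expected 4 hashes, got %d in %q", len(parts), s)
	}
	copy(out[:], parts)
	return out, nil
}

// shouldRewriteBuildID reports whether a local build matches the release build
// on both contentIDs (outputs) but differs in an actionID (inputs), the
// case in which the local build ID would be overwritten with the release one.
func shouldRewriteBuildID(local, release string) (bool, error) {
	l, err := splitBuildID(local)
	if err != nil {
		return false, err
	}
	r, err := splitBuildID(release)
	if err != nil {
		return false, err
	}
	sameOutputs := l[2] == r[2] && l[3] == r[3]
	sameInputs := l[0] == r[0] && l[1] == r[1]
	return sameOutputs && !sameInputs, nil
}

func main() {
	ok, err := shouldRewriteBuildID("localA/localB/c1/c2", "relA/relB/c1/c2")
	if err != nil {
		panic(err)
	}
	fmt.Println(ok) // prints true: same outputs, different inputs
}
```

If the contentIDs differ, the outputs really are different and the build ID must not be rewritten. If everything already matches, there is nothing to rewrite.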

objcopy --update-section .note.go.buildid=_assets/release/status-go.buildid.bin ./statusd
# now compare the SHA hashes

After this step, we can compare the SHA1 hashes of the binaries, and they should be identical for the same OS/ARCH no matter where the process is executed.

Notes

This obviously looks like a hack, because it is one. But it exploits properties of the Go build system, which was designed with future reproducibility in mind.

One thing I don't like is that the buildID hashes are truncated versions of the original hash values (in fact, the first 96 bits of the original hash, encoded in base64); see the hashToString code. This increases the probability of collisions. That works fine for the task "detect whether the binary should be rebuilt", and it decreases the size of the buildID from 259 bytes to 67, which is more readable. But it may not be enough to decide whether the produced binary is what we expect. Worth more analysis, I guess.
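To see what the truncation keeps, here is a sketch of the general scheme: take the first n bytes of a SHA-256 digest and base64-encode them with the URL-safe alphabet (A-Z a-z 0-9 - _) without padding. That this matches hashToString byte for byte is an assumption, and the helper name is hypothetical. Each 3 bytes become 4 characters. So 12 bytes (96 bits) give 16 characters, and 15 bytes (120 bits) give 20 characters, the segment length in the objdump output above.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// truncatedHashString (hypothetical helper) keeps the first n bytes
// (0 <= n <= 32) of the SHA-256 digest of data and encodes them with the
// URL-safe base64 alphabet, without padding.
func truncatedHashString(data []byte, n int) string {
	sum := sha256.Sum256(data)
	return base64.RawURLEncoding.EncodeToString(sum[:n])
}

func main() {
	data := []byte("example input")
	for _, n := range []int{12, 15, 32} {
		s := truncatedHashString(data, n)
		fmt.Printf("%3d bits -> %2d chars: %s\n", n*8, len(s), s)
	}
}
```

Because base64 works on 3-byte groups, a truncated string is simply a prefix of the full encoding. Shortening the ID therefore discards hash bits rather than changing the hash.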

I would love to hear thoughts and questions on this. I think I can make this approach work transparently on both MacOS X (using otool) and Linux.


@divan
Contributor Author

divan commented Oct 7, 2018

Brad Fitzpatrick mentioned the approach used at Google:

... hack ... which sets the PWD environment to /proc/self/cwd/../google3 (which works on Linux, but isn't a portable solution)

@ghost

ghost commented Jan 5, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@ghost ghost added the stale label Jan 5, 2019
@ghost

ghost commented Jan 12, 2019

This issue has been automatically closed. Please re-open if this issue is important to you.

@ghost ghost closed this as completed Jan 12, 2019