Description
Problem
We want to ensure reproducible (deterministic) builds for the statusd and status-go libraries. This is an important security aspect of modern open-source software. It is also required by F-Droid, where non-reproducible builds seem to be a blocker. See the explanation here: status-im/status-mobile#5587
Any user must be able to build status-go herself and get a binary identical to the one distributed over release channels.
Implementation
Essentially a reproducible build is a way to guarantee that for the same git commit anyone can build absolutely the same – byte-by-byte – binary.
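To make "byte-by-byte" concrete, here is a minimal sketch of a comparison helper. The function name `firstDiff` and the sample byte strings are made up for illustration; a real check would load the two binaries from disk (for example with `os.ReadFile`).

```go
package main

import "fmt"

// firstDiff returns the offset of the first byte at which a and b differ,
// or -1 if the two slices are identical (same length, same bytes).
func firstDiff(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	if len(a) != len(b) {
		return n // one input is a strict prefix of the other
	}
	return -1
}

func main() {
	// Hypothetical stand-ins for a released binary and a local rebuild.
	official := []byte("\x7fELF...same code...buildid-A")
	local := []byte("\x7fELF...same code...buildid-B")

	fmt.Println("official vs official:", firstDiff(official, official))
	fmt.Println("official vs local:   ", firstDiff(official, local))
}
```

A result of -1 means the build is reproducible in the sense used here; any other value points at the first offset worth inspecting.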
For Go code, there are a few inputs that may render a build non-deterministic:
- dependencies (if the dependency management tool bumps a minor version, the binary will differ)
- GOPATH value
- package absolute path
- build flags (compile, asm, linker)
- CGO enabled/disabled
- Go version
Most of these values can be treated as build requirements (e.g. the Go version), but some require special flags or tricks to overcome. Note that Go 1.12 will probably have a build flag like go build -release that enables those flags. Or, perhaps, Go programs will be almost 100% reproducible by default. See ongoing issue here.
GOOS and GOARCH seem not to have an effect, so the good news is that Go binaries are cross-platform reproducible (i.e. if all other requirements are met, cross-compiling and native compiling will yield an identical binary).
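When two people compare builds, it helps to record the inputs listed above next to the binary. The following is only a sketch: the function name `buildInputs` and the key names are invented for this example, and `runtime.Version()` reports the toolchain that compiled this helper, so it matches the build toolchain only if both are run with the same Go installation.

```go
package main

import (
	"fmt"
	"go/build"
	"os"
	"runtime"
	"sort"
)

// buildInputs collects environment values that can influence the output of
// go build. The key names are made up for this sketch.
func buildInputs() map[string]string {
	wd, err := os.Getwd()
	if err != nil {
		wd = "unknown: " + err.Error()
	}
	return map[string]string{
		"go_version":  runtime.Version(),
		"gopath":      build.Default.GOPATH,
		"working_dir": wd,
		"cgo_enabled": fmt.Sprint(build.Default.CgoEnabled),
		"goos":        runtime.GOOS,
		"goarch":      runtime.GOARCH,
	}
}

func main() {
	inputs := buildInputs()

	// Print in a stable order so two reports can be diffed directly.
	keys := make([]string, 0, len(inputs))
	for k := range inputs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, inputs[k])
	}
}
```

Build flags are not visible from inside such a helper, so they would still have to be pinned in the Makefile.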
I experimented with getting the same binary just by running go build, and the best result I got was with this command:
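# Use the same GOROOT value for all binaries.
export GOROOT_FINAL="/tmp/go"
# -ldflags "-s -w": disable the symbol table and DWARF generation.
# -asmflags -trimpath: remove the current path from paths recorded in the binary.
# -gcflags -trimpath: the same, but for the compile tool.
# (Comments are kept above the command: a comment after a trailing backslash breaks the line continuation.)
go build \
  -ldflags="-s -w" \
  -asmflags=-trimpath="$(pwd)" \
  -gcflags=-trimpath="$(pwd) -buildid test"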
I wasn't able to get past the buildid identifier, which is generated from various inputs (including the absolute path) and written into the binary. So builds in different dirs/GOPATHs result in slightly different binaries. That will hopefully be fixed in Go 1.12.
So the current solution is to use a Docker container for the reproducible build, where we can guarantee the same GOPATH, the same absolute path and other variables. That also removes the need to strip out debug information, which might be useful (for more verbose stack traces on panic, or profiling info).
If the Docker approach is sufficient for the moment, the implementation should be fairly straightforward.
Activity
divan commented on Sep 15, 2018
Before implementing the Docker-based solution, I decided to evaluate other options. Essentially we just have to find a way to spoof/fake the current directory path, and using Docker for that seems a bit too heavy.
Faking current dir path
I naturally started by asking the question "How can we fake the directory path?" I.e. the go build process would refer to /some/constant/path/, while the real path would be whatever dir users use to build status-go.
One obvious solution is a chroot jail: it comes with every POSIX system, and is cross-platform, straightforward to use and familiar. One downside is that it needs root access. We don't want to ask people for root access just to make a reproducible build.
There are also some lightweight container/cgroups options, but they all seem to require root access as well, and are also limited to Linux.
LD_PRELOAD and DYLD_INSERT_LIBRARIES
It's possible to write a small C library that reimplements the getcwd (get current working directory) call and always returns a constant value. This library can then be preloaded before running the Go compiler to "spoof" all calls to getcwd.
It doesn't require root access. The downsides are the following: it will probably require users to compile C code once, bringing in a dependency on a C compiler toolchain (which is probably installed, but anyway). Plus, it's really hacky, probably has a lot of corner cases (especially on MacOS X) and may break the Go build process logic. So I decided to explore other options.
Rewriting BuildID in a binary
The next approach could be rewriting the BuildID inside the binary itself, provided we can guarantee that the binaries are otherwise equal. Let me explain.
Go uses the concept of build identifiers (buildid) for a number of tasks, for example letting the compiler know whether a particular source needs to be rebuilt. Each binary (library or program) is stamped with a BuildID section. In ELF files it's a separate section called .note.go.buildid.
In a slightly simplified form, the buildID value consists of two hashes, actionID and contentID, where
- actionID is a unique identifier of the inputs (sources, Go version, etc.)
- contentID is a unique identifier of the outputs (mainly the content of the compiler/linker output).
More information here: https://github.com/golang/go/blob/master/src/cmd/go/internal/work/buildid.go#L24
As I mentioned in a previous comment, the binaries built in different directories differ only by the value of this buildID stamp. If you dump the binary via objdump (otool for MacOS X), the only difference between those binaries will be in the .buildID stamp.
We can check the buildID value with the go tool buildid statusd command; in its output, / separates 4 hashes. The stamp can also be seen in objdump output (use gobjdump from binutils on MacOS X).
Now, the interesting part.
Following the build instructions in the previous comment, we always get the same binaries, and the contentID values are the same. What differs is actionID, because as of Go 1.11 its inputs have different absolute paths.
But we don't really care about it. The rest of the binary is the same at the byte level. We can simply overwrite this part manually, saying "we don't care about inputs, as long as outputs are correct".
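The check "same contentID, different actionID" can be sketched as follows. This assumes that the first slash-separated field of a binary's build ID is its actionID and the last field is its contentID; the function names and the sample IDs are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// splitBuildID returns the first and last "/"-separated fields of a build ID.
// Assumption of this sketch: those are the actionID and the contentID of the
// binary itself.
func splitBuildID(id string) (actionID, contentID string, err error) {
	parts := strings.Split(strings.TrimSpace(id), "/")
	if len(parts) < 2 {
		return "", "", fmt.Errorf("malformed build ID %q", id)
	}
	for _, p := range parts {
		if p == "" {
			return "", "", fmt.Errorf("empty field in build ID %q", id)
		}
	}
	return parts[0], parts[len(parts)-1], nil
}

// onlyActionIDDiffers reports whether two build IDs describe the same output
// (equal contentID) produced from differently located inputs (different actionID).
func onlyActionIDDiffers(local, release string) (bool, error) {
	la, lc, err := splitBuildID(local)
	if err != nil {
		return false, err
	}
	ra, rc, err := splitBuildID(release)
	if err != nil {
		return false, err
	}
	return lc == rc && la != ra, nil
}

func main() {
	// Made-up IDs; real ones come from `go tool buildid`.
	release := "relAction/mainA/mainC/content"
	local := "locAction/mainA/mainC/content"

	ok, err := onlyActionIDDiffers(local, release)
	fmt.Println(ok, err)
}
```

If the function reports false because the contentID differs, the binaries are genuinely different and nothing should be rewritten.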
I made a proof-of-concept solution for ELF files. Here are the steps:
First, we build the release version in a controlled environment (like CI), extract the buildID from the binary and store it somewhere (maybe in git itself, under _assets/release/status-go.buildid).
Then, in the Makefile, when the user runs make statusgo-release (or whatever target we choose), it does the following:
- builds the binary (go build -ldflags "-s -w" -asmflags=-trimpath="$(pwd)" -gcflags=-trimpath="$(pwd)")
- reads its buildID (go tool buildid statusgo)
- compares it with _assets/release/status-go.buildid (diff <(go tool buildid ./statusd) <(cat ../../_assets/release/statusgo.buildid))
And if the contentID part is equal but the actionID part differs, it rewrites the binary with the "correct buildid"; that would be enough to make the binaries exactly the same at the binary level.
After this step, we can compare SHA1 hashes of the binaries, and they should be the same for the same OS/ARCH no matter where this process is executed.
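The rewrite-and-compare step could look roughly like the sketch below. It is not the proof of concept mentioned above: it works on in-memory bytes, replaces every occurrence of the old ID, and assumes both IDs have the same length so that no offsets in the file shift. The function name and the fake "binaries" are made up for illustration.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"errors"
	"fmt"
)

// rewriteBuildID returns a copy of bin in which every occurrence of oldID is
// replaced by newID. Both IDs must have the same length so the file layout
// stays unchanged.
func rewriteBuildID(bin []byte, oldID, newID string) ([]byte, error) {
	if len(oldID) != len(newID) {
		return nil, errors.New("build IDs differ in length")
	}
	if !bytes.Contains(bin, []byte(oldID)) {
		return nil, errors.New("old build ID not found in binary")
	}
	return bytes.ReplaceAll(bin, []byte(oldID), []byte(newID)), nil
}

func main() {
	// Fake binaries that differ only in the actionID part of the stamp.
	release := []byte("HEADER|relA/m1/m2/cont|CODE")
	local := []byte("HEADER|locA/m1/m2/cont|CODE")

	patched, err := rewriteBuildID(local, "locA/m1/m2/cont", "relA/m1/m2/cont")
	if err != nil {
		fmt.Println("rewrite failed:", err)
		return
	}

	// After the rewrite, the SHA1 digests should match.
	fmt.Printf("release: %x\n", sha1.Sum(release))
	fmt.Printf("patched: %x\n", sha1.Sum(patched))
	fmt.Println("identical:", sha1.Sum(patched) == sha1.Sum(release))
}
```

The rewrite should only run after the contentID comparison has passed; otherwise it would paper over a real difference.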
Notes
This obviously looks like a hack, because it is a hack. But it exploits properties of Go build system design, which was designed with future reproducibility in mind.
One thing that I don't like is that the buildID hashes are actually truncated versions of the original hash value (the first 96 bits of the original hash encoded in base64, in fact); see the hashToString code. This increases the probability of collisions. That works fine for the task "detect whether a binary should be rebuilt", and it decreases the size of the buildID from 259 bytes to 67, which is more readable. But it may not be enough to decide whether the produced binary is what we expect. Worth more analysis, I guess.
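The size figures can be reproduced with a small sketch. It assumes a SHA-256 digest truncated to 96 bits (12 bytes), which is the truncation consistent with the 67-byte figure; it uses the standard library's URL-safe base64 encoder rather than Go's own hashToString, and the function name `shortHash` is made up.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// shortHash keeps the first 96 bits (12 bytes) of a SHA-256 digest and
// encodes them as 16 URL-safe base64 characters.
func shortHash(data []byte) string {
	sum := sha256.Sum256(data)
	return base64.RawURLEncoding.EncodeToString(sum[:12])
}

func main() {
	h := shortHash([]byte("example input"))
	fmt.Println(h, len(h))

	// Four truncated hashes plus three "/" separators, versus four full
	// digests printed in hex plus the same separators.
	fmt.Println(4*len(h)+3, 4*2*sha256.Size+3) // prints "67 259"
}
```

With 96 bits per hash, an accidental collision is very unlikely, but the truncation does reduce the effort needed for a deliberate one, which is the concern raised above.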
I would love to hear thoughts and questions on this. I think I can make this approach work for both MacOS X and Linux transparently (using otool).
divan commented on Oct 7, 2018
Brad Fitzpatrick mentioned the approach used at Google:
ghost commented on Jan 5, 2019
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
ghost commented on Jan 12, 2019
This issue has been automatically closed. Please re-open if this issue is important to you.