Reproducible builds #70131

Closed
dims opened this issue Oct 23, 2018 · 55 comments
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/release Categorizes an issue or PR as relevant to SIG Release.

Comments

@dims
Member

dims commented Oct 23, 2018

From @dims on September 21, 2018 21:4

Please see https://reproducible-builds.org/, specifically https://reproducible-builds.org/docs/, for ideas on how to do deterministic builds. When we get a chance, we should examine how far we are from this goal and what our blockers are.

Thanks,
Dims

Copied from original issue: kubernetes/release#637

@dims
Member Author

dims commented Oct 23, 2018

From @ixdy on September 21, 2018 21:54

@bmwiedemann has done some work on this already in #48710.

For increased reproducibility, we should probably set SOURCE_DATE_EPOCH in release builds (we already do this in CI), though a few other bits are still missing.
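
A minimal sketch of what honoring SOURCE_DATE_EPOCH means for a build step that stamps a date: the variable holds a Unix timestamp in seconds, and the step uses it in place of the current time. The helper name `build_timestamp` is hypothetical and is not taken from the Kubernetes build scripts.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ):
    """Return the build time as a timezone-aware UTC datetime.

    Uses SOURCE_DATE_EPOCH (Unix seconds) when it is set, and falls back
    to the current time otherwise. `environ` is injectable for testing.
    """
    epoch = environ.get("SOURCE_DATE_EPOCH")
    seconds = int(epoch) if epoch is not None else int(time.time())
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

With the variable set to the same value, two builds run at different times embed the same date.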

@dims
Member Author

dims commented Oct 23, 2018

@dims
Member Author

dims commented Oct 23, 2018

@bmwiedemann I can't tell whether the results are from the latest k8s releases. If not, is there a way to trigger these for, say, v1.12-rc1 please?

@dims
Member Author

dims commented Oct 23, 2018

From @bmwiedemann on September 22, 2018 20:2

openSUSE's diff is from 1.11.1 (and we do have SOURCE_DATE_EPOCH set)

Going to 1.12 is not that easy, because my reproducibility-test tools are designed around building packages, and there are usually so many changes and the build is so slow (~20 minutes per try) that it can take hours to get right.

However, IMHO it would be a good start to find out if and how the two known issues in 1.11.1 have been addressed. If there are patches, I could apply them to 1.11.1 and see if anything remains there.

I'd prefer to not have to chase the master branch.

@dims
Member Author

dims commented Oct 23, 2018

gotcha thanks @bmwiedemann

@dims
Member Author

dims commented Oct 23, 2018

@bmwiedemann please see #68983 to see if it fixes the man page issue

@dims
Member Author

dims commented Oct 23, 2018

On the buildid problem, looks like we may have to wait for next versions of golang:
golang/go#16860

See for example how others are trying to think about the same problem:

@dims
Member Author

dims commented Oct 23, 2018

We need to update k/release anago scripts to set SOURCE_DATE_EPOCH and save the information somewhere (in generated tarballs? release notes?)

@dims
Member Author

dims commented Oct 23, 2018

From @bmwiedemann on September 26, 2018 4:30

regarding random build-ids:
https://blog.filippo.io/reproducing-go-binaries-byte-by-byte/ seems to imply that it is already possible to generate reproducible go binaries and indeed our openSUSE "docker" package is already reproducible (we always build in the same path)

strace showed me

execve("/usr/lib64/go/1.10/pkg/tool/linux_amd64/compile", ["/usr/lib64/go/1.10/pkg/tool/linux_amd64/compile", "-o", "/tmp/go-build336594203/b073/pkg.a", "-trimpath", "/tmp/go-build336594203/b073", "-p", "k8s.io/kubernetes/vendor/k8s.io/gengo/examples/set-gen/sets", "-complete", "-buildid", "SUvgWqVQmIZMGMbPSYtX/SUvgWqVQmIZMGMbPSYtX", "-goversion", "go1.10.3", "-D", "", "-importcfg", "/tmp/go-build336594203/b073/importcfg", "-pack", "-c=4", "/home/abuild/rpmbuild/BUILD/kubernetes-1.11.1/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/gengo/examples/set-gen/sets/byte.go", "/home/abuild/rpmbuild/BUILD/kubernetes-1.11.1/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/gengo/examples/set-gen/sets/doc.go", "/home/abuild/rpmbuild/BUILD/kubernetes-1.11.1/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/gengo/examples/set-gen/sets/empty.go", "/home/abuild/rpmbuild/BUILD/kubernetes-1.11.1/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/gengo/examples/set-gen/sets/int.go", "/home/abuild/rpmbuild/BUILD/kubernetes-1.11.1/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/gengo/examples/set-gen/sets/int64.go", "/home/abuild/rpmbuild/BUILD/kubernetes-1.11.1/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/gengo/examples/set-gen/sets/string.go"],

so my guess is that a part of the build system generates explicit random buildids instead of using something reproducible (e.g. a constant or the shasum of the source(s))
Maybe go would even do the right thing (like gcc) when no buildid is given?
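
To illustrate the "shasum of the source(s)" idea above, here is a sketch of an identifier derived only from source file names and contents, so that it does not change with the build directory or the order in which files are listed. This is an illustration only: it is not how the Go toolchain computes its build IDs, and the function name `source_digest` is hypothetical. It assumes file names are unique within the list.

```python
import hashlib
from pathlib import Path


def source_digest(paths):
    """Return a hex SHA-256 over the given source files.

    Only the base name and contents of each file are hashed, in sorted
    name order, so the result is independent of the build directory and
    of the order of `paths`.
    """
    digest = hashlib.sha256()
    for path in sorted((Path(p) for p in paths), key=lambda p: p.name):
        data = path.read_bytes()
        digest.update(path.name.encode("utf-8") + b"\0")
        # Length prefix keeps file boundaries unambiguous.
        digest.update(str(len(data)).encode("ascii") + b"\0")
        digest.update(data)
    return digest.hexdigest()
```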

@dims
Member Author

dims commented Oct 23, 2018

@bmwiedemann i could get reproducible builds with latest master. PR is here. here's what i had to do.

  • make quick-release builds stuff inside a docker container, so we control a lot of the things including go version, paths etc.

But that was not enough. then i had to add trimpath

And then added -s -w to remove the symbol table

and finally pass the SOURCE_DATE_EPOCH into the container where the builds happen.

Finally, I tested the build process on my laptop (macOS) and on Ubuntu, and verified that the build ID of the kubeadm binary was the same on both.
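
The steps above can be sketched as a command line plus environment. This is a sketch under assumptions, not the change from the PR: it uses the `-trimpath` flag of `go build` (available since Go 1.13, so newer than the toolchain in use when this was written) and the linker flags `-s -w`, and it does not reproduce the containerized `make quick-release` flow. The function names and the package path in the usage note are illustrative.

```python
import os


def reproducible_build_command(package, output):
    """Build the argv for a `go build` with path trimming and no symbol table."""
    return [
        "go", "build",
        "-trimpath",          # drop local filesystem paths from the binary
        "-ldflags=-s -w",     # omit the symbol table and DWARF debug info
        "-o", output,
        package,
    ]


def reproducible_build_env(source_date_epoch, base_env=None):
    """Return a copy of the environment with SOURCE_DATE_EPOCH pinned."""
    env = dict(os.environ if base_env is None else base_env)
    env["SOURCE_DATE_EPOCH"] = str(source_date_epoch)
    return env
```

A caller could pass both to `subprocess.run(reproducible_build_command("./cmd/kubeadm", "kubeadm"), env=reproducible_build_env(1537567200), check=True)`. SOURCE_DATE_EPOCH only has an effect if some build step reads it.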

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Oct 23, 2018
@dims
Member Author

dims commented Oct 23, 2018

/sig release

@k8s-ci-robot k8s-ci-robot added sig/release Categorizes an issue or PR as relevant to SIG Release. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 23, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 21, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 20, 2019
@bmwiedemann
Contributor

/remove-lifecycle rotten

We currently have 1.13.3 in openSUSE and I can see new variations of order:

--- old//usr/share/man/man1/kubeadm-init.1      2019-02-20 01:47:34.848765358 +0000
+++ new//usr/share/man/man1/kubeadm-init.1      2019-02-20 01:47:34.856765406 +0000
@@ -26,10 +26,10 @@
 kubelet\-start              Writes kubelet settings and (re)starts the kubelet
 certs                      Certificate generation
   /etcd\-ca                   Generates the self\-signed CA to provision identities for etcd
+  /etcd\-server               Generates the certificate for serving etcd
   /etcd\-peer                 Generates the credentials for etcd nodes to communicate with each other
   /etcd\-healthcheck\-client   Generates the client certificate for liveness probes to healtcheck etcd
   /apiserver\-etcd\-client     Generates the client apiserver uses to access etcd
-  /etcd\-server               Generates the certificate for serving etcd
   /ca                        Generates the self\-signed Kubernetes CA to provision identities for other Kubernetes components

Plus the issue with varying build-ids - is 1.13.3 too old to have fixes from @dims or do we need to change something in our .spec file?
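
The thread does not say what caused the order variation in the man page. A common source of this kind of diff is rendering entries straight out of an unordered collection (in Go, map iteration order is deliberately randomized). A minimal sketch of the usual remedy, sorting before rendering, with a hypothetical `render_phase_list` helper:

```python
def render_phase_list(phases):
    """Render a name -> description mapping as aligned lines in sorted order.

    Sorting by name makes the output independent of the order in which
    the entries were inserted or iterated.
    """
    width = max((len(name) for name in phases), default=0) + 2
    lines = [f"{name:<{width}}{phases[name]}" for name in sorted(phases)]
    return "\n".join(lines)
```

Sorting alphabetically changes the listing order; where a logical order matters, keeping the entries in an explicitly ordered list gives the same determinism.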

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 21, 2019
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 22, 2019
@praseodym
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 22, 2019
@bmwiedemann
Contributor

kubernetes-1.14.1 still has these order variations in /usr/share/man/man1/kubeadm-init.1

And there are also some binaries with variations

/usr/src/kubernetes/platforms/linux/amd64/e2e.test differs in ELF section .gopclntab
@@ -3121947,8 +3121947,8 @@
  85ddce0 3b35fa02 c234fa02 4a34fa02 d833fa02  ;5...4..J4...3..
  85ddcf0 6333fa02 f132fa02 b932fa02 7c32fa02  c3...2...2..|2..
  85ddd00 3a32fa02 f731fa02 ba31fa02 9031fa02  :2...1...1...1..
- 85ddd10 2f746d70 2f676f2d 6275696c 64323635  /tmp/go-build265
- 85ddd20 30303637 30342f62 3030312f 5f746573  006704/b001/_tes
+ 85ddd10 2f746d70 2f676f2d 6275696c 64383933  /tmp/go-build893
+ 85ddd20 34353433 34362f62 3030312f 5f746573  454346/b001/_tes
  85ddd30 746d6169 6e2e676f 00005f6f 75747075  tmain.go.._outpu
  85ddd40 742f6c6f 63616c2f 676f2f73 72632f6b  t/local/go/src/k
  85ddd50 38732e69 6f2f6b75 6265726e 65746573  8s.io/kubernetes

/usr/bin/kubeadm differs in ELF section .typelink
@@ -66,7 +66,7 @@
  1b41fd0 00eb0800 40eb0800 80eb0800 c0eb0800  ....@...........
  1b41fe0 00ec0800 80f00800 40ec0800 c0f00800  ........@.......
  1b41ff0 00ee0800 40ee0800 40c10800 80c10800  ....@...@.......
- 1b42000 c0ed0800 80ec0800 c0ec0800 80e50800  ................
+ 1b42000 80ec0800 c0ed0800 c0ec0800 80e50800  ................
  1b42010 00f10800 c0e50800 00ed0800 40ed0800  ............@...
  1b42020 c0e30800 00e40800 80ee0800 40e40800  ............@...
  1b42030 80e40800 c0e40800 40f20800 c0c10800  ........@.......
...
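
The e2e.test diff above shows a temporary build directory (/tmp/go-build followed by digits) embedded in the binary. A small sketch for spotting such leftovers in a built file; the helper name `find_tmp_build_dirs` is hypothetical, and the pattern assumes the temp directory lives under /tmp as it does in the dump above.

```python
import re

# Matches the per-build temporary directory seen in the hex dump above.
_TMP_BUILD_DIR = re.compile(rb"/tmp/go-build\d+")


def find_tmp_build_dirs(data):
    """Return the sorted, de-duplicated temp build directories found in `data`."""
    return sorted(set(_TMP_BUILD_DIR.findall(data)))
```

For a file on disk, pass `pathlib.Path(binary).read_bytes()`. An empty result means no such path was embedded.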

@dims
Member Author

dims commented May 30, 2019

@bmwiedemann see #78544 for the /usr/share/man/man1/kubeadm-init.1 fix.

I haven't looked into the other two issues: e2e.test with the /tmp path, and kubeadm's .typelink section.

@smourapina

Hello @dims and @saschagrunert!
Bug Triage team here for the 1.18 release. This is a friendly reminder that code freeze is scheduled for 5 March, which is about 3 weeks from now. Is this issue still intended for milestone 1.18? Thanks in advance!

@dims
Member Author

dims commented Feb 12, 2020

/milestone clear

@smourapina clearing the milestone. thanks!

@k8s-ci-robot k8s-ci-robot removed this from the v1.18 milestone Feb 12, 2020
@bmwiedemann
Contributor

I recently had a look at 1.17 results and there were still issues left, even when building with go-1.13.

/usr/bin/kube-apiserver differs in ELF section .typelink

/usr/bin/kube-scheduler differs in ELF section .note.go.buildid

@dims
Member Author

dims commented May 3, 2020

@bmwiedemann I don't see a run for kubernetes here https://rb.zq1.de/compare.factory-20200430/

@bmwiedemann
Contributor

The .out files there are only created for differing builds.
https://rb.zq1.de/compare.factory-20200430/reproducible.json lists kubernetes1.17 and 1.18 as reproducible.
However, for kubernetes I had to do both builds with -j1 because the build failed otherwise. Usually, various Go programs suffer from a race in the generation of the .note.go.buildid ELF section, like this. That is, a -j1 build and a -j4 build differed there.
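
A sketch of the basic check behind such a comparison: hash every file in two build output trees (for example one built with -j1 and one with -j4) and list the paths whose contents differ. This is not the tooling used for the rb.zq1.de results; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_hashes(root):
    """Map each file's path relative to `root` to its SHA-256 hex digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in root.rglob("*")
        if path.is_file()
    }


def differing_files(build_a, build_b):
    """Return sorted relative paths that differ or exist in only one tree."""
    a, b = file_hashes(build_a), file_hashes(build_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list means the two trees are byte-for-byte identical. Any path returned is a candidate for a section-level diff like the ones shown earlier in the thread.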

@dims
Member Author

dims commented May 4, 2020

Well, we have to celebrate that milestone, right? 🎈 🍰 !!! :)

Ack on the -j1/-j4; will poke at it when I get a chance.

@dims
Member Author

dims commented May 4, 2020

@bmwiedemann please do throw me a link with output showing binaries that had trouble with -j1/-j4 (from the kubernetes release artifacts). I'd have to track the build apparatus for each one to do something similar to https://github.com/kubernetes/kubernetes/pull/89136/files

@bmwiedemann
Contributor

It seems #89136 fixed it for the kubernetes packages.
However, I have the feeling that Go's build IDs were meant to be deterministic, because I never saw more than 2 different results when I varied the level of parallelism.
And instead of patching it out in every piece of software, it would be good to understand and fix where that 1 bit of non-determinism comes from.

other packages I saw with this issue:

and a few more like minikube and katacontainers where it might be hidden behind other sources of non-determinism.

@dims
Member Author

dims commented May 13, 2020

Ack. going to close this out. we can open some new issues when something else pops up.

/close

@k8s-ci-robot
Contributor

@dims: Closing this issue.

In response to this:

Ack. going to close this out. we can open some new issues when something else pops up.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@BenTheElder
Member

@dims should we stop zero-ing out the buildid now?

@dims
Member Author

dims commented Nov 19, 2020

possibly @BenTheElder but we have to verify :(

@saschagrunert
Member

@dims should we stop zero-ing out the buildid now?

May I ask why?

@dims
Member Author

dims commented Apr 22, 2021

@saschagrunert apparently there are fixes in the Go compiler itself to better compute the buildid, so we should check it out when we have time.

@BenTheElder
Member

the buildid is a cache key, and the buildid should be reproducible now I think. but we need to confirm. I thought we'd checked that in KIND but it seems we're still zero-ing it.

@saschagrunert
Member

Got it, did some research and proposed the change: #101411

@dims dims reopened this Oct 1, 2022
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 1, 2022
@k8s-ci-robot
Contributor

@dims: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

@dims
Member Author

dims commented Jun 12, 2023

/close

please reopen if needed @bmwiedemann

@k8s-ci-robot
Contributor

@dims: Closing this issue.

In response to this:

/close

please reopen if needed @bmwiedemann

9 participants