Description
During our analysis to understand why Java 11 native images were bigger than Java 8 ones, we found that org.jcp.xml.dsig.internal.dom.ApacheCanonicalizer
is detected as used (even if a typical Spring Boot application does not use it), triggering com.sun.org.apache.xml.internal.security.Init.init()
static init which transitively includes a lot of dependencies.
The analysis done by @gradinac and @d-kozak has shown it is used at 2 levels.
In com.oracle.svm.core.jdk.TrustStoreManagerFeature, this feature ensures root certificates are baked into the image heap so that there is no need to somehow fetch them at runtime. The reason this class gets included is the way security provider loading is done in Java: since x509 certs are loaded, there is a need for a security provider that provides some security-related functionality. What happens is as follows (only the interesting part):
- An EC algorithm (sun.security.x509.AlgorithmId#<init>) is requested.
- GetInstance tries to find that algorithm from the list of configured algorithms (Service firstService = list.getService(type, algorithm);)
- This list iterates over each provider listed in <graalvm-home>/conf/security/java.security and tries to find a provider that provides our requested service
- However, getProvider will after some nested calls basically iterate over all security providers until it finds the one with the requested name.
- XMLDSig provider is pulled, which also must be initialized at build time (which is done in the TrustStoreManagerFeature).
But even if the previous point is solved, it still gets loaded by the service loader as it's declared in a module-info.java file that's on the classpath. This is kind of similar to #2991.
ServiceLoaderFeature only controls what services will be reachable at image run time, but here we are experiencing issues with ServiceLoader at image build time, which is a different story.
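The name-based provider lookup described in the list above can be sketched with the public java.security API. This is only an illustration, not the JDK's or GraalVM's internal code path: the class name ProviderLookupSketch and the helper findByName are hypothetical, and whether XMLDSig shows up depends on the JDK and its java.security configuration.

```java
import java.security.Provider;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: walk the configured security providers by name.
class ProviderLookupSketch {

    /** Names of all configured providers, in their configured preference order. */
    static List<String> providerNames() {
        List<String> names = new ArrayList<>();
        for (Provider p : Security.getProviders()) {
            names.add(p.getName());
        }
        return names;
    }

    /** Naive lookup: iterate over every provider until the name matches. */
    static Provider findByName(String name) {
        for (Provider p : Security.getProviders()) {
            if (p.getName().equals(name)) {
                return p;
            }
        }
        return null; // no provider with that name is configured
    }

    public static void main(String[] args) {
        System.out.println("Configured providers: " + providerNames());
        Provider xml = findByName("XMLDSig");
        System.out.println(xml == null
                ? "XMLDSig is not configured on this JDK"
                : "XMLDSig is implemented by " + xml.getClass().getName());
    }
}
```

The point of the sketch is that a lookup by name has to touch providers it does not need on the way to the one it wants, which is how an unrelated provider can end up reachable.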
ServiceLoaders are an internal JVM mechanism that does not provide an interface for such configuration. The lookup is very dynamic in nature. We essentially have two options here if we want to control its behaviour.
- Prevent the internal ServiceLoader iterators from locating the resources.
- Instrument ServiceLoader and insert a filter into it.
NativeImageBytecodeInstrumentationAgent could be a way to filter the loaded service implementations, at the cost of some maintenance overhead. A hook on ServiceLoader here could also be a way to approach this.
Any thoughts?
Activity
webfolderio commented on Feb 8, 2021
This is not a bug. All XML-related classes must be included.
sdeleuze commented on Feb 16, 2021
No, this is a real bug, and a complex one, confirmed with the GraalVM team.
gradinac commented on Feb 18, 2021
This issue is also transient, in the sense that the XML classes are sometimes included in the final image and sometimes not.
The core issue lies in the observed "randomness" of the order in which the ServiceProviders are iterated over. The security-related code uses an iterator provided by the ServiceLoader to load and find the exact security provider it needs. Looking at the ServiceLoader#iterator() implementation here, we can see that it returns an iterator decorated to leverage the ServiceLoader's cache. This underlying iterator is created here.
From the implementation, we can conclude that it basically iterates over both services provided by modules and services provided from scanning the classpath, with the key takeaway being that the services provided by modules are preferred. The security-related services are indeed specified in the module-info files of the different JDK modules, and will be the ones picked up during a native-image build.
Inspecting the ModuleServicesLookupIterator, we can see that it will add services provided by different module layers to a List that will later be used for iteration. This list is populated from the ServicesCatalogs of different module layers. The module layer we are interested in is the boot layer: it contains the JDK modules that provide security services (e.g. java.xml.crypto).
Here is where things get interesting: the services are obtained from the ServicesCatalog using findServices, defined here. We can see that the services are kept in a map that maps a Service to a list of ServiceProviders (a pair of Module and ServiceProviderName). Under the hood, this list is actually a CopyOnWriteArrayList to allow concurrent modification. It is this list in the end that contains the service providers in a seemingly random order.
While I have not been able to trace the exact cause of this randomness yet, I have verified that sorting this list gives us a stable service provider iteration order. One possible source of randomness could be querying the ServicesCatalog of a module layer for the first time (here). We iterate over the values of a HashMap that contains all of the modules in the layer. If the iteration order is unstable, the service provider order would be as well.
sdeleuze commented on Feb 22, 2021
@maxandersen As discussed during the last GraalVM advisory board, you can find above more details about that footprint issue, which is likely going to require JDK changes.
gradinac commented on Feb 23, 2021
Update: After investigating this further, it seems like the culprit is the ServicesCatalog of the platform class loader. This catalog is what, in our case, actually gets queried for the security providers. Here's a sample app that demonstrates this:
HelloServiceWorld.java
Compile with:
javac HelloServiceWorld.java
Run a couple of times with:
java --add-opens java.base/jdk.internal.module=ALL-UNNAMED HelloServiceWorld
Sample output:
From the above output, we can see that, for the same Java app launched with the same command line, we get different orders of the service providers provided by the platform class loader.
The platform class loader's ServicesCatalog is populated very early during the VM initialization process, during System.initPhase2: the initialization of the module system and the creation of the boot layer. It starts with a call to define modules. In turn, that will try to define the boot layer modules with a call to Module.defineModules. Here, we iterate over the provided configuration and, based on each module's class loader, populate either the boot or the platform class loader's ServicesCatalog. The module iteration order here is not stable across runs, resulting in the seemingly random output from the application above.
As the next step, I will try to find the source of this instability.
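The provider discovery order discussed above can also be observed through the public ServiceLoader API, without opening jdk.internal.module. The following is a minimal sketch and not the HelloServiceWorld.java app referenced in this comment: the class name ProviderOrderSketch and its methods are hypothetical, and it goes through ServiceLoader rather than querying the platform class loader's ServicesCatalog directly, so the order it prints may or may not vary on a given JDK.

```java
import java.security.Provider;
import java.util.List;
import java.util.ServiceLoader;
import java.util.stream.Collectors;

// Hypothetical sketch: list java.security.Provider implementations as ServiceLoader finds them.
class ProviderOrderSketch {

    /** Provider class names in the order ServiceLoader discovers them (no instantiation). */
    static List<String> discoveredOrder() {
        return ServiceLoader.load(Provider.class, ClassLoader.getSystemClassLoader())
                .stream()
                .map(p -> p.type().getName())
                .collect(Collectors.toList());
    }

    /** The same names sorted, which gives an order that does not depend on discovery. */
    static List<String> sortedOrder() {
        return discoveredOrder().stream().sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("Discovered order: " + discoveredOrder());
        System.out.println("Sorted order:     " + sortedOrder());
    }
}
```

Running it several times and comparing the first line across runs is one way to check whether discovery order is stable on a particular JDK; the sorted line should stay the same regardless.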
gradinac commented on Mar 10, 2021
The source of randomness turned out to be structures with undefined iteration order (HashMaps and the sets returned by Set.of), as well as maps returned by a module finder class generated by jlink, which the JDK uses during the module system initialization phase.
@cstancu and I revisited this issue and it has been resolved with a different approach here: f3629c3
Rather than try to fix the order in the JDK which would be very fragile, we took a different approach: as all security providers are added and initialized at build time, we should never need to load a new provider at runtime. We've substituted the method that caused the provider cache to be reachable as well as reset the cache's contents. This, in turn, means no seemingly random providers should be reachable at runtime.
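To illustrate the kind of structure mentioned above, here is a small sketch, under the assumption that the JDK in use randomizes Set.of iteration order per JVM run (the Set.of specification leaves the order unspecified, and to my understanding current OpenJDK builds vary it between runs). The class name IterationOrderSketch and the module names used as sample data are only illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: Set.of has no defined iteration order, a sorted copy does.
class IterationOrderSketch {

    /** Returns the elements in natural (sorted) order, independent of the set's own order. */
    static List<String> stableOrder(Set<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        Set<String> modules = Set.of("java.base", "java.xml.crypto", "jdk.crypto.ec");
        // This order is unspecified and may change from one JVM run to the next.
        System.out.println("Set.of order: " + new ArrayList<>(modules));
        // This order is the same on every run.
        System.out.println("Sorted order: " + stableOrder(modules));
    }
}
```

Sorting is shown only to contrast a defined order with an undefined one; as the comment explains, the fix that was actually applied avoids relying on the JDK's ordering at all.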
maxandersen commented on Mar 10, 2021
Btw. I also figured out why we don't see this issue in Quarkus: as you describe here, we disable the service catalog and so avoid the whole problem, as it happens at build time.
sdeleuze commented on Apr 20, 2021
This issue fixed the source of randomness, but XML classes still get included with GraalVM 21.1, see #3365 for more details.
Title changed from "Footprint optimization: avoid XML classes being included by default on Java 11+" to "Make image size stable across builds on Java 11+"
gilles-duboscq commented on Jun 15, 2021
This is still open on the 21.1 milestone, should it be moved to another milestone?
gradinac commented on Jun 15, 2021
I can move it to 21.2. This should be fixed by 10cc640, but I'll keep this open until we're 100% sure it fixed the regression.
sdeleuze commented on Aug 22, 2021
@gradinac I am surprised, because in mid June we tested that the regression was fixed, and testing our commandlinerunner sample right now with GraalVM 21.2.0, I see that all the crypto classes are back.
I did a test with what you mentioned in this comment, and the order is not stable again.
First run:
Second run:
To be sure I run the right Java version:
I can't understand why that worked in mid June and why there now seems to be another regression on this, leading to bigger than needed Java 11 native images.
sdeleuze commented on Aug 23, 2021
After a deeper look with @gradinac, it seems to work as expected: the crypto classes were included because they were used by some features like the random number generator. The flag -H:+TraceSecurityServices really helps to understand what happens and why.
As a consequence I close this issue.
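As a small illustration of why a random number generator can make security classes reachable, the sketch below shows that java.security.SecureRandom is itself resolved through a security provider. It uses only the public API; the class name SecureRandomProviderSketch is hypothetical, and the algorithm and provider it reports depend on the JDK and platform.

```java
import java.security.SecureRandom;

// Hypothetical sketch: SecureRandom is backed by a security provider.
class SecureRandomProviderSketch {

    /** Describes which algorithm and provider back the default SecureRandom. */
    static String describe() {
        SecureRandom rng = new SecureRandom();
        return rng.getAlgorithm() + " from provider " + rng.getProvider().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```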