Make image size stable across builds on Java 11+ #3163

Closed
@sdeleuze

Description

@sdeleuze
Collaborator

During our analysis to understand why Java 11 native images are bigger than Java 8 ones, we found that org.jcp.xml.dsig.internal.dom.ApacheCanonicalizer is detected as used (even though a typical Spring Boot application does not use it). This triggers the static initialization in com.sun.org.apache.xml.internal.security.Init.init(), which transitively pulls in a lot of dependencies.

The analysis done by @gradinac and @d-kozak has shown it is used at 2 levels.

The first level is com.oracle.svm.core.jdk.TrustStoreManagerFeature. This feature ensures root certificates are baked into the image heap so that there is no need to fetch them at runtime. The class gets included because of the way security provider loading is done in Java: since X.509 certificates are loaded, a security provider is needed to supply the required security-related functionality. What happens is as follows (only the interesting part):

  • An EC algorithm (sun.security.x509.AlgorithmId#<init>) is requested.
  • GetInstance tries to find that algorithm from the list of configured algorithms (Service firstService = list.getService(type, algorithm);)
  • This list iterates over each provider listed in <graalvm-home>/conf/security/java.security and tries to find a provider that provides our requested service
  • However, after some nested calls, getProvider basically iterates over all security providers until it finds the one with the requested name. The XMLDSig provider is pulled in, and it must also be initialized at build time (which is done in the TrustStoreManagerFeature).
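
The outer loop of the lookup described above can be approximated with public JCA APIs. This is a minimal sketch and not the JDK's internal code path: the class name ProviderLookupSketch and the method providersFor are hypothetical, and the "KeyFactory"/"EC" request is only an illustrative choice, not the exact request made by AlgorithmId.

```java
import java.security.Provider;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

public class ProviderLookupSketch {

    /** Returns the names of installed providers that offer the given service, in preference order. */
    static List<String> providersFor(String type, String algorithm) {
        List<String> names = new ArrayList<>();
        // Security.getProviders() returns the installed providers in preference order.
        for (Provider provider : Security.getProviders()) {
            // getService returns null when the provider does not offer type/algorithm.
            if (provider.getService(type, algorithm) != null) {
                names.add(provider.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Illustrative request only (assumption), chosen because the issue mentions an EC algorithm.
        System.out.println(providersFor("KeyFactory", "EC"));
    }
}
```

Note that calling Security.getProviders() loads every configured provider, which is the kind of side effect that makes unrelated providers reachable.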

But even if the previous point is solved, it still gets loaded by the service loader as it's declared in a module-info.java file that's on the classpath. This is kind of similar to #2991. ServiceLoaderFeature only controls what services will be reachable at image run time, but here we are experiencing issues with ServiceLoader at image build time, which is a different story.
ServiceLoader is an internal JVM mechanism that does not provide an interface for such configuration, and the lookup is very dynamic in nature. We essentially have two options if we want to control its behaviour.

  • Prevent the internal ServiceLoader iterators from locating the resources.
  • Instrument ServiceLoader and insert a filter into it.

NativeImageBytecodeInstrumentationAgent could be a way to filter the loaded service implementations, at the cost of some maintenance overhead. A hook on ServiceLoader here could also be a way to approach this.
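
To illustrate what "inserting a filter" means, here is a minimal sketch that filters providers at the application level using the public ServiceLoader.stream() API (Java 9+). It is not the build-time hook discussed above: the class FilteredProviderLoader and its method are hypothetical names, and it assumes the code runs from the classpath (an unnamed module), where no `uses` declaration is required.

```java
import java.security.Provider;
import java.util.List;
import java.util.ServiceLoader;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilteredProviderLoader {

    /** Lists provider class names visible to the system class loader that pass the filter. */
    static List<String> providerClassNames(Predicate<String> allowed) {
        return ServiceLoader.load(Provider.class, ClassLoader.getSystemClassLoader())
                .stream()
                // type() exposes the provider class without instantiating the provider.
                .map(p -> p.type().getName())
                .filter(allowed)
                // Sort so that the result does not depend on discovery order.
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Example filter: drop everything from the XML digital signature provider package.
        providerClassNames(name -> !name.startsWith("org.jcp.xml.dsig."))
                .forEach(System.out::println);
    }
}
```

Because the filter runs on ServiceLoader.Provider handles, a rejected provider is never instantiated, which is the property a build-time filter would need as well.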

Any thoughts?

Activity

webfolderio

webfolderio commented on Feb 8, 2021

@webfolderio

This is not a bug. All XML-related classes must be included.

sdeleuze

sdeleuze commented on Feb 16, 2021

@sdeleuze
Collaborator, Author

No, this is a real bug, and a complex one, confirmed with the GraalVM team.

gradinac

gradinac commented on Feb 18, 2021

@gradinac
Contributor

This issue is also transient, in the sense that the XML classes may or may not be included in the final image from one build to the next.

The core issue lies in the observed "randomness" of the order in which the ServiceProviders are iterated over. The security related code uses an iterator provided by the ServiceLoader to load and find the exact security provider it needs. Looking at the ServiceLoader#iterator() implementation here, we can see that it returns an iterator decorated to leverage the ServiceLoader's cache. This underlying iterator is created here.

From the implementation, we can conclude that it basically iterates over both services provided by modules and services provided from scanning the classpath, with the key takeaway being that the services provided by modules are preferred. The security related services are indeed specified in the module-info files of the different JDK modules, and will be the ones picked up during a native-image build.

Inspecting the ModuleServicesLookupIterator, we can see that it will add services provided by different module layers in a List that will later on be used for iteration. This list is populated from ServicesCatalogs of different module layers. The module layer we are interested in is the boot layer - it contains the JDK modules that provide security services (i.e. java.xml.crypto).

Here is where things get interesting: The services are obtained from the ServicesCatalog using findServices, defined here. We can see that the services are kept in a map that maps a service name to a list of ServiceProviders (a pair of Module and provider class name). Under the hood, this list is actually a CopyOnWriteArrayList to allow concurrent modification. It is this list in the end that contains the service providers in a seemingly random order.

While I have not been able to trace the exact cause of this randomness yet, I have verified that sorting this list gives us a stable service provider iteration order. One possible source of randomness could happen when querying the ServicesCatalog of a module layer for the first time (here). We iterate over the values of a HashMap that contains all of the modules in the layer. If the iteration order is unstable, the service provider order would be as well.
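
The effect of sorting can be shown with a minimal sketch. It does not patch the JDK list itself; the class StableProviderOrder and the method stableOrder are hypothetical names, and the provider names are just sample input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class StableProviderOrder {

    /** Returns a sorted copy, so the result is the same whatever the input order was. */
    static List<String> stableOrder(List<String> providerNames) {
        List<String> sorted = new ArrayList<>(providerNames);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Two hypothetical discovery orders of the same providers.
        List<String> runA = Arrays.asList(
                "sun.security.ec.SunEC",
                "org.jcp.xml.dsig.internal.dom.XMLDSigRI",
                "sun.security.jgss.SunProvider");
        List<String> runB = Arrays.asList(
                "sun.security.jgss.SunProvider",
                "sun.security.ec.SunEC",
                "org.jcp.xml.dsig.internal.dom.XMLDSigRI");
        // Both runs yield the same sequence after sorting.
        System.out.println(stableOrder(runA).equals(stableOrder(runB)));
    }
}
```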

sdeleuze

sdeleuze commented on Feb 22, 2021

@sdeleuze
Collaborator, Author

@maxandersen As discussed during the last GraalVM advisory board meeting, you can find more details above about this footprint issue, which is likely going to require JDK changes.

gradinac

gradinac commented on Feb 23, 2021

@gradinac
Contributor

Update: After investigating this further, it seems like the culprit is the ServicesCatalog of the platform class loader. This catalog is what, in our case, actually gets queried for the security providers. Here's a sample app that demonstrates this:

HelloServiceWorld.java

import java.lang.reflect.Method;
import java.util.List;

public class HelloServiceWorld {
    public static void main(String[] args) {
        System.out.println("Hello, Service World!");
        try {
            /* Must be fetched using reflection as jdk.internal.module is not exported by java.base */
            Class<?> servicesCatalog = Class.forName("jdk.internal.module.ServicesCatalog");
            Method getServicesCatalogOrNull = servicesCatalog.getMethod("getServicesCatalogOrNull", ClassLoader.class);
            Method findServices = servicesCatalog.getMethod("findServices", String.class);

            Class<?> serviceProvider = Class.forName("jdk.internal.module.ServicesCatalog$ServiceProvider");
            Method providerName = serviceProvider.getMethod("providerName");

            Object serviceCatalog = getServicesCatalogOrNull.invoke(null, ClassLoader.getPlatformClassLoader());
            List<Object> services = (List<Object>) findServices.invoke(serviceCatalog, "java.security.Provider");
            System.out.println("Service providers for java.security.Provider, provided by the platform classloader:");
            for (Object service: services) {
                System.out.println(providerName.invoke(service));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Compile with: javac HelloServiceWorld.java
Run a couple of times with: java --add-opens java.base/jdk.internal.module=ALL-UNNAMED HelloServiceWorld

Sample output:

$ java --add-opens java.base/jdk.internal.module=ALL-UNNAMED HelloServiceWorld
Hello, Service World!
Service providers for java.security.Provider, provided by the platform classloader:
sun.security.jgss.SunProvider
sun.security.smartcardio.SunPCSC
sun.security.ec.SunEC
com.sun.security.sasl.gsskerb.JdkSASL
sun.security.pkcs11.SunPKCS11
org.jcp.xml.dsig.internal.dom.XMLDSigRI
$ java --add-opens java.base/jdk.internal.module=ALL-UNNAMED HelloServiceWorld
Hello, Service World!
Service providers for java.security.Provider, provided by the platform classloader:
com.sun.security.sasl.gsskerb.JdkSASL
sun.security.pkcs11.SunPKCS11
org.jcp.xml.dsig.internal.dom.XMLDSigRI
sun.security.jgss.SunProvider
sun.security.smartcardio.SunPCSC
sun.security.ec.SunEC

From the above output, we can see that, for the same Java app launched with the same command line, we get a different order of service providers provided by the platform class loader.

The platform class loader's ServicesCatalog is populated very early during the VM initialization process, during System.initPhase2 - the initialization of the module system and the creation of the boot layer. It starts with a call to define modules. In turn, that will try and define the boot layer modules with a call to Module.defineModules. Here, we iterate over the provided configuration and, based on each module's class loader, populate either the boot or the platform class loader's ServicesCatalog. The module iteration order here is not stable across runs, resulting in the seemingly random output from the application above.
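
The set of boot layer modules that declare a java.security.Provider can also be inspected without reflection into jdk.internal.module. The following is a minimal sketch using only the public module API; the class BootLayerSecurityProviders is a hypothetical name, and the TreeMap is there only to make the printed order deterministic, since ModuleLayer.modules() returns a Set with no guaranteed iteration order.

```java
import java.lang.module.ModuleDescriptor;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BootLayerSecurityProviders {

    /** Maps each boot layer module that provides java.security.Provider to its provider class names. */
    static Map<String, List<String>> securityProvidersByModule() {
        // TreeMap sorts by module name, hiding the unstable iteration order of the module set.
        Map<String, List<String>> result = new TreeMap<>();
        for (Module module : ModuleLayer.boot().modules()) {
            ModuleDescriptor descriptor = module.getDescriptor();
            if (descriptor == null) {
                continue; // defensive: modules in the boot layer are named and have a descriptor
            }
            for (ModuleDescriptor.Provides provides : descriptor.provides()) {
                if (provides.service().equals("java.security.Provider")) {
                    result.put(module.getName(), provides.providers());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        securityProvidersByModule().forEach(
                (module, providers) -> System.out.println(module + " -> " + providers));
    }
}
```

Unlike the reflective sample above, this prints the same sequence on every run, because the ordering is imposed by the sorted map rather than by module system initialization.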

As the next step, I will try to find the source of this instability.

added this to the 21.1 milestone on Mar 10, 2021
gradinac

gradinac commented on Mar 10, 2021

@gradinac
Contributor

The randomness comes from structures with an undefined iteration order (HashMaps and sets returned by Set.of), as well as maps returned by a module finder class generated by jlink, which the JDK uses during the module system initialization phase.

@cstancu and I revisited this issue and it has been resolved with a different approach here: f3629c3
Rather than trying to fix the order in the JDK, which would be very fragile, we relied on the following observation: as all security providers are added and initialized at build time, we should never need to load a new provider at runtime. We've substituted the method that caused the provider cache to be reachable and reset the cache's contents. This, in turn, means no seemingly random providers should be reachable at runtime.

maxandersen

maxandersen commented on Mar 10, 2021

@maxandersen

By the way, I also figured out why we don't see this issue in Quarkus: as you describe here, we disable the service catalog, which avoids the whole problem since it happens at build time.

sdeleuze

sdeleuze commented on Apr 20, 2021

@sdeleuze
Collaborator, Author

This issue fixed the source of randomness, but XML classes still get included with GraalVM 21.1, see #3365 for more details.

changed the title from "Footprint optimization: avoid XML classes being included by default on Java 11+" to "Make image size stable across builds on Java 11+" on Apr 20, 2021
gilles-duboscq

gilles-duboscq commented on Jun 15, 2021

@gilles-duboscq
Member

This is still open on the 21.1 milestone, should it be moved to another milestone?

gradinac

gradinac commented on Jun 15, 2021

@gradinac
Contributor

I can move it to 21.2. This should be fixed by 10cc640, but I'll keep this open until we're 100% sure it fixed the regression.

modified the milestones: 21.1, 21.2 on Jun 15, 2021
sdeleuze

sdeleuze commented on Aug 22, 2021

@sdeleuze
Collaborator, Author

@gradinac I am surprised, because in mid-June we verified that the regression was fixed, and testing our commandlinerunner sample right now with GraalVM 21.2.0, I see that all the crypto classes are back.

I ran a test with what you mentioned in this comment, and the order is unstable again.

First run:

Service providers for java.security.Provider, provided by the platform classloader:
sun.security.ec.SunEC
sun.security.smartcardio.SunPCSC
org.jcp.xml.dsig.internal.dom.XMLDSigRI
com.sun.security.sasl.gsskerb.JdkSASL
sun.security.jgss.SunProvider
sun.security.pkcs11.SunPKCS11

Second run:

Service providers for java.security.Provider, provided by the platform classloader:
sun.security.pkcs11.SunPKCS11
sun.security.jgss.SunProvider
com.sun.security.sasl.gsskerb.JdkSASL
org.jcp.xml.dsig.internal.dom.XMLDSigRI
sun.security.smartcardio.SunPCSC
sun.security.ec.SunEC

To be sure I am running the right Java version:

java -version
openjdk version "11.0.12" 2021-07-20
OpenJDK Runtime Environment GraalVM CE 21.2.0 (build 11.0.12+6-jvmci-21.2-b08)
OpenJDK 64-Bit Server VM GraalVM CE 21.2.0 (build 11.0.12+6-jvmci-21.2-b08, mixed mode, sharing)

I can't understand why that worked in mid-June and why there now seems to be another regression on this, leading to bigger than needed Java 11 native images.

sdeleuze

sdeleuze commented on Aug 23, 2021

@sdeleuze
Collaborator, Author

After a deeper look with @gradinac, it seems to work as expected: the crypto classes were included because they are used by some features, such as the random number generator. The flag -H:+TraceSecurityServices really helps to understand what happens and why.

Consequently, I am closing this issue.

Participants: @maxandersen, @sdeleuze, @gilles-duboscq, @shelajev, @munishchouhan