
WebFlux with Java 11 HttpClient unexpected slow performance comparing with WebClient #22333


Closed
Aleksandr-Filichkin opened this issue Feb 1, 2019 · 10 comments
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) status: invalid An issue that we don't feel is valid

Comments

@Aleksandr-Filichkin

Aleksandr-Filichkin commented Feb 1, 2019

Environment: Spring Boot 2.1.2.RELEASE, Java 11(OpenJDK/Oracle)

So, I have a RestController that forwards an incoming request to another REST service and returns the result to the client.

I compared WebClient with the Java 11 HttpClient and I see unexpectedly slow performance with the Java 11 HttpClient (it looks like it is due to high GC usage).

JMeter shows that with the Java 11 HttpClient we get half the throughput we get with WebClient. The problem cannot be in the Java HttpClient itself, because I tested the same code with Spring MVC and it performs as well as WebFlux with WebClient.

All the code is available here: https://github.com/Aleksandr-Filichkin/spring-mvc-vs-webflux.
The JMeter files for the test are also included.

I think the problem is memory related, because I see much higher GC usage with HttpClient than with WebClient.
image

So getUserUsingWithCF with Spring MVC is twice as fast as getUserUsingWebfluxJavaHttpClient with WebFlux:

    @GetMapping(value = "/completable-future")
    public CompletableFuture<String> getUserUsingWithCF(@RequestParam long delay) {
        return sendRequestWithHttpClient(delay).thenApply(x -> "completable-future: " + x);
    }

    @GetMapping(value = "/webflux-java-http-client")
    public Mono<String> getUserUsingWebfluxJavaHttpClient(@RequestParam long delay) {
        CompletableFuture<String> stringCompletableFuture = sendRequestWithHttpClient(delay).thenApply(x -> "webflux-java-http-client: " + x);
        return Mono.fromFuture(stringCompletableFuture);
    }
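
The sendRequestWithHttpClient helper is not shown in this thread; it lives in the linked repository. For readers following along, here is a minimal sketch of what such a helper could look like with java.net.http. The class name UserServiceClient, the baseUrl constructor argument, and the default client configuration are assumptions for illustration; only the /user/?delay=... path comes from the WebClient snippet in this thread.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the helper used by the controllers in this issue.
public class UserServiceClient {

    // Assumption: a single shared client with default settings.
    private final HttpClient httpClient = HttpClient.newHttpClient();
    private final String baseUrl;

    public UserServiceClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Sends a non-blocking GET and completes with the response body.
    public CompletableFuture<String> sendRequestWithHttpClient(long delay) {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/user/?delay=" + delay))
                .GET()
                .build();
        return httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::body);
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: UserServiceClient <base-url>");
            return;
        }
        UserServiceClient client = new UserServiceClient(args[0]);
        System.out.println(client.sendRequestWithHttpClient(1).join());
    }
}
```

Both controller methods above would then differ only in how the returned CompletableFuture is handed to the web framework.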

image

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Feb 1, 2019
@rstoyanchev
Contributor

rstoyanchev commented Feb 4, 2019

Are you doing all this on localhost? That skews the results quite a bit since client and server compete for the same hardware and CPU cycles, but there is a limited number of cores.

In the case of WebClient both server and client likely share resources so the number of threads is probably closer to the number of cores for the machine as it should be with event loop concurrency. For HttpClient it's allocating 10 threads which I'm guessing is much higher than number of cores and that's in addition to the ones used for the server. Try lowering the number of threads closer to the number of cores to confirm.
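
One way to try this suggestion is to size the client's executor from the number of available cores instead of a fixed 10. This is only a sketch: the class and method names are made up for illustration, and whether one thread per core is the right setting for a given workload is an assumption to verify by measurement.

```java
import java.net.http.HttpClient;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: a JDK HttpClient whose executor matches the core count.
public class SizedHttpClient {

    // One thread per available core, never fewer than one.
    static int clientThreads() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    // Builds a client that runs its asynchronous tasks on the given executor.
    static HttpClient newClient(ExecutorService executor) {
        return HttpClient.newBuilder().executor(executor).build();
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(clientThreads());
        HttpClient client = newClient(executor);
        System.out.println("client executor threads: " + clientThreads());
        System.out.println("custom executor set: " + client.executor().isPresent());
        executor.shutdown();
    }
}
```

On a single-core machine this yields a pool of one thread, which matches the event-loop style of sizing described above.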

@rstoyanchev rstoyanchev added the in: web Issues in web modules (web, webmvc, webflux, websocket) label Feb 4, 2019
@Aleksandr-Filichkin
Author

Aleksandr-Filichkin commented Feb 4, 2019

Hi @rstoyanchev
I tested it on AWS.
It's an EC2 t2.micro instance running Ubuntu and Java 11. It forwards requests to another EC2 box (the user service).

This is the memory usage with a single thread for the Java HttpClient:
HttpClient.newBuilder().executor(Executors.newFixedThreadPool(1)).build();

    @GetMapping(value = "/webflux-java-http-client")
    public Mono<String> getUserUsingWebfluxJavaHttpClient(@RequestParam long delay) {
        CompletableFuture<String> stringCompletableFuture = sendRequestWithHttpClient(delay).thenApply(x -> "webflux-java-http-client: " + x);
        return Mono.fromFuture(stringCompletableFuture);
    }

image

For WebClient

    @GetMapping(value = "/webflux-webclient")
    public Mono<String> getUserUsingWebfluxWebclient(@RequestParam long delay) {
        return webClient.get().uri("/user/?delay={delay}", delay).retrieve().bodyToMono(String.class).map(x -> "webflux-webclient: " + x);
    }

image

For Tomcat and Java HttpClient

    @GetMapping(value = "/completable-future")
    public CompletableFuture<String> getUserUsingWithCF(@RequestParam long delay) {
        return sendRequestWithHttpClient(delay).thenApply(x -> "completable-future: " + x);
    }

image

So GC CPU usage is:
Tomcat and Java HttpClient: 5-10% CPU
WebFlux and WebClient: ~5% CPU
WebFlux and Java HttpClient: ~35% CPU

@Aleksandr-Filichkin
Author

I tested with Apache HttpClient and it works perfectly (the same performance as with WebClient).

@bclozel
Member

bclozel commented Feb 6, 2019

I've tried to profile your sample application locally with YourKit. Doing that locally is flawed for sure, but I'd still expect to find similar things.

I did not reproduce several of your findings.

  1. MVC is faster than WebFlux with the JDK11 client

So getUserUsingWithCF with Spring MVC is twice as fast as getUserUsingWebfluxJavaHttpClient with WebFlux

WebFlux is still 40% faster than Spring MVC in my local benchmark.

  2. the GC is taking significant CPU time

In my local benchmark, GC CPU time is almost flatlining at 0-1% for all variants.

I do see that the java.net.http.HttpClient variant is allocating way more objects, especially a 16K HeapByteBuffer for each connection (apparently you can tune that default with the "jdk.httpclient.bufsize" system property). I don't know much about the behavior of the connection pool there, nor if/how heap buffers are reused.

As seen in the jdk.internal.net.http.SocketTube sources:

// An implementation of BufferSource used for unencrypted data.
// This buffer source uses heap buffers and avoids wasting memory
// by forwarding read-only buffer slices downstream.
// Buffers allocated through this source are simply GC'ed when
// they are no longer referenced.
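
For anyone who wants to experiment with the "jdk.httpclient.bufsize" property mentioned above, here is a hedged sketch. The 8192 value is purely illustrative, the class and method names are hypothetical, and as far as I understand the JDK reads this property only once when its internal client classes are initialized, so it has to be set before the first HttpClient is used. Passing -Djdk.httpclient.bufsize=8192 on the command line is the safer way to do that.

```java
import java.net.http.HttpClient;

// Hypothetical helper for experimenting with the JDK client buffer size.
public class BufferSizeTuning {

    static final String BUFSIZE_PROPERTY = "jdk.httpclient.bufsize";

    // Sets the property, then creates the client. This only has an effect
    // if no HttpClient has been used earlier in the same JVM (assumption:
    // the JDK caches the value on first use).
    static HttpClient newClientWithBufferSize(int bytes) {
        System.setProperty(BUFSIZE_PROPERTY, Integer.toString(bytes));
        return HttpClient.newHttpClient();
    }

    public static void main(String[] args) {
        // 8 KB is an arbitrary example value, not a recommendation.
        HttpClient client = newClientWithBufferSize(8 * 1024);
        System.out.println(BUFSIZE_PROPERTY + "=" + System.getProperty(BUFSIZE_PROPERTY));
        System.out.println("client created: " + (client != null));
    }
}
```

Whether a smaller buffer actually reduces GC pressure on a single-core instance would need to be measured; this only shows where the knob is.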

My current understanding is that clients have different strategies when it comes to buffer allocation, and some of those might behave better than others under high load or high concurrency. WebClient, when powered by reactor-netty, is sharing resources with the server so this might make things more efficient.

It looks like "EC2 t2.micro" instances have 1GB of memory and only one vCPU - I guess a lot of allocations on the heap and a single CPU is not helping the GC threads and those are competing with the application. Also, this might be even pushed further if JMeter is creating a lot of concurrent connections - this will push the HttpClient to create more connections in the pool, and allocate even more buffers.

With that in mind, I don't think we can do anything in Spring Framework to improve this situation; maybe you could try and run those benchmarks on servers with a bit more resources?

@bclozel bclozel added the status: waiting-for-feedback We need additional information before we can continue label Feb 6, 2019
@Aleksandr-Filichkin
Author

Aleksandr-Filichkin commented Feb 7, 2019

Hi @bclozel, are you sure that you tested it with Tomcat?

I've updated the sample project (added Maven profiles for servlet and WebFlux, and introduced Apache HttpClient for comparison). You are right, EC2 t2.micro has 1 CPU, and I understand the GC problem on a single-core node. But the question is why we don't have the GC issue with Tomcat and the servlet stack.

So I prepared an article about it. You can see all the information here. https://medium.com/@filia.aleks/microservice-performance-battle-spring-mvc-vs-webflux-80d39fd81bf0
image

Locally on a Core i7 I also don't see the problem; it only shows up on the small EC2 node. Are you able to test it on a single-core node?

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Feb 7, 2019
@bclozel
Member

bclozel commented Feb 7, 2019

I find the 3 conclusions at the end of your article quite misleading.

Java 11 Http Client slower than Apache Http client (~30% performance degradation)
Spring WebClient has the same performance as Apache Http Client(both use netty)

This might be true in your case but your article should make obvious that you're running on a single core, 1GB RAM server instance. This is very unusual for a performance benchmark and skews the results. This is interesting because it shows how systems behave wrt concurrency with limited resources - but it's almost pathological.

WebFlux is not friendly with Java 11 Http Client

At this point, we don't know why you're seeing such results. So this sentence could read "Java 11 Http Client is not friendly with Netty" or "Java 11 Http Client doesn't work well with non-blocking servers", or "this combination of runtime models doesn't work well when you only have one core and little RAM".

Also, your benchmark is not showing latency stats. With limited resources, the 99th percentile might be a disaster and optimizing for throughput might be misguided here.
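
To make the latency point concrete, here is a small sketch of how a tail percentile can be derived from recorded response times. It uses the nearest-rank method, and the sample values are invented for illustration; they are not measurements from this benchmark.

```java
import java.util.Arrays;

// Nearest-rank percentile over a set of recorded latencies.
public class LatencyPercentiles {

    // Returns the smallest sample such that at least p percent of all
    // samples are less than or equal to it (0 < p <= 100).
    static long percentile(long[] latenciesMillis, double p) {
        if (latenciesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up samples: mostly fast responses plus one slow outlier.
        long[] samples = {12, 15, 11, 14, 13, 250, 12, 16, 13, 14};
        System.out.println("p50 = " + percentile(samples, 50) + " ms"); // 13 ms
        System.out.println("p99 = " + percentile(samples, 99) + " ms"); // 250 ms
    }
}
```

In this made-up data the median looks healthy while the 99th percentile is dominated by the single slow response, which is exactly what a throughput-only report hides.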

I'm closing this issue now as it seems this is more about the benchmark setup and runtime models rather than a performance problem in WebFlux.

Thanks!

@bclozel bclozel closed this as completed Feb 7, 2019
@bclozel bclozel added status: invalid An issue that we don't feel is valid and removed status: feedback-provided Feedback has been provided status: waiting-for-triage An issue we've not yet triaged or decided on labels Feb 7, 2019
@Aleksandr-Filichkin
Author

Hi @bclozel. I agree with your comments. I see that it's not a Spring problem. But 1 CPU and 1 GB is quite a popular setup for dockerized microservices on Kubernetes and AWS ECS clusters. That is why I ran the benchmarks on this setup.

Thank you!

@rstoyanchev
Contributor

The tests above are running with delay=1 which is treated as 1 millisecond, or am I missing something? This is also unusual. In a realistic app the latency would be higher, and it's latency that brings out the benefits of asynchronous handling.

Brian provided a lot of pointers for further investigation on the JDK 11 client. Specifically I would dig into the buffer allocation strategies and understand the options and trade-offs, since you're suspecting allocation issues. I would also suggest making the client benchmark tests independent of Spring MVC or WebFlux and comparing the performance of the JDK 11 client to the WebClient or the Apache client, and if necessary using the appropriate Java mailing list to ask questions.
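
A client-only benchmark along those lines could be sketched as below. The harness takes any call that returns a CompletableFuture, so the JDK 11 client, the Apache client, or a WebClient call adapted to a future could be plugged in. All names here (ClientBenchmark, Result, run, maxInFlight) are hypothetical, and the main method uses an already-completed future as a stand-in so the sketch runs without a network.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Framework-independent harness: issues requests with bounded concurrency.
public class ClientBenchmark {

    static final class Result {
        final int completed;
        final int failed;
        final long elapsedNanos;

        Result(int completed, int failed, long elapsedNanos) {
            this.completed = completed;
            this.failed = failed;
            this.elapsedNanos = elapsedNanos;
        }

        double throughputPerSecond() {
            return completed / (elapsedNanos / 1_000_000_000.0);
        }
    }

    // Runs totalRequests calls, keeping at most maxInFlight outstanding.
    static Result run(Supplier<CompletableFuture<String>> call,
                      int totalRequests, int maxInFlight) throws InterruptedException {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger completed = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();
        long start = System.nanoTime();
        for (int i = 0; i < totalRequests; i++) {
            permits.acquire();
            call.get().whenComplete((body, error) -> {
                if (error == null) {
                    completed.incrementAndGet();
                } else {
                    failed.incrementAndGet();
                }
                permits.release();
            });
        }
        // Taking every permit back means no request is still in flight.
        permits.acquire(maxInFlight);
        long elapsed = System.nanoTime() - start;
        return new Result(completed.get(), failed.get(), elapsed);
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in call; replace with a real client call for an actual run.
        Result result = run(() -> CompletableFuture.completedFuture("ok"), 1_000, 16);
        System.out.println("completed=" + result.completed + " failed=" + result.failed);
        System.out.println("throughput/s=" + result.throughputPerSecond());
    }
}
```

Because the harness knows nothing about Spring, any difference it shows between clients can be attributed to the clients themselves rather than to the web stack around them.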

@Aleksandr-Filichkin
Author

Aleksandr-Filichkin commented Feb 7, 2019

@rstoyanchev, you are right, delay=1 means a 1 ms delay. But as you can see here https://user-images.githubusercontent.com/10498901/52384210-86b0cd80-2a8d-11e9-825e-5307957eba0b.png
I also tested with 10, 100 and 500 ms. Yes, I plan to test the JDK 11 client on EC2 micro without the web layer and Spring.

@kivan-mih

I have switched from OkHttp 3.x to the Java 15 HTTP client and performance decreased significantly. My use case involves downloading many (~50000) average-sized binary files (hundreds of KB to tens of MB) over HTTPS in parallel threads. The performance degradation is about 50%. Has anyone else run into this problem?
