Skip to content

Unavailable error when MaxConnectionAge and MaxConnectionAgeGrace is enabled on the server #2767

Closed
@wjywbs

Description

@wjywbs
Contributor

What version of gRPC are you using?

v1.18.0/v1.19.1

What version of Go are you using (go version)?

1.11

What operating system (Linux, Windows, …) and version?

Linux

What did you do?

I tried to enable MaxConnectionAge on a C++ server and call from a Go client. When the connection age was reached, the rpcs in the Go client failed with the unavailable error and the "transport is closing" message.

What did you expect to see?

No unavailable errors.

What did you see instead?

I test with both age and grace time set to one minute, and the Go client reported lots of errors. Each rpc takes a few seconds to complete, within the one minute grace period.

grpc-go v1.18.0/v1.19.1
rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: <nil>
rpc error: code = Unavailable desc = the connection is draining
rpc error: code = Unavailable desc = transport is closing

Reference: grpc/grpc#12295 (comment)

Activity

menghanl

menghanl commented on Apr 12, 2019

@menghanl
Contributor

Can you turn on logging for the client and server?

For go client: GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info

For C++ server: try GRPC_VERBOSITY=DEBUG GRPC_TRACE=server_channel

dfawley

dfawley commented on May 2, 2019

@dfawley
Member

ping @wjywbs - could you gather the debugging info requested and do you have a minimal reproducible test case?

wjywbs

wjywbs commented on May 2, 2019

@wjywbs
ContributorAuthor

Sure. I'll try to create a minimized demo.

easwars

easwars commented on May 2, 2019

@easwars
Contributor

I tried with a Go server and Go client (which makes one connection to the server and repeatedly makes RPCs on it).
When the MaxConnectionAge and MaxConnectionAgeGrace on the server were much greater than the time taken to complete the RPCs, I saw the following in the client logs:
I0502 14:33:32.238317 58768 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc0007e5360, TRANSIENT_FAILURE
I0502 14:33:32.238395 58768 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc0007e5360, CONNECTING
I0502 14:33:32.242678 58768 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc0007e5360, READY
and the client kept running without any issues.

But, when the MaxConnectionAge and MaxConnectionAgeGrace on the server was lesser than the time taken to complete the RPC, I saw the following on the client:
I0502 15:24:30.825472 83849 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000475c00, TRANSIENT_FAILURE
I0502 15:24:30.825538 83849 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000475c00, CONNECTING
I0502 15:24:30.829788 83849 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000475c00, READY
F0502 15:24:30.835994 83849 fortuneteller_client.go:83] Error received while calling callTellFortune: rpc error: code = Unavailable desc = transport is closing

easwars

easwars commented on May 2, 2019

@easwars
Contributor

I think this is what is happening:

Since MaxConnectionAge and MaxConnectionAgeGrace have passed on the server, it force closes the connection.

And in the client in http2Client.reader(), the call to t.framer.fr.ReadFrame returns EOF, and since this is not of type http2.StreamError we call Close() on the transport, which closes all active streams with ErrConnClosing

At this point, I don't know enough to say whether this is a bug or is WAI :-)

dfawley

dfawley commented on May 2, 2019

@dfawley
Member

If MaxConnectionAgeGrace passes, then this would be expected. From the issue linked, I understood that the grace period was not being reached and errors were still appearing. @wjywbs - can you confirm this?

wjywbs

wjywbs commented on May 2, 2019

@wjywbs
ContributorAuthor

Yes, I set both age and grace to 60s and each rpc finishes within a few seconds, but the unavailable errors appear a lot of times.

dfawley

dfawley commented on May 23, 2019

@dfawley
Member

@wjywbs do you have a minimal reproducible test case we can use to replicate the problem you are seeing?

hanjm

hanjm commented on May 29, 2019

@hanjm

hi @dfawley @wjywbs, i set MaxConnectionAge for long live grpc stream server recently, then a lots of grpc log "transport is closing", so i found this issue and write a test case to replicate the problem. you may need use -count 100 because sometimes it test pass sometimes it test fail.

// 1. run case 1
// go test -run "TestKeepaliveMaxAge/case1-Fatal-Unavailable-transport is closing" -count 100
// result: test fail
//
package testcase1

import (
	"context"
	"google.golang.org/grpc"
	"google.golang.org/grpc/examples/route_guide/routeguide"
	"google.golang.org/grpc/grpclog"
	"google.golang.org/grpc/keepalive"
	"io"
	"log"
	"net"
	"net/http"
	"net/http/pprof"
	"os"
	"testing"
	"time"
)

type routeGuidService struct {
}

func (*routeGuidService) GetFeature(ctx context.Context, p *routeguide.Point) (*routeguide.Feature, error) {
	return &routeguide.Feature{}, nil
}

func (*routeGuidService) ListFeatures(p *routeguide.Rectangle, s routeguide.RouteGuide_ListFeaturesServer) error {
	return nil
}

func (*routeGuidService) RecordRoute(s routeguide.RouteGuide_RecordRouteServer) error {
	return nil
}

func (*routeGuidService) RouteChat(s routeguide.RouteGuide_RouteChatServer) error {
	go func() {
		err := s.Send(&routeguide.RouteNote{Message: "server send"})
		if err != nil {
			log.Printf("failed to send:%s", err)
		}
	}()
	for {
		_, err := s.Recv()
		if err != nil {
			if err == io.EOF {
				return nil
			}
			log.Printf("failed to recv:%s", err)
			return err
		}
	}
}

// go test -run "TestKeepaliveMaxAge/case1-Fatal-Unavailable-transport is closing" -count 100
func TestKeepaliveMaxAge(t *testing.T) {
	grpclog.SetLoggerV2(grpclog.NewLoggerV2WithVerbosity(os.Stderr, os.Stderr, os.Stderr, 10))
	keepAliveMaxConnectionAge := time.Second * 5

	// 1. star server
	s := grpc.NewServer(
		grpc.KeepaliveParams(keepalive.ServerParameters{
			MaxConnectionAge: keepAliveMaxConnectionAge,
		}),
	)
	routeguide.RegisterRouteGuideServer(s, &routeGuidService{})
	const addr = "127.0.0.1:8000"
	const pprofAddr = "127.0.0.1:8001"
	go func() {
		_ = pprof.Index
		err := http.ListenAndServe(pprofAddr, nil)
		if err != nil && err != http.ErrServerClosed {
			return
		}
	}()
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		t.Fatal(err)
		return
	}
	go func() {
		err := s.Serve(ln)
		if err != nil {
			t.Error(err)
			return
		}
	}()
	defer func() {
		s.GracefulStop()
	}()
	// wait server start up
	time.Sleep(time.Millisecond * 100)
	// 2. start client
	cc, err := grpc.DialContext(context.Background(), addr, grpc.WithInsecure())
	if err != nil {
		t.Fatal(err)
		return
	}
	// go test -run "TestKeepaliveMaxAge/case1-Fatal-Unavailable-transport is closing" -count 100
	t.Run("case1-Fatal-Unavailable-transport is closing", func(t *testing.T) {
		timer := time.NewTimer(keepAliveMaxConnectionAge + time.Second)
		for {
			_, err := routeguide.NewRouteGuideClient(cc).GetFeature(context.Background(), &routeguide.Point{})
			if err != nil {
				// unexpected: its Fatal
				// must use grpc.FailFast(false), it's ok
				t.Fatal(err)
				return
			}
			select {
			case <-timer.C:
				return
			default:
			}
		}
	})
}
hanjm

hanjm commented on Jun 2, 2019

@hanjm

update. if i use grpc.FailFast(false) or grpc.WithDefaultCallOptions(grpc.FailFast(false)), it's will work fine.

dfawley

dfawley commented on Jun 20, 2019

@dfawley
Member

We suspect this may be fixed by #2857. Can you try with this change and let us know if it resolves your problems?

wjywbs

wjywbs commented on Jul 11, 2019

@wjywbs
ContributorAuthor

I tested v1.22.0, and there were still many "rpc error: code = Unavailable desc = the connection is draining". With grpc.WaitForReady(true), there were less errors and the descriptions were "the connection is draining" and "transport is closing".

dfawley

dfawley commented on Jul 11, 2019

@dfawley
Member

It seems this might be a problem with how we handle the race between choosing a transport and the transport getting the GOAWAY from the server. I'll take a look next week.

8 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

    Development

    Participants

    @menghanl@easwars@wjywbs@hanjm@dfawley

    Issue actions

      Unavailable error when MaxConnectionAge and MaxConnectionAgeGrace is enabled on the server · Issue #2767 · grpc/grpc-go