Description
On a machine with many cores, the performance of sync.RWMutex.R{Lock,Unlock} degrades dramatically as GOMAXPROCS increases.
This test program:
```go
package benchmarks_test

import (
	"fmt"
	"sync"
	"testing"
)

func BenchmarkRWMutex(b *testing.B) {
	for ng := 1; ng <= 256; ng <<= 2 {
		b.Run(fmt.Sprint(ng), func(b *testing.B) {
			var mu sync.RWMutex
			mu.Lock()

			var wg sync.WaitGroup
			wg.Add(ng)

			n := b.N
			quota := n / ng

			for g := ng; g > 0; g-- {
				if g == 1 {
					quota = n
				}

				go func(quota int) {
					for i := 0; i < quota; i++ {
						mu.RLock()
						mu.RUnlock()
					}
					wg.Done()
				}(quota)

				n -= quota
			}

			if n != 0 {
				b.Fatalf("Incorrect quota assignments: %v remaining", n)
			}

			b.StartTimer()
			mu.Unlock()
			wg.Wait()
			b.StopTimer()
		})
	}
}
```
degrades by a factor of 8 as it saturates threads and cores, presumably due to cache contention on &rw.readerCount:
```
# ./benchmarks.test -test.bench . -test.cpu 1,4,16,64
testing: warning: no tests to run
BenchmarkRWMutex/1        20000000    72.6 ns/op
BenchmarkRWMutex/1-4      20000000    72.4 ns/op
BenchmarkRWMutex/1-16     20000000    72.8 ns/op
BenchmarkRWMutex/1-64     20000000    72.5 ns/op
BenchmarkRWMutex/4        20000000    72.6 ns/op
BenchmarkRWMutex/4-4      20000000     105 ns/op
BenchmarkRWMutex/4-16     10000000     130 ns/op
BenchmarkRWMutex/4-64     20000000     160 ns/op
BenchmarkRWMutex/16       20000000    72.4 ns/op
BenchmarkRWMutex/16-4     10000000     125 ns/op
BenchmarkRWMutex/16-16    10000000     263 ns/op
BenchmarkRWMutex/16-64     5000000     287 ns/op
BenchmarkRWMutex/64       20000000    72.6 ns/op
BenchmarkRWMutex/64-4     10000000     137 ns/op
BenchmarkRWMutex/64-16     5000000     306 ns/op
BenchmarkRWMutex/64-64     3000000     517 ns/op
BenchmarkRWMutex/256      20000000    72.4 ns/op
BenchmarkRWMutex/256-4    20000000     137 ns/op
BenchmarkRWMutex/256-16    5000000     280 ns/op
BenchmarkRWMutex/256-64    3000000     602 ns/op
PASS
```
A "control" test, calling a no-op function instead of the RWMutex methods, displays no such degradation: the problem does not appear to be due to runtime scheduling overhead.
josharian commented on Nov 18, 2016
Possibly of interest for this: http://people.csail.mit.edu/mareko/spaa09-scalablerwlocks.pdf
josharian commented on Nov 18, 2016
cc @dvyukov
ianlancetaylor commented on Nov 18, 2016
It may be difficult to apply the algorithm described in that paper to our existing sync.RWMutex type. The algorithm requires an association between the read-lock operation and the read-unlock operation. It can be implemented by having read-lock/read-unlock always occur on the same thread or goroutine, or by having the read-lock operation return a pointer that is passed to the read-unlock operation. Basically the algorithm builds a tree to avoid contention, and requires each read-lock/read-unlock pair to operate on the same node of the tree.

It would be feasible to implement the algorithm as part of a new type, in which the read-lock operation returned a pointer to be passed to the read-unlock operation. I think that new type could be implemented entirely in terms of sync/atomic functions, sync.Mutex, and sync.Cond. That is, it doesn't seem to require any special relationship with the runtime package.

dvyukov commented on Nov 18, 2016
Locking per-P slot may be enough and is much simpler:
https://codereview.appspot.com/4850045/diff2/1:3001/src/pkg/co/distributedrwmutex.go
ianlancetaylor commented on Nov 18, 2016
What happens when a goroutine moves to a different P between read-lock and read-unlock?
dvyukov commented on Nov 18, 2016
RLock must return a proxy object to unlock; that object must hold the locked P index.
bcmills commented on Nov 18, 2016
The existing RWMutex API allows the RLock call to occur on a different goroutine from the RUnlock call. We can certainly assume that most RLock/RUnlock pairs occur on the same goroutine (and optimize for that case), but I think there needs to be a slow-path fallback for the general case.

(For example, you could envision an algorithm that attempts to unlock the slot for the current P, then falls back to a linear scan if the current P's slot wasn't already locked.)
bcmills commented on Nov 18, 2016
At any rate: general application code can work around the problem (in part) by using per-goroutine or per-goroutine-pool caches rather than global caches shared throughout the process.

The bigger issue is that sync.RWMutex is used fairly extensively within the standard library for package-level locks (the various caches in reflect, http.statusMu, json.encoderCache, mime.mimeLock, etc.), so it's easy for programs to fall into contention traps and hard to apply workarounds without avoiding large portions of the standard library. For those use-cases, it might actually be feasible to switch to something with a different API (such as having RLock return an unlocker).

dvyukov commented on Nov 18, 2016
For these cases in the std lib, atomic.Value is a much better fit. It is already used in json, gob, and http. atomic.Value is perfectly scalable and has virtually zero overhead for readers.

bcmills commented on Nov 18, 2016
I agree in general, but it's not obvious to me how one could use atomic.Value to guard lookups in a map acting as a cache. (It's perfect for maps which do not change, but how would you add new entries to the caches with that approach?)
What kind of cache do you mean?
ianlancetaylor commented on Nov 18, 2016
E.g., the one in reflect.ptrTo.

dvyukov commented on Nov 18, 2016
See e.g. encoding/json/encode.go:cachedTypeFields
bcmills commented on Nov 18, 2016
Hmm... that trades a higher allocation rate (and O(N) insertion cost) in exchange for getting the lock out of the reader path. (It basically pushes the "read lock" out to the garbage collector.)
And since most of these maps are insert-only (never deleted from), you can at least suspect that the O(N) insert won't be a huge drag: if there were many inserts, the maps would end up enormously large.
It would be interesting to see whether the latency tradeoff favors the RWMutex overhead or the O(N) insert overhead for more of the standard library.

ianlancetaylor commented on Nov 18, 2016
@dvyukov Thanks. For the reflect.ptrTo case I wrote it up as https://golang.org/cl/33411. It needs some realistic benchmarks--microbenchmarks won't prove anything one way or another.

89 remaining items
ntsd commented on Dec 22, 2023
I made a benchmark, but I'm not sure if the code is correct. Hope this helps.

With 10 concurrent goroutines at 100k iterations each, RWMutex is fine for writes and slightly better for reads.
With 100 concurrent goroutines at 10k iterations each, RWMutex is slower for writes and impressive for reads.

What surprises me is that sync.Map did better at higher concurrency.
https://github.com/ntsd/go-mutex-comparison?tab=readme-ov-file#test-scenarios