Skip to content

Fix InternalThreadLocalMap cpu cache sharing size to 128bytes #12309

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 25, 2022
Merged

Fix InternalThreadLocalMap cpu cache sharing size to 128bytes #12309

merged 4 commits into from
Apr 25, 2022

Conversation

arthur-zhang
Copy link
Contributor

Motivation:

in 4.0.36.Final UnpaddedInternalThreadLocalMap introduced a new field

    // ArrayList-related thread-locals
    ArrayList<Object> arrayList;

image

this field will break InternalThreadLocalMap's cache line padding from 128 bytes to 136

this is jol output with 4.0.36.Final

io.netty.util.internal.InternalThreadLocalMap object internals:
OFF  SZ                                       TYPE DESCRIPTION                                                    VALUE
  0   8                                            (object header: mark)                                          N/A
  8   4                                            (object header: class)                                         N/A
 12   4                                        int UnpaddedInternalThreadLocalMap.futureListenerStackDepth        N/A
 16   4                                        int UnpaddedInternalThreadLocalMap.localChannelReaderStackDepth    N/A
 20   4                         java.lang.Object[] UnpaddedInternalThreadLocalMap.indexedVariables                N/A
 24   4                              java.util.Map UnpaddedInternalThreadLocalMap.handlerSharableCache            N/A
 28   4       io.netty.util.internal.IntegerHolder UnpaddedInternalThreadLocalMap.counterHashCode                 N/A
 32   4   io.netty.util.internal.ThreadLocalRandom UnpaddedInternalThreadLocalMap.random                          N/A
 36   4                              java.util.Map UnpaddedInternalThreadLocalMap.typeParameterMatcherGetCache    N/A
 40   4                              java.util.Map UnpaddedInternalThreadLocalMap.typeParameterMatcherFindCache   N/A
 44   4                    java.lang.StringBuilder UnpaddedInternalThreadLocalMap.stringBuilder                   N/A
 48   4                              java.util.Map UnpaddedInternalThreadLocalMap.charsetEncoderCache             N/A
 52   4                              java.util.Map UnpaddedInternalThreadLocalMap.charsetDecoderCache             N/A
 56   4                        java.util.ArrayList UnpaddedInternalThreadLocalMap.arrayList                       N/A
 60   4                                            (alignment/padding gap)                                        
 64   8                                       long InternalThreadLocalMap.rp1                                     N/A
 72   8                                       long InternalThreadLocalMap.rp2                                     N/A
 80   8                                       long InternalThreadLocalMap.rp3                                     N/A
 88   8                                       long InternalThreadLocalMap.rp4                                     N/A
 96   8                                       long InternalThreadLocalMap.rp5                                     N/A
104   8                                       long InternalThreadLocalMap.rp6                                     N/A
112   8                                       long InternalThreadLocalMap.rp7                                     N/A
120   8                                       long InternalThreadLocalMap.rp8                                     N/A
128   8                                       long InternalThreadLocalMap.rp9                                     N/A
Instance size: 136 bytes
Space losses: 4 bytes internal + 0 bytes external = 4 bytes total

before 4.0.36.Final eg 4.0.35.Final, InternalThreadLocalMap output is :

io.netty.util.internal.InternalThreadLocalMap object internals:
OFF  SZ                                       TYPE DESCRIPTION                                                    VALUE
  0   8                                            (object header: mark)                                          N/A
  8   4                                            (object header: class)                                         N/A
 12   4                                        int UnpaddedInternalThreadLocalMap.futureListenerStackDepth        N/A
 16   4                                        int UnpaddedInternalThreadLocalMap.localChannelReaderStackDepth    N/A
 20   4                         java.lang.Object[] UnpaddedInternalThreadLocalMap.indexedVariables                N/A
 24   4                              java.util.Map UnpaddedInternalThreadLocalMap.handlerSharableCache            N/A
 28   4       io.netty.util.internal.IntegerHolder UnpaddedInternalThreadLocalMap.counterHashCode                 N/A
 32   4   io.netty.util.internal.ThreadLocalRandom UnpaddedInternalThreadLocalMap.random                          N/A
 36   4                              java.util.Map UnpaddedInternalThreadLocalMap.typeParameterMatcherGetCache    N/A
 40   4                              java.util.Map UnpaddedInternalThreadLocalMap.typeParameterMatcherFindCache   N/A
 44   4                    java.lang.StringBuilder UnpaddedInternalThreadLocalMap.stringBuilder                   N/A
 48   4                              java.util.Map UnpaddedInternalThreadLocalMap.charsetEncoderCache             N/A
 52   4                              java.util.Map UnpaddedInternalThreadLocalMap.charsetDecoderCache             N/A
 56   8                                       long InternalThreadLocalMap.rp1                                     N/A
 64   8                                       long InternalThreadLocalMap.rp2                                     N/A
 72   8                                       long InternalThreadLocalMap.rp3                                     N/A
 80   8                                       long InternalThreadLocalMap.rp4                                     N/A
 88   8                                       long InternalThreadLocalMap.rp5                                     N/A
 96   8                                       long InternalThreadLocalMap.rp6                                     N/A
104   8                                       long InternalThreadLocalMap.rp7                                     N/A
112   8                                       long InternalThreadLocalMap.rp8                                     N/A
120   8                                       long InternalThreadLocalMap.rp9                                     N/A
Instance size: 128 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total

in the latest version, InternalThreadLocalMap occupy 136bytes, break the purpose of 128 bytes align.

so i think it's better use 8 long to do the padding job, or remove all the paddings.

@chrisvest
Copy link
Member

@arthur-zhang japicmp also needs to be told about this change:

Error:  Failed to execute goal com.github.siom79.japicmp:japicmp-maven-plugin:0.15.4:cmp (default) on project netty-common: There is at least one incompatibility: io.netty.util.internal.InternalThreadLocalMap.rp9:FIELD_REMOVED -> [Help 1]

@normanmaurer
Copy link
Member

@arthur-zhang can you please sign our ICLA: https://netty.io/s/icla ?

@normanmaurer normanmaurer added this to the 4.1.77.Final milestone Apr 19, 2022
@franz1981
Copy link
Contributor

franz1981 commented Apr 19, 2022

@arthur-zhang

in the latest version, InternalThreadLocalMap occupy 136bytes, break the purpose of 128 bytes align.

I don't understand this: if the padding is too generous it shouldn't be a problem...the real problem is when it's below 2 cache lines

@arthur-zhang
Copy link
Contributor Author

arthur-zhang commented Apr 19, 2022

@arthur-zhang

in the latest version, InternalThreadLocalMap occupy 136bytes, break the purpose of 128 bytes align.

I don't understand this: if the padding is too generous it shouldn't be a problem...the real problem is when it's below 2 cache lines

yes, bigger than 128 is ok for cache line padding, but waste more space

@arthur-zhang
Copy link
Contributor Author

@arthur-zhang can you please sign our ICLA: https://netty.io/s/icla ?

done

@franz1981
Copy link
Contributor

thanks @arthur-zhang

break the purpose of 128 bytes align.

this has led me to believe it could have caused any perf issue (while it's not)

@arthur-zhang
Copy link
Contributor Author

thanks @arthur-zhang

break the purpose of 128 bytes align.

this has led me to believe it could have caused any perf issue (while it's not)

In this scenario,i think it's better remove all the rp* padding value to keep the class simple

@chrisvest chrisvest merged commit cf9dc40 into netty:4.1 Apr 25, 2022
raidyue pushed a commit to raidyue/netty that referenced this pull request Jul 8, 2022
…12309)

Motivation:

in 4.0.36.Final `UnpaddedInternalThreadLocalMap` introduced a new `ArrayList<Object> arrayList` field.
This field will break cache line padding of `InternalThreadLocalMap` from 128 bytes to 136.

Modification:
Remove one of the 8-byte padding fields.

Result:
The `InternalThreadLocalMap` objects are once again sized a multiple of 64-byte cache lines.

Co-authored-by: Your Name <you@example.com>
franz1981 pushed a commit to franz1981/netty that referenced this pull request Aug 22, 2022
…12309)

Motivation:

in 4.0.36.Final `UnpaddedInternalThreadLocalMap` introduced a new `ArrayList<Object> arrayList` field.
This field will break cache line padding of `InternalThreadLocalMap` from 128 bytes to 136.

Modification:
Remove one of the 8-byte padding fields.

Result:
The `InternalThreadLocalMap` objects are once again sized a multiple of 64-byte cache lines.

Co-authored-by: Your Name <you@example.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants