perf(compiler): optimize computation of i18n message ids #39694
Conversation
(force-pushed from 713eb2e to 3b25a62)
Message ID computation makes extensive use of big integer multiplications to translate a message's fingerprint into a numerical representation. In large compilations with heavy use of i18n, this showed up high in profiler sessions.

There are two factors contributing to the bottleneck:

1. A suboptimal big integer representation using strings, which requires repeated allocation and conversion from characters to numeric digits and back.
2. Repeated computation of the necessary base-256 exponents and their multiplication factors.

The first bottleneck is addressed with a representation that uses an array of individual digits. This avoids the repeated conversions, and allocation overhead is also greatly reduced, as adding two big integers can now be done in place with virtually no memory allocations. The second point is addressed by a memoized exponentiation pool that optimizes the multiplication by a base-256 exponent.

As an additional optimization, the two 32-bit words of the fingerprint are now converted to decimal per word, instead of going through an intermediate byte buffer and doing the decimal conversion per byte.

The results of these optimizations depend heavily on the number of i18n messages for which an ID has to be computed. Benchmarks have shown that computing message IDs is now ~6x faster for 1,000 messages, ~14x faster for 10,000 messages, and ~24x faster for 100,000 messages.
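The digit-array representation and the memoized exponent pool described above can be sketched roughly as follows. This is a hypothetical illustration, not Angular's actual `digest.ts` code: for brevity it stores decimal digits and memoizes powers of two rather than base-256 exponents, and the names `BigIntDigits` and `PowerOfTwoPool` are invented.

```typescript
// Sketch of a digit-array big integer. Digits are stored least-significant
// first, one decimal digit per slot, so addition is a simple carry loop that
// mutates the receiver in place instead of allocating new strings.
class BigIntDigits {
  constructor(public digits: number[] = [0]) {}

  static fromNumber(n: number): BigIntDigits {
    const digits: number[] = [];
    do {
      digits.push(n % 10);
      n = Math.floor(n / 10);
    } while (n > 0);
    return new BigIntDigits(digits);
  }

  // In-place addition: no intermediate string or array is created.
  addInPlace(other: BigIntDigits): void {
    let carry = 0;
    for (let i = 0; i < other.digits.length || carry > 0; i++) {
      const sum = (this.digits[i] ?? 0) + (other.digits[i] ?? 0) + carry;
      this.digits[i] = sum % 10;
      carry = Math.floor(sum / 10);
    }
  }

  toString(): string {
    return this.digits.slice().reverse().join('');
  }
}

// Memoized pool of exponents: powers[i] caches value * 2^i, computed lazily
// by doubling the previous entry, so repeated lookups cost nothing.
class PowerOfTwoPool {
  private powers: BigIntDigits[];
  constructor(value: number) {
    this.powers = [BigIntDigits.fromNumber(value)];
  }
  get(exponent: number): BigIntDigits {
    for (let i = this.powers.length; i <= exponent; i++) {
      const doubled = new BigIntDigits(this.powers[i - 1].digits.slice());
      doubled.addInPlace(this.powers[i - 1]);
      this.powers.push(doubled);
    }
    return this.powers[exponent];
  }
}
```

For example, `new PowerOfTwoPool(3).get(4)` yields `3 * 2^4 = 48`, and subsequent calls for exponents up to 4 return the cached values directly. The real implementation applies the same idea to base-256 exponents of the SHA-1 fingerprint.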
(force-pushed from 3b25a62 to 97f73b5)
(force-pushed from cc5d964 to f09002e)
(force-pushed from f09002e to 3490e9f)
Nice performance gains.
I find the names of the freestanding functions quite confusing, like `numberTimesBigInt()` and `addNumberTimesBigInt()`. (Although I accept that these are mostly inherited from the previous implementation.) Perhaps we could make these static or instance methods on new classes?
Do you think you could add a few unit tests for some of this code while you are in here?
packages/compiler/src/i18n/digest.ts (outdated)

```typescript
/**
 * Computes and memoizes the big integer value for `this.number * 2^exponent`.
 */
getMultipliedByPowerOfTwo(exponent: number): BigInteger {
```
BIKESHED:

```diff
- getMultipliedByPowerOfTwo(exponent: number): BigInteger {
+ multiplyByPowerOfTwo(exponent: number): BigInteger {
```
(force-pushed from 3490e9f to 149aaf4)
(force-pushed from 149aaf4 to 8003a69)
(force-pushed from 8003a69 to cc42b73)
The result of UTF-8 encoding a string was represented as a string, where each individual character represented a single byte according to its character code. All usages of this data were interested in the byte itself, so this required converting each character back to its code. This commit simply stores the individual bytes in an array to avoid the conversion. This yields a ~10% performance improvement for i18n message ID computation.
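The change can be illustrated with a minimal UTF-8 encoder that produces a `number[]` of raw bytes directly, so callers index into the array instead of calling `charCodeAt()` on an intermediate string. This is a hypothetical sketch under standard UTF-8 rules, not the compiler's actual encoder:

```typescript
// Encode a JS string to an array of UTF-8 bytes (0-255 per slot).
// Surrogate pairs are combined into a single code point before encoding.
function utf8Encode(str: string): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < str.length; i++) {
    let codePoint = str.charCodeAt(i);
    // High surrogate followed by low surrogate: one supplementary code point.
    if (codePoint >= 0xd800 && codePoint <= 0xdbff && i + 1 < str.length) {
      const low = str.charCodeAt(i + 1);
      if (low >= 0xdc00 && low <= 0xdfff) {
        codePoint = 0x10000 + ((codePoint - 0xd800) << 10) + (low - 0xdc00);
        i++;
      }
    }
    if (codePoint <= 0x7f) {
      bytes.push(codePoint);                                    // 1 byte
    } else if (codePoint <= 0x7ff) {
      bytes.push(0xc0 | (codePoint >> 6),                       // 2 bytes
                 0x80 | (codePoint & 0x3f));
    } else if (codePoint <= 0xffff) {
      bytes.push(0xe0 | (codePoint >> 12),                      // 3 bytes
                 0x80 | ((codePoint >> 6) & 0x3f),
                 0x80 | (codePoint & 0x3f));
    } else {
      bytes.push(0xf0 | (codePoint >> 18),                      // 4 bytes
                 0x80 | ((codePoint >> 12) & 0x3f),
                 0x80 | ((codePoint >> 6) & 0x3f),
                 0x80 | (codePoint & 0x3f));
    }
  }
  return bytes;
}
```

With the previous string-based representation, each of these bytes would have been smuggled through `String.fromCharCode()` and read back with `charCodeAt()`; returning the byte array removes both conversions.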
(force-pushed from cc42b73 to a27ede2)
Note to caretaker: I'd consider this change to have a medium risk (i.e. not a low-risk one); please sync it as a standalone change to g3. Thank you.
FYI, adding the "blocked" label for now to run additional tests in g3. While the regular presubmit was successful, testing i18n extraction is not common, so there might not be sufficient validation that the change is backwards-compatible. I will run extra checks manually to verify that extracted messages retain the same IDs with this change. Thank you.
Verified message ID generation by comparing output for one of the internal apps. Everything looks good 👍 Note: it'd still make sense to sync this change into g3 as a separate/standalone CL.
This issue has been automatically locked due to inactivity. Read more about our automatic conversation locking policy. This action has been performed automatically by a bot.