perf(compiler): optimize computation of i18n message ids #39694
Conversation
(force-pushed from 713eb2e to 3b25a62)
Message ID computation makes extensive use of big integer multiplications to translate a message's fingerprint into a numerical representation. In large compilations with heavy use of i18n, this showed up high in profiler sessions.

There are two factors contributing to the bottleneck:

1. A suboptimal big integer representation using strings, which requires repeated allocation and conversion from characters to numeric digits and back.
2. Repeated computation of the necessary base-256 exponents and their multiplication factors.

The first bottleneck is addressed with a representation that uses an array of individual digits. This avoids the repeated conversions, and allocation overhead is also greatly reduced, as adding two big integers can now be done in place with virtually no memory allocations. The second point is addressed by a memoized exponentiation pool that optimizes the multiplication by a base-256 exponent.

As an additional optimization, the two 32-bit words of the fingerprint are now converted to decimal per word, instead of going through an intermediate byte buffer and doing the decimal conversion per byte.

The results of these optimizations depend heavily on the number of i18n messages for which an ID has to be computed. Benchmarks have shown that computing message IDs is now ~6x faster for 1,000 messages, ~14x faster for 10,000 messages, and ~24x faster for 100,000 messages.
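The digit-array representation and the memoized exponent pool described above can be sketched roughly as follows. This is a hypothetical illustration, not Angular's actual `digest.ts` code: for brevity it stores decimal digits and memoizes powers of two rather than base-256 exponents, and the names `BigIntDigits` and `PowerOfTwoPool` are invented.

```typescript
// Sketch of a digit-array big integer. Digits are stored least-significant
// first, one decimal digit per slot, so addition is a simple carry loop that
// mutates the receiver in place instead of allocating new strings.
class BigIntDigits {
  constructor(public digits: number[] = [0]) {}

  static fromNumber(n: number): BigIntDigits {
    const digits: number[] = [];
    do {
      digits.push(n % 10);
      n = Math.floor(n / 10);
    } while (n > 0);
    return new BigIntDigits(digits);
  }

  // In-place addition: no intermediate string or array is created.
  addInPlace(other: BigIntDigits): void {
    let carry = 0;
    for (let i = 0; i < other.digits.length || carry > 0; i++) {
      const sum = (this.digits[i] ?? 0) + (other.digits[i] ?? 0) + carry;
      this.digits[i] = sum % 10;
      carry = Math.floor(sum / 10);
    }
  }

  toString(): string {
    return this.digits.slice().reverse().join('');
  }
}

// Memoized pool of exponents: powers[i] caches value * 2^i, computed lazily
// by doubling the previous entry, so repeated lookups cost nothing.
class PowerOfTwoPool {
  private powers: BigIntDigits[];
  constructor(value: number) {
    this.powers = [BigIntDigits.fromNumber(value)];
  }
  get(exponent: number): BigIntDigits {
    for (let i = this.powers.length; i <= exponent; i++) {
      const doubled = new BigIntDigits(this.powers[i - 1].digits.slice());
      doubled.addInPlace(this.powers[i - 1]);
      this.powers.push(doubled);
    }
    return this.powers[exponent];
  }
}
```

For example, `new PowerOfTwoPool(3).get(4)` yields `3 * 2^4 = 48`, and subsequent calls for exponents up to 4 return the cached values directly. The real implementation applies the same idea to base-256 exponents of the SHA-1 fingerprint.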
(force-pushed from 3b25a62 to 97f73b5)
(force-pushed from cc5d964 to f09002e)
(force-pushed from f09002e to 3490e9f)
Nice performance gains.
I find the names of the freestanding functions quite confusing, like `numberTimesBigInt()` and `addNumberTimesBigInt()`. (Although I accept that these are mostly inherited from the previous implementation.) Perhaps we could make these static or instance methods on new classes?
Do you think you could add a few unit tests for some of this code while you are in here?
packages/compiler/src/i18n/digest.ts (outdated)

```typescript
/**
 * Computes and memoizes the big integer value for `this.number * 2^exponent`.
 */
getMultipliedByPowerOfTwo(exponent: number): BigInteger {
```
BIKESHED:

```diff
- getMultipliedByPowerOfTwo(exponent: number): BigInteger {
+ multiplyByPowerOfTwo(exponent: number): BigInteger {
```
(force-pushed from 3490e9f to 149aaf4)
(force-pushed from 149aaf4 to 8003a69)
(force-pushed from 8003a69 to cc42b73)
The result of UTF-8 encoding a string was represented as a string, where each individual character represented a single byte according to its character code. All usages of this data were interested in the byte itself, so this required converting each character back to its code. This commit simply stores the individual bytes in an array to avoid the conversion. This yields a ~10% performance improvement for i18n message ID computation.
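The change can be illustrated with a minimal UTF-8 encoder that produces a `number[]` of raw bytes directly, so callers index into the array instead of calling `charCodeAt()` on an intermediate string. This is a hypothetical sketch under standard UTF-8 rules, not the compiler's actual encoder:

```typescript
// Encode a JS string to an array of UTF-8 bytes (0-255 per slot).
// Surrogate pairs are combined into a single code point before encoding.
function utf8Encode(str: string): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < str.length; i++) {
    let codePoint = str.charCodeAt(i);
    // High surrogate followed by low surrogate: one supplementary code point.
    if (codePoint >= 0xd800 && codePoint <= 0xdbff && i + 1 < str.length) {
      const low = str.charCodeAt(i + 1);
      if (low >= 0xdc00 && low <= 0xdfff) {
        codePoint = 0x10000 + ((codePoint - 0xd800) << 10) + (low - 0xdc00);
        i++;
      }
    }
    if (codePoint <= 0x7f) {
      bytes.push(codePoint);                                    // 1 byte
    } else if (codePoint <= 0x7ff) {
      bytes.push(0xc0 | (codePoint >> 6),                       // 2 bytes
                 0x80 | (codePoint & 0x3f));
    } else if (codePoint <= 0xffff) {
      bytes.push(0xe0 | (codePoint >> 12),                      // 3 bytes
                 0x80 | ((codePoint >> 6) & 0x3f),
                 0x80 | (codePoint & 0x3f));
    } else {
      bytes.push(0xf0 | (codePoint >> 18),                      // 4 bytes
                 0x80 | ((codePoint >> 12) & 0x3f),
                 0x80 | ((codePoint >> 6) & 0x3f),
                 0x80 | (codePoint & 0x3f));
    }
  }
  return bytes;
}
```

With the previous string-based representation, each of these bytes would have been smuggled through `String.fromCharCode()` and read back with `charCodeAt()`; returning the byte array removes both conversions.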
(force-pushed from cc42b73 to a27ede2)
Note to caretaker: I'd consider this change to have a medium risk (i.e. not a low-risk one); please sync it as a standalone change to g3. Thank you.
FYI, adding the "blocked" label for now to run additional tests in g3. While the regular presubmit was successful, testing i18n extraction is not common, so there might not be sufficient validation that the change is backwards-compatible. I will run extra checks manually to verify that extracted messages retain the same IDs with this change. Thank you.
Verified message ID generation by comparing output for one of the internal apps. Everything looks good 👍 Note: it'd still make sense to sync this change into g3 as a separate/standalone CL.
This issue has been automatically locked due to inactivity. Read more about our automatic conversation locking policy. This action has been performed automatically by a bot.