
perf(compiler): optimize computation of i18n message ids #39694

Closed
wants to merge 6 commits

Conversation

JoostK
Member

@JoostK JoostK commented Nov 14, 2020

Message ID computation makes extensive use of big integer
multiplications in order to translate the message's fingerprint into
a numerical representation. In large compilations with heavy use of i18n
this was showing up high in profiler sessions.

There are two factors contributing to the bottleneck:

  1. a suboptimal big integer representation using strings, which requires
     repeated allocations and conversions from characters to numeric digits
     and back;
  2. repeated computation of the necessary base-256 exponents and their
     multiplication factors.

The first bottleneck is addressed using a representation based on an
array of individual digits. This avoids the repeated conversions, and
allocation overhead is also greatly reduced, as adding two big integers
can now be done in place with virtually no memory allocations.
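
As an illustration of the digit-array representation with in-place addition, here is a minimal sketch; the class and method names are hypothetical, not the compiler's actual API:

```typescript
// Sketch of a big integer stored as an array of base-10 digits, least
// significant digit first. Adding another value mutates the digit array
// in place, so no intermediate strings or arrays are allocated.
class DigitArrayInt {
  private constructor(private digits: number[]) {}

  static fromNumber(n: number): DigitArrayInt {
    const digits: number[] = [];
    do {
      digits.push(n % 10);
      n = Math.floor(n / 10);
    } while (n > 0);
    return new DigitArrayInt(digits);
  }

  // In-place addition: this += other.
  addToSelf(other: DigitArrayInt): void {
    const length = Math.max(this.digits.length, other.digits.length);
    let carry = 0;
    for (let i = 0; i < length; i++) {
      const sum = (this.digits[i] ?? 0) + (other.digits[i] ?? 0) + carry;
      this.digits[i] = sum % 10;
      carry = sum >= 10 ? 1 : 0;
    }
    if (carry > 0) {
      this.digits.push(carry);
    }
  }

  toString(): string {
    return this.digits.slice().reverse().join('');
  }
}
```

Because addition only walks a numeric array, repeatedly accumulating fingerprint terms no longer churns through short-lived strings and per-character conversions.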

The second point is addressed by a memoized exponentiation pool that
avoids recomputing multiplications by the base-256 exponents.
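
The memoization idea can be sketched like this, using JavaScript's native `bigint` for brevity (the compiler uses its own big-integer type, and the names here are illustrative):

```typescript
// Sketch of a memoized exponentiation pool: powers of the base are computed
// once, lazily, and cached, so converting many fingerprints to numbers reuses
// the same exponent values instead of re-multiplying from scratch each time.
class ExponentiationPool {
  private readonly powers: bigint[] = [1n]; // powers[i] === base ** i

  constructor(private readonly base: bigint) {}

  toThePowerOf(exponent: number): bigint {
    // Extend the cache only as far as the requested exponent.
    for (let i = this.powers.length; i <= exponent; i++) {
      this.powers[i] = this.powers[i - 1] * this.base;
    }
    return this.powers[exponent];
  }
}
```

With a shared pool for base 256, multiplying a fingerprint byte into position `i` can reuse the cached 256^i across all messages in the compilation instead of recomputing it per message.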

As an additional optimization, the two 32-bit words of the fingerprint
are now converted to decimal one word at a time, instead of going through
an intermediate byte buffer and doing the decimal conversion per byte.
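
A sketch of the per-word conversion, again using native `bigint` for clarity (the function name is hypothetical):

```typescript
// Sketch: a 64-bit fingerprint arrives as two unsigned 32-bit words. Instead
// of splitting it into eight bytes and converting each byte to decimal, the
// whole value is assembled once per word and converted in a single step.
function wordsToDecimal(hi: number, lo: number): string {
  // value = hi * 2^32 + lo, with both words treated as unsigned.
  const value = (BigInt(hi >>> 0) << 32n) + BigInt(lo >>> 0);
  return value.toString(10);
}
```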

The impact of these optimizations depends largely on the number of i18n
messages for which a message ID must be computed. Benchmarks have shown
that computing message IDs is now ~6x faster for 1,000 messages, ~14x
faster for 10,000 messages, and ~24x faster for 100,000 messages.

@JoostK JoostK added refactoring Issue that involves refactoring or code-cleanup target: patch This PR is targeted for the next patch release area: compiler Issues related to `ngc`, Angular's template compiler labels Nov 14, 2020
@ngbot ngbot bot modified the milestone: needsTriage Nov 14, 2020
@google-cla google-cla bot added the cla: yes label Nov 14, 2020
@JoostK JoostK marked this pull request as ready for review November 14, 2020 22:48
@JoostK JoostK added the action: review The PR is still awaiting reviews from at least one requested reviewer label Nov 14, 2020
@pullapprove pullapprove bot requested a review from mhevery November 14, 2020 22:48
@JoostK
Member Author

JoostK commented Nov 15, 2020

For fun, here's the before and after for computing 100,000 message IDs:

(Screenshot: before, 2020-11-15 at 16:12:50)

(Screenshot: after, 2020-11-15 at 17:46:53)

@JoostK JoostK force-pushed the i18n-message-id-perf branch 2 times, most recently from cc5d964 to f09002e Compare November 15, 2020 16:30
Member

@petebacondarwin petebacondarwin left a comment


Nice performance gains.

I find the names of the freestanding functions quite confusing, like `numberTimesBigInt()` and `addNumberTimesBigInt()`. (Although I accept that these are mostly inherited from the previous implementation.) Perhaps we could make these (static/instance) methods on new classes?

Do you think you could add a few unit tests for some of this code while you are in here?

/**
 * Computes and memoizes the big integer value for `this.number * 2^exponent`.
 */
getMultipliedByPowerOfTwo(exponent: number): BigInteger {

BIKESHED:

Suggested change
getMultipliedByPowerOfTwo(exponent: number): BigInteger {
multiplyByPowerOfTwo(exponent: number): BigInteger {

@petebacondarwin petebacondarwin added the action: cleanup The PR is in need of cleanup, either due to needing a rebase or in response to comments from reviews label Nov 15, 2020
The result of UTF-8 encoding a string was represented as a string, where
each individual character represented a single byte according to its
character code. All usages of this data were interested in the byte
itself, so this required converting each character back to its code.
This commit stores the individual bytes in an array to avoid the
conversion. This yields a ~10% performance improvement for i18n message
ID computation.
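
The change can be illustrated roughly like this: a simplified encoder with a hypothetical name, not the compiler's actual implementation:

```typescript
// Sketch: UTF-8 encode a string straight into an array of byte values
// (0-255), rather than into a string whose character codes stand in for
// bytes. Consumers can then read bytes[i] directly, with no charCodeAt calls.
function utf8Bytes(input: string): number[] {
  const bytes: number[] = [];
  for (const char of input) { // iterates by code point, including surrogate pairs
    const code = char.codePointAt(0)!;
    if (code < 0x80) {
      bytes.push(code);
    } else if (code < 0x800) {
      bytes.push(0xc0 | (code >> 6), 0x80 | (code & 0x3f));
    } else if (code < 0x10000) {
      bytes.push(
          0xe0 | (code >> 12), 0x80 | ((code >> 6) & 0x3f),
          0x80 | (code & 0x3f));
    } else {
      bytes.push(
          0xf0 | (code >> 18), 0x80 | ((code >> 12) & 0x3f),
          0x80 | ((code >> 6) & 0x3f), 0x80 | (code & 0x3f));
    }
  }
  return bytes;
}
```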
@JoostK JoostK removed the request for review from mhevery November 15, 2020 20:19
@JoostK JoostK added action: merge The PR is ready for merge by the caretaker action: presubmit The PR is in need of a google3 presubmit and removed action: cleanup The PR is in need of cleanup, either due to needing a rebase or in response to comments from reviews action: review The PR is still awaiting reviews from at least one requested reviewer labels Nov 15, 2020
@AndrewKushnir
Contributor

Presubmit.

@AndrewKushnir AndrewKushnir added risk: medium merge: caretaker note Alert the caretaker performing the merge to check the PR for an out of normal action needed or note and removed action: presubmit The PR is in need of a google3 presubmit labels Nov 16, 2020
@AndrewKushnir
Contributor

Note to Caretaker: I'd consider this change to have a medium risk (i.e. not a low-risk one), please sync it as a standalone change to g3. Thank you.

@AndrewKushnir
Contributor

AndrewKushnir commented Nov 16, 2020

FYI, adding the "blocked" label for now to run additional tests in g3. While regular presubmit was successful, testing i18n extraction is not common, so there might not be sufficient validation that the change is backwards-compatible. I will run extra checks manually to verify that extracted messages retain the same ids with this change. Thank you.

@AndrewKushnir
Contributor

AndrewKushnir commented Nov 17, 2020

Verified message id generation by comparing output for one of the internal apps. Everything looks good 👍

Note: it'd still make sense to sync this change into g3 as a separate/standalone CL.

@atscott atscott closed this in 604b4e4 Nov 17, 2020
atscott pushed a commit that referenced this pull request Nov 17, 2020

PR Close #39694
atscott pushed a commit that referenced this pull request Nov 17, 2020

PR Close #39694
atscott pushed a commit that referenced this pull request Nov 17, 2020

PR Close #39694
@angular-automatic-lock-bot

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

This action has been performed automatically by a bot.

@angular-automatic-lock-bot angular-automatic-lock-bot bot locked and limited conversation to collaborators Dec 19, 2020
@ngbot ngbot bot modified the milestones: needsTriage, Backlog Dec 19, 2020
Labels
action: merge, area: compiler, area: i18n, cla: yes, merge: caretaker note, refactoring, risk: medium, target: patch
4 participants