This repository was archived by the owner on Sep 2, 2023. It is now read-only.

Rediscussion for out-of-order CJS exec? #509

Closed

Rediscussion for out-of-order CJS exec?#509

I'd like to reopen the discussion of out-of-order CJS exec one last time, given how much criticism there has been over the lack of named exports for CommonJS in ESM.

I can revive a former PR for this which is still sitting around somewhere no problem.

The two "problematic semantics" as far as this group is concerned are:

CJS executing during the instantiation phase instead of during the execute phase.
Because of the getter triggers I would also prefer to move the "wrapper creation" to be lazily done on the first import, instead of for every CJS module. That is, doing the exports snapshot at import time, even if it was required earlier. The reason being that otherwise an app that loads 100s of CommonJS modules snapshotting all of those on startup will both be slower and also may trigger lots of unnecessary warning and error side effects.

I have no problem with either of these semantics and never have.

Will other members of this group finally reconsider their positions in the name of practicality for users?

This scenario is definitely one of the biggest catches for users that we can still change. It could be worth re-hashing this old number even if that's a small possibility.

added

I seem to remember a PR where you implemented a "golden middle": during the binding phase, Node assumes that the exports are as the imports defined them, leaves the execution to the execution phase, and if during the execution it finds that the imports were not the same as the exports, throws an exception.

That would sound to me like an (almost) perfect solution.

guybedford

ContributorAuthor

@giltayar yes this was the dynamic modules proposal that we put so much time into at TC39 (https://github.com/nodejs/dynamic-modules). The solution, implemented pretty much exactly as you describe, hits on some quite deep complexities:

export * from 'a'; export * from 'b' is supposed to throw when a and b both export the same named export. Allowing the names to be user/consumer-defined means this validation phase needs to be post-handled / ignored somehow.
import * as ns in a cycle can expose a module namespace object for a CJS module with the exports being displayed before the execution of that CJS module has completed. Since the exposed exports are then the user definitions this exposes non-determinism despite any later execution error.

The dynamic modules solution was to ban export star of commonjs modules for (1) and to patch the namespace for (2), but this approach was shot down at TC39 after multiple attempts at consensus.

The simple solution being discussed here can be implemented in a PR tomorrow, and doesn't come with anything close to these complexities.

GeoffreyBooth

Member

yes this was the dynamic modules proposal

So this proposal determined the names of the CommonJS exports by looking at what was imported, e.g. in import statements; did we ever consider determining the names of the CommonJS exports by parsing the CommonJS?

I remember there was opposition to evaluating the CommonJS in the instantiation phase, and strong opposition to evaluating CommonJS code twice; but what about just parsing it in one of these pre-evaluation phases? We would miss any CommonJS exports that are dynamically named, but perhaps that's okay? That could be a caveat we explain in the docs as a limitation of importing CommonJS into ESM; hopefully that edge case is rare enough that it shouldn't come up much.

ljharb

SponsorMember

Is there a way to reuse the results of the parse to minimize the perf hit of that?

giltayar

I'm not a fan of parsing CJS modules, because it's heuristic, and there would be no rule we could write in the documentation that says when it will work and when not. Worse: due to a change in the parsing code (which will happen), a module that worked in one version of Node, will stop working in another.

Too non-deterministic for my taste.

giltayar

but this approach was shot down at TC39 after multiple attempts at consensus

True, and from TC39's perspective, they may be right. But if you're anyway deeming CJS to be "out of TC39 scope" (which I'm OK with!) then you can definitely go the "dynamic modules" route, which has the added benefit of making the execution order intuitive to the developer: I believe execution order should be "serial" and not an interleaving (which it will be if we execute CJS in the binding phase), whereby first the CJS in the tree will be executed and then the ESM.

mhdawson

mentioned this

on May 4, 2020

Node.js Modules Team Meeting 2020-05-06 #510

jkrems

Contributor

then you can definitely go the "dynamic modules" route

The problem is that dynamic module is the route that does need TC39 support because it requires actual support in the VM (correct me if I'm wrong, Guy). Early execution or something like a top-level-parse would both be possible if we say "CJS exists outside of the standardized module system and its execution isn't considered module execution".

Afaik the current state is:

Early Execution: Technical side works fine, allows full named exports as long as they exist after initial execution. Downside is that CJS in the tree always runs before any ESM executes. ESM can never do bootstrap logic as long as there's any CJS in the tree. Some race conditions (potentially, likely rare) if we don't snapshot the exports of every CJS after first execution. Collision between module.exports and module.exports.default because named exports doesn't map 1:1 to how CJS works.
(Partial) Parse: Technical side works fine, allows named exports but only heuristically if they can be determined from a parse (top level parse?). Every imported CJS file has to be parsed twice (once by module loader, once by V8). Collision between module.exports and module.exports.default because named exports doesn't map 1:1 to how CJS works.
Dynamic Module: Technical side requires buy-in from V8 which requires buy-in from TC39. The weirdness around * exports and namespace objects which may prevent it from ever finding acceptance (see Guy's comment above). Outside of *, "perfect" bindings. CJS executes when the respective ESM node should execute. Collision between module.exports and module.exports.default because named exports doesn't map 1:1 to how CJS works.
Default Only: Technical side works fine but only allows module.exports as default. No collision between module.exports.default and module.exports. CJS executes when the respective ESM node should execute. Relatively easy to simulate/recreate/modify in a userland loader.

My personal take: I think partial parse would be a reasonable compromise and JDD's esm seemed to show that the performance impact was okay, especially since it would only affect the boundaries between ESM and CJS so the majority of modules wouldn't be double-parsed.

The bad news is that interop-require conflated module.exports.default and module.exports in some places. We would have to pick module.exports for backwards compat afaict and that would create a potentially even uncannier valley than what we have today. If we're okay with a breaking change there, the parser could look out for that magic es module property and switch to module.exports.default based on it.

jkrems

Contributor

Relatively easy to simulate/recreate/modify in a userland loader.

To elaborate on this point: All the more advanced solutions have issues once somebody wants to write a userland loader. If I want to serve CJS from HTTP/archive/compiled-on-the-fly, I may have to reproduce the core behavior. Assuming the core behavior is "parse the source and then apply a specific heuristic to determine likely named exports", that quickly becomes a challenge. The same is true for any kind of static analysis tool (compiler, linter, bundler, ...). It's not a blocker but I wanted to call it out.

bmeck

Member

We have discussed this many times, at length. I do not see a way forward and do not want to take this trade off given previous discussions.

Downside is that CJS in the tree always runs before any ESM executes. ESM can never do bootstrap logic as long as there's any CJS in the tree.

I think this might be a bit downplayed. You import not just the CJS shallowly but all of its dependencies as well. This means that:

transitive dependencies of CJS also get hoisted. it is hard to understand how things would be re-arranged by both humans and tools. I do not see this as making the user experience better.
workflows like console based debugging will start to get very confusing, I do not see this as making the user experience better.
it doesn't match how faux modules (transpiled CJS trying to emulate parts of ESM, not real ESM) work. If the goal is to achieve the same thing as faux modules, this does route not seem to be well understood or having history being tested in the wild. If tooling could provide such a historical benchmark to see breakage and/or UX it would be helpful.
it doesn't match what is now shipping in Node unflagged. Re-ordering things that people are depending in a stable version on is not something I feel comfortable with.
it cements that APMs/Polyfills/Security hardening/global error handlers/etc. need to stay CJS forever (and want import condition to load CJS in particular). Side effects in modules are a natural part of the runtime having so many features being 1st class dynamic JS APIs. If the goal is to make CJS a legacy feature, this instead would permanently force those workflows to be CJS in order to safeguard the timing of their evaluation. This would be against one of the goals of ESM allowing a path towards codebases that are purely ESM and/or work on environments like the web. Breaking such a goal seems to need to be weighed heavily.

I agree that named import being able to pull from CJS is valuable, but I do not see the research into the affects that this causes to quickly jump on things. Additionally, this trade off doesn't seem to focus on anything except the ability to have named imports from faux modules as CJS. I would like to understand why none of the above are important vs that fact.

It would also be helpful to understand why the consumer needs this and the author is in a situation that updates would be too painful to make. Currently transpiled CJS can be loaded into our ESM implementation, and you can use the full interface it provides, albeit with manually reproducing what a transpiler would do to interface with it. Things like https://github.com/addaleax/gen-esm-wrapper are providing facades for build time. Understanding how the transpiled workflows are unable to maintain build time tooling for purposes like this would also be helpful.

Right now I do see that people using their transpilers are having problems when they try to stop using them; but, I don't think they need to stop using them. I do see that other problems with dropping transpilers exist (async loading, mocking, magic variables, etc.) and think the best approach at least in the short term is to continue transpiling, and think that trade offs should match transpiler behavior more than this would. It would be a breaking change from transpiler behavior as well.

This trade off does not necessarily have a way out in the future since it would be another breaking change to undo this oddity if we introduce it. If we land such a change I would also never want to revert it. We would forever not match historical transpiler semantics nor match ESM semantics of order of evaluation. Currently, the semantics are somewhat easy to understand, even if they do not provide all of the features of faux modules. I value the ability to understand/teach how my program works, and introducing implicit and transitive preemption is far from desirable to me. Additional mitigations for performance like late binding also make this even worse for console based debugging and match transpilers potentially even less.

Providing named imports from CJS does not solve a variety of other issues with ESM adoption, and I am unclear on how a variety of other upcoming and existing problems wouldn't be made even worse by taking this trade off. Things like standardized JSON modules are in talks in both TC39 and WHATWG and are being looked at as only providing a default export. Providing named imports from CJS and not from JSON would just be more confusion rather than having at least a semi-uniform approach for integrating values that are not designed for ESM.

I do not want to re-discuss all of these issues unless we can discuss the trade offs on both directions.

bmeck

Member

I forgot that this also may violate things like tick ordering of queued tasks, but I assume that is understood by the nature of this hoisting.

GeoffreyBooth

Member

@bmeck Does your take above apply to any of the options from @jkrems’ comment, or the overall idea of executing CommonJS out of order? I'm not clear on which proposal you're pointing out problems with.

Looking at Jan's options, it sounds like the last one (Default Only) is what we have now; Early Execution is probably disqualified for all the reasons you listed in your comment about it cementing CommonJS primacy; Dynamic Modules is a no-go without TC39 buy-in; and that leaves (Partial) Parse. I don't think the parse option has all the timing and other issues you're discussing, does it? Aside from the parse possibly missing some names, and the issue with default, are there any other problems with that option?

bmeck

Member

@GeoffreyBooth parsing does not have the issues of timing, but does have heuristic concerns. See the output of transpilers like: this with export * which do require runtime info. I also had a stripped down pragma PR "use exports ...list..."; a long while back but people didn't like the perf problems it had. The complexity of a full JS parse is non-trivial and we would need to document our exact heuristic such that transpilers could target it and it could continued to be supported.

GeoffreyBooth

Member

So others who have thought about this more can propose better solutions, but what I had in mind for a parsing algorithm would be to try to keep it simple enough that it's straightforward to understand. Like:

Find all references to a non-declared exports object (i.e. an exports variable that has no var/const/let/function parameter declaration) or a likewise non-declared module object with an exports property.
For each such reference, parse the references of statically-defined properties of that object (such as exports.foo or exports['foo']) or module.exports.foo and module.exports['foo'] etc.).
Also parse assignment of an object to that reference (exports = { foo: 1 }, etc.).

And . . . that's it? If there are any other really common ways of declaring exports, we could extend our logic to include them, but the idea would be that the docs describe this algorithm in plain English and say that CommonJS exports defined such that they're discoverable via this algorithm will be importable into ESM; and others will not be.

So yes, the transpiler output of export * from would fall into the unsupported category; but the transpiler output of export const foo = would be parseable. Ditto export function foo() and export class foo() and export { foo }. The main exceptions that I can find in some cursory tests are export * from and export default would be unsupported, and we would call those out in the docs (as well as a traditional CommonJS export named default).

42 remaining items

LeszekSwirski

If ESM modules were to also be allowed to execute before full module graph linking, would that make on-import CJS execution more palatable?

weswigham

Contributor

If ESM modules were to also be allowed to execute before full module graph linking, would that make on-import CJS execution more palatable?

I mean, arguably they already can, as they may have been cached by another module graph which executed prior to the currently resolving one.

LeszekSwirski

Right, that's (one of the reasons) why I ask.

bmeck

Member

It should be noted that a few years ago while google was trying to ship ESM and testing the viability of it in real world productions workflows in the browser, they wanted to do out of order evaluation of purely ESM. They have a document at https://docs.google.com/document/d/1MJK0zigKbH4WFCKcHsWwFAzpU_DZppEAOpYJlIW7M7E/view ; however, as the document explains they chose not to do so for a variety of reasons. This also came up again this year with a similar result.

jkrems

Contributor

If ESM modules were to also be allowed to execute before full module graph linking, would that make on-import CJS execution more palatable?

I think the problem is less that CJS may execute before ESM but that it always executes before any ESM file may execute in the graph. That means that certain kinds of code (bootstrapping, polyfills) can only ever be written as ESM if there's not a single CJS file involved in the entire program. Which is... a tall order.