This repository was archived by the owner on Sep 2, 2023. It is now read-only.
This repository was archived by the owner on Sep 2, 2023. It is now read-only.
Rediscussion for out-of-order CJS exec? #509
Closed
Description
I'd like to reopen the discussion of out-of-order CJS exec one last time, given how much criticism there has been over the lack of named exports for CommonJS in ESM.
I can revive a former PR for this which is still sitting around somewhere no problem.
The two "problematic semantics" as far as this group is concerned are:
- CJS executing during the instantiation phase instead of during the execute phase.
- Because of the getter triggers I would also prefer to move the "wrapper creation" to be lazily done on the first import, instead of for every CJS module. That is, doing the exports snapshot at import time, even if it was required earlier. The reason being that otherwise an app that loads 100s of CommonJS modules snapshotting all of those on startup will both be slower and also may trigger lots of unnecessary warning and error side effects.
I have no problem with either of these semantics and never have.
Will other members of this group finally reconsider their positions in the name of practicality for users?
This scenario is definitely one of the biggest catches for users that we can still change. It could be worth re-hashing this old number even if that's a small possibility.
Activity
giltayar commentedon May 4, 2020
I seem to remember a PR where you implemented a "golden middle": during the binding phase, Node assumes that the exports are as the imports defined them, leaves the execution to the execution phase, and if during the execution it finds that the imports were not the same as the exports, throws an exception.
That would sound to me like an (almost) perfect solution.
guybedford commentedon May 4, 2020
@giltayar yes this was the dynamic modules proposal that we put so much time into at TC39 (https://github.com/nodejs/dynamic-modules). The solution, implemented pretty much exactly as you describe, hits on some quite deep complexities:
export * from 'a'; export * from 'b'
is supposed to throw when a and b both export the same named export. Allowing the names to be user/consumer-defined means this validation phase needs to be post-handled / ignored somehow.import * as ns
in a cycle can expose a module namespace object for a CJS module with the exports being displayed before the execution of that CJS module has completed. Since the exposed exports are then the user definitions this exposes non-determinism despite any later execution error.The dynamic modules solution was to ban export star of commonjs modules for (1) and to patch the namespace for (2), but this approach was shot down at TC39 after multiple attempts at consensus.
The simple solution being discussed here can be implemented in a PR tomorrow, and doesn't come with anything close to these complexities.
GeoffreyBooth commentedon May 4, 2020
So this proposal determined the names of the CommonJS exports by looking at what was imported, e.g. in
import
statements; did we ever consider determining the names of the CommonJS exports by parsing the CommonJS?I remember there was opposition to evaluating the CommonJS in the instantiation phase, and strong opposition to evaluating CommonJS code twice; but what about just parsing it in one of these pre-evaluation phases? We would miss any CommonJS exports that are dynamically named, but perhaps that's okay? That could be a caveat we explain in the docs as a limitation of importing CommonJS into ESM; hopefully that edge case is rare enough that it shouldn't come up much.
ljharb commentedon May 4, 2020
Is there a way to reuse the results of the parse to minimize the perf hit of that?
giltayar commentedon May 4, 2020
I'm not a fan of parsing CJS modules, because it's heuristic, and there would be no rule we could write in the documentation that says when it will work and when not. Worse: due to a change in the parsing code (which will happen), a module that worked in one version of Node, will stop working in another.
Too non-deterministic for my taste.
giltayar commentedon May 4, 2020
True, and from TC39's perspective, they may be right. But if you're anyway deeming CJS to be "out of TC39 scope" (which I'm OK with!) then you can definitely go the "dynamic modules" route, which has the added benefit of making the execution order intuitive to the developer: I believe execution order should be "serial" and not an interleaving (which it will be if we execute CJS in the binding phase), whereby first the CJS in the tree will be executed and then the ESM.
jkrems commentedon May 4, 2020
The problem is that dynamic module is the route that does need TC39 support because it requires actual support in the VM (correct me if I'm wrong, Guy). Early execution or something like a top-level-parse would both be possible if we say "CJS exists outside of the standardized module system and its execution isn't considered module execution".
Afaik the current state is:
module.exports
andmodule.exports.default
because named exports doesn't map 1:1 to how CJS works.module.exports
andmodule.exports.default
because named exports doesn't map 1:1 to how CJS works.*
exports and namespace objects which may prevent it from ever finding acceptance (see Guy's comment above). Outside of*
, "perfect" bindings. CJS executes when the respective ESM node should execute. Collision betweenmodule.exports
andmodule.exports.default
because named exports doesn't map 1:1 to how CJS works.module.exports
asdefault
. No collision betweenmodule.exports.default
andmodule.exports
. CJS executes when the respective ESM node should execute. Relatively easy to simulate/recreate/modify in a userland loader.My personal take: I think partial parse would be a reasonable compromise and JDD's
esm
seemed to show that the performance impact was okay, especially since it would only affect the boundaries between ESM and CJS so the majority of modules wouldn't be double-parsed.The bad news is that interop-require conflated
module.exports.default
andmodule.exports
in some places. We would have to pickmodule.exports
for backwards compat afaict and that would create a potentially even uncannier valley than what we have today. If we're okay with a breaking change there, the parser could look out for that magic es module property and switch tomodule.exports.default
based on it.jkrems commentedon May 4, 2020
To elaborate on this point: All the more advanced solutions have issues once somebody wants to write a userland loader. If I want to serve CJS from HTTP/archive/compiled-on-the-fly, I may have to reproduce the core behavior. Assuming the core behavior is "parse the source and then apply a specific heuristic to determine likely named exports", that quickly becomes a challenge. The same is true for any kind of static analysis tool (compiler, linter, bundler, ...). It's not a blocker but I wanted to call it out.
bmeck commentedon May 4, 2020
We have discussed this many times, at length. I do not see a way forward and do not want to take this trade off given previous discussions.
I think this might be a bit downplayed. You import not just the CJS shallowly but all of its dependencies as well. This means that:
import
condition to load CJS in particular). Side effects in modules are a natural part of the runtime having so many features being 1st class dynamic JS APIs. If the goal is to make CJS a legacy feature, this instead would permanently force those workflows to be CJS in order to safeguard the timing of their evaluation. This would be against one of the goals of ESM allowing a path towards codebases that are purely ESM and/or work on environments like the web. Breaking such a goal seems to need to be weighed heavily.I agree that named import being able to pull from CJS is valuable, but I do not see the research into the affects that this causes to quickly jump on things. Additionally, this trade off doesn't seem to focus on anything except the ability to have named imports from faux modules as CJS. I would like to understand why none of the above are important vs that fact.
It would also be helpful to understand why the consumer needs this and the author is in a situation that updates would be too painful to make. Currently transpiled CJS can be loaded into our ESM implementation, and you can use the full interface it provides, albeit with manually reproducing what a transpiler would do to interface with it. Things like https://github.com/addaleax/gen-esm-wrapper are providing facades for build time. Understanding how the transpiled workflows are unable to maintain build time tooling for purposes like this would also be helpful.
Right now I do see that people using their transpilers are having problems when they try to stop using them; but, I don't think they need to stop using them. I do see that other problems with dropping transpilers exist (async loading, mocking, magic variables, etc.) and think the best approach at least in the short term is to continue transpiling, and think that trade offs should match transpiler behavior more than this would. It would be a breaking change from transpiler behavior as well.
This trade off does not necessarily have a way out in the future since it would be another breaking change to undo this oddity if we introduce it. If we land such a change I would also never want to revert it. We would forever not match historical transpiler semantics nor match ESM semantics of order of evaluation. Currently, the semantics are somewhat easy to understand, even if they do not provide all of the features of faux modules. I value the ability to understand/teach how my program works, and introducing implicit and transitive preemption is far from desirable to me. Additional mitigations for performance like late binding also make this even worse for console based debugging and match transpilers potentially even less.
Providing named imports from CJS does not solve a variety of other issues with ESM adoption, and I am unclear on how a variety of other upcoming and existing problems wouldn't be made even worse by taking this trade off. Things like standardized JSON modules are in talks in both TC39 and WHATWG and are being looked at as only providing a default export. Providing named imports from CJS and not from JSON would just be more confusion rather than having at least a semi-uniform approach for integrating values that are not designed for ESM.
I do not want to re-discuss all of these issues unless we can discuss the trade offs on both directions.
bmeck commentedon May 4, 2020
I forgot that this also may violate things like tick ordering of queued tasks, but I assume that is understood by the nature of this hoisting.
GeoffreyBooth commentedon May 4, 2020
@bmeck Does your take above apply to any of the options from @jkrems’ comment, or the overall idea of executing CommonJS out of order? I'm not clear on which proposal you're pointing out problems with.
Looking at Jan's options, it sounds like the last one (Default Only) is what we have now; Early Execution is probably disqualified for all the reasons you listed in your comment about it cementing CommonJS primacy; Dynamic Modules is a no-go without TC39 buy-in; and that leaves (Partial) Parse. I don't think the parse option has all the timing and other issues you're discussing, does it? Aside from the parse possibly missing some names, and the issue with
default
, are there any other problems with that option?bmeck commentedon May 4, 2020
@GeoffreyBooth parsing does not have the issues of timing, but does have heuristic concerns. See the output of transpilers like: this with
export *
which do require runtime info. I also had a stripped down pragma PR"use exports ...list...";
a long while back but people didn't like the perf problems it had. The complexity of a full JS parse is non-trivial and we would need to document our exact heuristic such that transpilers could target it and it could continued to be supported.GeoffreyBooth commentedon May 4, 2020
So others who have thought about this more can propose better solutions, but what I had in mind for a parsing algorithm would be to try to keep it simple enough that it's straightforward to understand. Like:
exports
object (i.e. anexports
variable that has novar
/const
/let
/function parameter declaration) or a likewise non-declaredmodule
object with anexports
property.exports.foo
orexports['foo']
) ormodule.exports.foo
andmodule.exports['foo']
etc.).exports = { foo: 1 }
, etc.).And . . . that's it? If there are any other really common ways of declaring exports, we could extend our logic to include them, but the idea would be that the docs describe this algorithm in plain English and say that CommonJS exports defined such that they're discoverable via this algorithm will be importable into ESM; and others will not be.
So yes, the transpiler output of
export * from
would fall into the unsupported category; but the transpiler output ofexport const foo =
would be parseable. Dittoexport function foo()
andexport class foo()
andexport { foo }
. The main exceptions that I can find in some cursory tests areexport * from
andexport default
would be unsupported, and we would call those out in the docs (as well as a traditional CommonJS export nameddefault
).42 remaining items
LeszekSwirski commentedon May 8, 2020
If ESM modules were to also be allowed to execute before full module graph linking, would that make on-import CJS execution more palatable?
weswigham commentedon May 8, 2020
I mean, arguably they already can, as they may have been cached by another module graph which executed prior to the currently resolving one.
LeszekSwirski commentedon May 8, 2020
Right, that's (one of the reasons) why I ask.
bmeck commentedon May 8, 2020
It should be noted that a few years ago while google was trying to ship ESM and testing the viability of it in real world productions workflows in the browser, they wanted to do out of order evaluation of purely ESM. They have a document at https://docs.google.com/document/d/1MJK0zigKbH4WFCKcHsWwFAzpU_DZppEAOpYJlIW7M7E/view ; however, as the document explains they chose not to do so for a variety of reasons. This also came up again this year with a similar result.
jkrems commentedon May 8, 2020
I think the problem is less that CJS may execute before ESM but that it always executes before any ESM file may execute in the graph. That means that certain kinds of code (bootstrapping, polyfills) can only ever be written as ESM if there's not a single CJS file involved in the entire program. Which is... a tall order.
MylesBorins commentedon Jun 17, 2020
Removing from module agenda for right now, feel free to re add
bmeck commentedon Aug 4, 2020
@guybedford are we ok closing this?