Description
Hello folks!
At Diagnostics WG we are assessing all the supported diagnostic tooling following https://github.com/nodejs/node/blob/main/doc/contributing/diagnostic-tooling-support-tiers.md. The assessment is almost done, the remaining piece is the nodejs/diagnostics#535 (which will be partially finished on #44549 merge).
However, I have talked directly to @No9 and we share the same strong feeling that Node.js should really keep the support of eBPF available while it can certainly be a game changer when dealing with large applications. For context, eBPF
was discussed previously on nodejs/diagnostics#386 and it got no traction, then, closed. Recently, we’ve dropped the support to DTrace on #43652 making the eBPF
support out-of-date.
I would like to revert the #43652 and make the eBPF support officially supported by including a few other tracepoints, and obviously, help with any build problems that we might have with this support(maybe, create a @nodejs/ebpf team?).
To move towards official support, what would this feature need?
cc: @nodejs/diagnostics
Activity
tony-go commentedon Sep 7, 2022
Thanks for this initiative @RafaelGSS 👏
Regarding this issue (description), maybe could you:
Regarding the feature, the only criteria I'd like to advocate for is a guide/doc within the Node.js website, but I think you already have that in mind.
bnoordhuis commentedon Sep 7, 2022
When you say "eBPF" you mean the static USDT probes, right?
I removed them because they were an ongoing maintenance drag and because, to a first approximation, no one uses them. I literally couldn't find anyone who did.
Functionality-wise, the V8 inspector protocol can do everything the probes did, and much more besides.
RafaelGSS commentedon Sep 7, 2022
For the current functionality, yes. But, I'd include dynamic probes too.
V8 Inspector can do it locally, but production-wise the V8 inspector protocol adds a considerable overhead when enabled. When diagnosing production apps, USDT probes are definitely a good way to go.
Could you list a few examples? I'm happy to help with that maintenance.
Qard commentedon Sep 7, 2022
The inspector protocol is not at all suitable for production use. The overhead of just turning it on is enormous. Static probes, on the other hand, have basically zero impact until something listens to them.
I'd agree about the maintenance issue because the previous implementation lacked any coherence and we should probably have some sort of cross-platform compatibility layer for it to work across platforms without so much duplication.
bnoordhuis commentedon Sep 8, 2022
Those Just Work, right? That's what the node-dtrace-provider package is all about.
Sure. They broke like clockwork with every major (and sometimes minor) V8 upgrade.
Be careful what you're volunteering for! Go over to the node-v8 repo to see what a massive pain every upgrade is. Shout out to unsung hero @targos.
bnoordhuis commentedon Sep 8, 2022
By the way, one more indicator that no one used the static probes: the introduction of V8's Ignition bytecode interpreter in 2017 broke v8ustack.d, the thingy that prints JS stack frames. No one complained.
RafaelGSS commentedon Sep 8, 2022
Honestly, I don't think it's a good decision to just drop it because it's breaking for some weird reason. I can certainly agree with removing it if no one helps in that maintenance, that's totally fair.
However, it's not fair to say that no one is using it, the reason might be just because there is no content about it using Node.js. Developers might not be using it, because they are unaware of this possibility.
Would be great to have some examples of breaking builds due to DTrace support. Collecting all the content we have in order to measure what would need to bring that support back in a stable way (obviously, not adding extra work on each release) is reasonable IMHO.
bnoordhuis commentedon Sep 9, 2022
Microsoft advertised ETW support when it landed (which piggybacks on the dtrace infrastructure) but that doesn't seem to have lead to a measurable uptick. The Windows user base is big so that's indicative.
RafaelGSS commentedon Sep 10, 2022
IMHO a really small piece of the market uses Windows + Node.js on a production server, which is the real value of this support.
Qard commentedon Sep 10, 2022
Yeah, lots of people dev on Windows, very few run production workloads on Windows. Prod is where static probes matter, because turning on the inspector in prod will take down the server in most heavy workloads.
jasnell commentedon Sep 10, 2022
As @bnoordhuis points out the stuff that was removed was removed because it was simply unmaintained. I'm perfectly fine with that stuff coming back in so long as it is going to be actively maintained, otherwise it's just tech debt as soon as it lands.
RafaelGSS commentedon Sep 10, 2022
Exactly @jasnell, that's the goal of this issue. I'm trying to collect all the pain of maintaining DTrace and discuss better ways to support it without being a tech debt. If we bring this support back, I want to make sure that at the very least we'll have guides and content about it.
Qard commentedon Sep 10, 2022
Before bringing it back, I think we need a compatibility layer to have one consistent interface across platforms. Something that is for static probes what libuv is for async I/O. Without such a thing I don't see this being stable or maintained enough to warrant its return.
I think we also would need a more clear vision on exactly what we should have probes for, what data should be shared through them, and document all that extensively. The past stuff was so poorly documented that it was essentially useless to anyone that hadn't dug through the code themselves to figure out what was actually there.
RafaelGSS commentedon Oct 2, 2022
UPDATE:
We've talked about it at the Node.js Collab Summit and to make it happen we would need to create a proposal covering at least the following points:
13 remaining items