Description
Does `py-spy record` ignore threads that don’t contain any Python stack frame by default?
I have a Python program with a native extension (that happens to be written in Rust). That extension starts a thread (with Rust’s `std::thread::spawn`) to do some CPU-intensive work in parallel with other work. The child thread never runs a Python interpreter. The SVG output of the profiler is missing everything in the second thread. `--native` does show Rust stack frames, but only in the parent thread. Adding `--threads` adds the ID of the parent thread to the output but nothing else. Adding `--idle` doesn’t seem to change anything for this program.
When using `py-spy dump --pid` (at the right time), however, the stack of both threads is printed correctly.
Can I use py-spy to profile both threads?
Activity
benfred commented on Dec 31, 2020
Not right now =( We merge the native stack traces into python frames - but not vice versa. You'll have to use other native profiling tools like perf etc. to profile the native thread.
SimonSapin commented on Jan 6, 2021
That’s unfortunate. Can you say more about this merging? Does it need to happen?
ogrisel commented on Aug 4, 2021
Indeed, it would be very helpful to have py-spy handle native threads in its reporting, to understand the performance of CPU-intensive Python programs that use data science libraries like numpy, which rely on multi-threaded native linear algebra libraries such as OpenBLAS, MKL and co.
Same for machine learning libraries like scikit-learn, lightgbm and xgboost that use OpenMP threads in the CPU intensive sections of the code written in Cython or C++.
At the moment profiling with `py-spy --native --threads --format speedscope` and loading the results into the speedscope visualizer makes no sense to me...
Jongy commented on Aug 6, 2021
We're using libunwind-ptrace in PyPerf and we just place native frames on top of the Python frames (stopping at the first native frame that is the `PyEval_EvalFrame*` which belongs to the topmost Python function). For a truly native thread with no Python frames, we will just have its native stack.
IIRC py-spy uses libunwind-ptrace as well? So this rather simple scheme could work.
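The scheme described above can be sketched roughly as follows. This is only an illustration, not PyPerf's or py-spy's actual code: the function name `merge_stacks`, the representation of frames as plain strings, and the leaf-first ordering are all assumptions made for the example.

```python
from typing import List

# Assumed marker for the interpreter's eval loop; after stripping leading
# underscores it also matches symbols such as _PyEval_EvalFrameDefault.
EVAL_PREFIX = "PyEval_EvalFrame"


def merge_stacks(native: List[str], python: List[str]) -> List[str]:
    """Merge one thread's native and Python stacks (hypothetical helper).

    Both inputs are ordered innermost (leaf) frame first.
    """
    # A truly native thread has no Python frames: keep its native stack as-is.
    if not python:
        return list(native)

    # Collect native frames from the leaf down, stopping at the first
    # eval-loop frame, which belongs to the topmost Python function.
    # If no eval-loop frame is found, every native frame is kept.
    native_on_top = []
    for symbol in native:
        if symbol.lstrip("_").startswith(EVAL_PREFIX):
            break
        native_on_top.append(symbol)

    # Place those native frames on top of the Python frames.
    return native_on_top + list(python)
```

With this sketch, a thread whose native stack is `["dgemm_kernel", "cblas_dgemm", "_PyEval_EvalFrameDefault", "main"]` and whose Python stack is `["matmul (app.py:10)", "<module> (app.py:20)"]` would be reported as the two BLAS frames followed by the two Python frames, while a worker thread with an empty Python stack would be reported with its native stack unchanged.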
ogrisel commented on Aug 18, 2021
@benfred It would be great to have native threads in py-spy: in my case, some of those native threads are managed by OpenMP via Cython `prange` loops. In this case they can call Cython functions, so py-spy's Cython support would be very handy.
Furthermore, if speedscope ever supports multitrack views with time-aligned traces, it would be very helpful to understand when those native threads come into play and interact with the calling Python code.
Would @Jongy's suggested solution above work?
`--native-all` flag that also print the stack of non-python threads #637