Profiling native threads? #332

Open
@SimonSapin

Description

Does py-spy record ignore threads that don’t contain any Python stack frame by default?

I have a Python program with a native extension (that happens to be written in Rust). That extension starts a thread (with Rust’s std::thread::spawn) to do some CPU-intensive work in parallel with other work. The child thread never runs a Python interpreter. The SVG output of the profiler is missing everything in the second thread. --native does show Rust stack frames, but only in the parent thread. Adding --threads adds the ID of the parent thread to the output but nothing else. Adding --idle doesn’t seem to change anything for this program.

When using py-spy dump --pid (at the right time) however, the stack of both threads is printed correctly.

Can I use py-spy to profile both threads?

Activity

benfred (Owner) commented on Dec 31, 2020

Not right now =( We merge the native stack traces into Python frames - but not vice versa. You'll have to use other native profiling tools like perf etc. to profile the native thread.

SimonSapin (Author) commented on Jan 6, 2021

That’s unfortunate. Can you say more about this merging? Does it need to happen?

ogrisel commented on Aug 4, 2021

Indeed, it would be very helpful for py-spy to include native threads in its reports, to understand the performance of CPU-intensive Python programs that use data science libraries like numpy, which rely on multi-threaded native linear algebra libraries such as OpenBLAS, MKL and co.

Same for machine learning libraries like scikit-learn, lightgbm and xgboost that use OpenMP threads in the CPU intensive sections of the code written in Cython or C++.

At the moment profiling with py-spy --native --threads --format speedscope and loading the results into the speedscope visualizer makes no sense to me...

Jongy (Contributor) commented on Aug 6, 2021

We're using libunwind-ptrace in PyPerf and we just place native frames on top of the Python frames (stopping at the first native frame that is a PyEval_EvalFrame*, which belongs to the topmost Python function). For a truly native thread with no Python frames, we will just have its native stack.

IIRC py-spy uses libunwind-ptrace as well? So this rather simple scheme could work.
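
To make the scheme described above concrete, here is a minimal Python sketch. It is not PyPerf's or py-spy's actual code: the function name `merge_stacks`, the leaf-first list representation, and all frame names are assumptions made for illustration. It also assumes that when no PyEval_EvalFrame* frame is found, every native frame is kept.

```python
from itertools import takewhile

# Substring shared by the interpreter's eval-loop symbols,
# e.g. PyEval_EvalFrameEx or _PyEval_EvalFrameDefault.
EVAL_MARKER = "PyEval_EvalFrame"


def merge_stacks(native_frames, python_frames):
    """Combine two stacks, both ordered leaf-first (innermost call at index 0)."""
    if not python_frames:
        # Purely native thread: report its native stack unchanged.
        return list(native_frames)
    # Keep only the native frames below the innermost Python function,
    # i.e. everything before the first eval-loop frame.
    native_top = list(takewhile(lambda f: EVAL_MARKER not in f, native_frames))
    return native_top + list(python_frames)


# Hypothetical frame names, for illustration only.
python_thread = merge_stacks(
    ["dgemm_kernel", "cblas_dgemm", "_PyEval_EvalFrameDefault", "Py_RunMain", "main"],
    ["matmul (app.py)", "<module> (app.py)"],
)
native_thread = merge_stacks(
    ["dgemm_kernel", "blas_thread_server", "start_thread"],
    [],
)
print(python_thread)
print(native_thread)
```

With this rule a thread that never enters the interpreter still yields a sample, made only of native frames, instead of being dropped.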

ogrisel commented on Aug 18, 2021

> Not right now =( We merge the native stack traces into python frames - but not vice versa. You'll have to profile with other native profiling tools like perf etc to get profile the native thread

@benfred It would be great to have native thread support in py-spy. In my case, some of those native threads are managed by OpenMP via Cython prange loops, so they can call Cython functions, and py-spy's Cython support would be very handy there.

Furthermore, if speedscope ever supports multitrack views with time-aligned traces, it would be very helpful to understand when those native threads come into play and interact with the calling Python code.

Would @Jongy's suggested solution above work?

        Profiling native threads? · Issue #332 · benfred/py-spy