trtserver uses more than 20 CPUs #1080
TRTIS uses ONNX Runtime to execute ONNX models (https://github.com/microsoft/onnxruntime). I think this is more of an ONNX Runtime question as to why the CPU rather than the GPU is used to execute the model. Are you setting any instance_group or optimization options in your model configuration?
@deadeyegoodwin Our config.pbtxt is:
If we use TRTIS and remove instance_group and optimization from config.pbtxt, trtserver still uses more than 20 CPUs. What's more, when we run the ONNX model directly with ONNX Runtime (TensorRT enabled), it uses only 1 CPU. ONNX model download :
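For reference, a minimal config.pbtxt for an ONNX model that requests the TensorRT accelerator looks roughly like the sketch below (field names follow the TRTIS model-configuration schema; the model name, batch size, and instance count are hypothetical, and this is not the original attachment):

```
name: "east_model"            # hypothetical model name
platform: "onnxruntime_onnx"
max_batch_size: 8             # hypothetical
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
optimization {
  execution_accelerators {
    gpu_execution_accelerator: [ { name: "tensorrt" } ]
  }
}
```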
We profiled trtserver with perf and generated its flame graph: Flame graph file is: It shows that libgomp.so.1.0.0 consumes a lot of CPU. Our GCC version is 7.1.0. Next, we will build TRTIS in debug mode and run perf again.
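Since the flame graph points at libgomp, one quick experiment (a sketch, not a confirmed fix; the model-repository path and launch command are hypothetical) is to cap the OpenMP thread pool with the standard libgomp environment variables before starting the server:

```shell
export OMP_NUM_THREADS=1        # one OpenMP worker instead of one per core
export OMP_WAIT_POLICY=PASSIVE  # idle workers sleep instead of spin-waiting
# trtserver --model-repository=/models   # hypothetical launch command
```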
@deadeyegoodwin I have a Dockerfile that builds ONNX Runtime and a sample executable that loads and runs a model in ONNX Runtime directly, but it is out of date (pre-ONNX Runtime v1.0.0)... I can revisit it and post it here later this week.
@GuanLuo
It works well.
@lxl910915 What is the ONNX Runtime version that you are using? The above code is similar to how we invoke the ORT APIs, except that for now we always set disable in By the way, the order you call
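For comparison, a standalone sketch of creating an ORT session with the graph-optimization level and intra-op thread count pinned down (ONNX Runtime C++ API; the model path and thread count are hypothetical, and ORT 1.0-era builds expose a slightly different thread-pool setter):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "east");
  Ort::SessionOptions opts;
  // Cap graph rewrites at BASIC, matching the experiment described above.
  opts.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_BASIC);
  // Keep ORT from spawning one intra-op thread per core.
  opts.SetIntraOpNumThreads(1);
  Ort::Session session(env, "model.onnx", opts);  // hypothetical path
  return 0;
}
```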
@GuanLuo Thanks for your reply.
What's more, if we set ORT_ENABLE_BASIC in SetSessionGraphOptimizationLevel and then enable dynamic batching, the problem is still there. If we add --fold_const to tf2onnx.convert, trtserver uses only 1 CPU.
I assume you achieved 3. by changing the source code? Otherwise, I think TRTIS always prioritizes other GPU accelerators over CUDA. If so, can you share the code change? It is strange that the order change causes an exception on the TRTIS side. Are you building TRTIS on the 20.01 branch or on master? Master has now rolled forward to ONNX Runtime 1.1.0. If you already built on master, then we should investigate further...
@GuanLuo In the onnx_backend.cc file, we changed
to
We built from the master branch, and the problem still exists.
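The inline diff above is not preserved; as an illustration only, swapping the order in which the execution providers are appended in onnx_backend.cc would look roughly like this (provider factory functions from the ONNX Runtime 1.x C API; the wrapper function and device id are hypothetical):

```cpp
#include <onnxruntime_c_api.h>
#include <cuda_provider_factory.h>
#include <tensorrt_provider_factory.h>

// Hypothetical helper showing one possible registration order:
// the CUDA EP is appended before the TensorRT EP, reversing the
// priority discussed in the previous comment.
void ConfigureProviders(OrtSessionOptions* opts, int device_id) {
  OrtSessionOptionsAppendExecutionProvider_CUDA(opts, device_id);
  OrtSessionOptionsAppendExecutionProvider_Tensorrt(opts, device_id);
}
```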
Same problem as onnx/tensorflow-onnx#784 (comment)
So the root cause of the issue is in ONNX Runtime?
It seems so.
Description
Model: Tensorflow EAST model
Convert saved_model to onnx:
python -m tf2onnx.convert --saved-model /tmp/SavedModel --output model.onnx --outputs feature_fusion/concat_3,feature_fusion/Conv_7/Sigmoid --opset 10
Result of converting saved_model to ONNX:
After optimization: Add -3 (19->16), Const -59 (379->320), Gather +3 (0->3), Identity -18 (18->0), Reshape +2 (0->2), Transpose -262 (264->2)
trtserver loads this model.onnx. When a client connects to trtserver via gRPC, trtserver uses more than 20 CPUs and little GPU.

However, when we add
--fold_const
and convert saved_model to ONNX by
python -m tf2onnx.convert --saved-model /tmp/SavedModel --output model.onnx --outputs feature_fusion/concat_3,feature_fusion/Conv_7/Sigmoid --opset 10 --fold_const
the result of converting saved_model to ONNX is:
After optimization: Add -63 (79->16), Const -10 (145->135), Identity -18 (18->0), Reshape +2 (0->2), Transpose -138 (140->2)
Now, trtserver uses only 1 CPU and more GPU.
TRTIS Information
What version of TRTIS are you using? 20.01
Are you using the TRTIS container or did you build it yourself? We built it ourselves.
To Reproduce
Steps to reproduce the behavior:
Expected behavior