Skip to content
This repository was archived by the owner on Mar 14, 2025. It is now read-only.

Files

Latest commit

8b98524 · Jul 13, 2020

History

History

inference

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
Jun 29, 2020
Jun 29, 2020
Jul 13, 2020

README.md

TensorRT Inference Example (Python API)

A generic inference script example to handle TensorRT engines created from both fixed-shape and dynamic-shape ONNX models.

This script will detect the input/output names/shapes and generate random inputs to do inference on. It will also dump a lot of relevant metadata and information that happens along the way to clarify the steps that happen when doing inference with TensorRT.

Setup

  1. (Optional) Start a TensorRT 7.1 Docker container using the NGC 20.06 TensorRT Release:
# Mount current directory to /mnt and use /mnt as workspace inside container
nvidia-docker run -it -v `pwd`:/mnt -w /mnt nvcr.io/nvidia/tensorrt:20.06-py3
  1. (Optional) Download and run a sample script to:
  • Generate a fixed-shape Alexnet ONNX model
  • Generate a dynamic-shape Alexnet ONNX model
# Download sample pytorch->onnx script
wget https://gist.githubusercontent.com/rmccorm4/b72abac18aed6be4c1725db18eba4930/raw/3919c883b97a231877b454dae695fe074a1acdff/alexnet_onnx.py
# Install dependencies
python3 -m pip install torch==1.5.1 torchvision==0.6.1 onnx==1.6
# Export sample Alexnet model to ONNX with a dynamic batch dimension
python3 alexnet_onnx.py --opset=11
  1. (Optional) Use trtexec to:
  • Create a TensorRT engine from the fixed-shape Alexnet ONNX model in previous step
# Create TensorRT engine from fixed-shape ONNX model
trtexec --explicitBatch \
        --onnx=alexnet_fixed.onnx \
        --saveEngine=alexnet_fixed.engine
  1. (Optional) Use trtexec to:
  • Create a TensorRT engine from the dynamic-shape Alexnet ONNX model in previous step
# Emulate "maxBatchSize" behavior from implicit batch engines by setting
# an optimization profile with min=(1, *shape), opt=max=(maxBatchSize, *shape)
MAX_BATCH_SIZE=32
INPUT_NAME="actual_input_1"

# Convert dynamic batch ONNX model to TRT Engine with optimization profile defined
#   --minShapes: kMIN shape
#   --optShapes: kOPT shape
#   --maxShapes: kMAX shape
#   --shapes:    # Inference shape - this is like context.set_binding_shape(0, shape)
trtexec --onnx=alexnet_dynamic.onnx \
        --explicitBatch \
        --minShapes=${INPUT_NAME}:1x3x224x224 \
        --optShapes=${INPUT_NAME}:${MAX_BATCH_SIZE}x3x224x224 \
        --maxShapes=${INPUT_NAME}:${MAX_BATCH_SIZE}x3x224x224 \
        --shapes=${INPUT_NAME}:1x3x224x224 \
        --saveEngine=alexnet_dynamic.engine

Inference

  1. Run a generic python script that can handle both fixed-shape and dynamic-shape TensorRT engines.
  • This script generates random input data
  • For dynamic shape engines, this script defaults to input shapes of the engine's first optimization profile's (context.active_optimization_profile=0) kOPT shape

NOTE: This script is not meant for peak performance, but is meant to serve as a detailed example to demonstrate all of the moving parts involved in inference, especially for dynamic shape engines.

Fixed-shape Engine Example

root@49b19ca81d38:/mnt# python3 infer.py -e alexnet_fixed.engine   
Loaded engine: alexnet_fixed.engine
Active Optimization Profile: 0
Engine/Binding Metadata
        Number of optimization profiles: 1
        Number of bindings per profile: 2
        First binding for profile: 0
        Last binding for profile: 0
Generating Random Inputs
        Input [actual_input_1] shape: (10, 3, 224, 224)
Input Metadata
        Number of Inputs: 1
        Input Bindings for Profile 0: [0]
        Input names: ['actual_input_1']
        Input shapes: [(10, 3, 224, 224)]
Output Metadata
        Number of Outputs: 1
        Output names: ['output1']
        Output shapes: [(10, 1000)]
        Output Bindings for Profile 0: [1]
Inference Outputs: [array([[-1.9423429 , -0.31228915, -0.11592922, ..., -1.8848201 ,
        -1.967107  ,  1.77118   ],
       [-2.067131  , -0.1413779 , -0.10807458, ..., -1.8306588 ,
        -2.0896611 ,  1.8038476 ],
       [-1.9210347 , -0.59420735, -0.46031308, ..., -1.8159304 ,
        -1.953881  ,  1.7182006 ],
       ...,
       [-2.091892  , -0.24702555,  0.01917603, ..., -2.0384629 ,
        -2.075258  ,  1.877627  ],
       [-2.1854897 , -0.48040774, -0.19558612, ..., -1.9172117 ,
        -2.1862085 ,  1.877256  ],
       [-2.2054253 , -0.29715982, -0.15480372, ..., -1.5839869 ,
        -1.941939  ,  1.9311339 ]], dtype=float32)]

Dynamic-shape Engine Example

root@49b19ca81d38:/mnt# python3 infer.py -e alexnet_dynamic.engine 
Loaded engine: alexnet_dynamic.engine
Active Optimization Profile: 0
Engine/Binding Metadata
        Number of optimization profiles: 1
        Number of bindings per profile: 2
        First binding for profile: 0
        Last binding for profile: 0
Generating Random Inputs
        Input [actual_input_1] shape: (-1, 3, 224, 224)
        Profile Shapes for [actual_input_1]: [kMIN (1, 3, 224, 224) | kOPT (32, 3, 224, 224) | kMAX (32, 3, 224, 224)]
        Input [actual_input_1] shape was dynamic, setting inference shape to (32, 3, 224, 224)
Input Metadata
        Number of Inputs: 1
        Input Bindings for Profile 0: [0]
        Input names: ['actual_input_1']
        Input shapes: [(32, 3, 224, 224)]
Output Metadata
        Number of Outputs: 1
        Output names: ['output1']
        Output shapes: [(32, 1000)]
        Output Bindings for Profile 0: [1]
Inference Outputs: [array([[-2.1023765 , -0.3775048 , -0.17939475, ..., -1.8248225 ,
        -2.042846  ,  2.0328057 ],
       [-1.737236  , -0.54789805, -0.33429265, ..., -1.7502917 ,
        -2.173114  ,  1.699208  ],
       [-1.9979393 , -0.467514  , -0.3265229 , ..., -1.8016856 ,
        -2.025305  ,  1.6176162 ],
       ...,
       [-1.9945843 , -0.42376897,  0.11915839, ..., -1.7103177 ,
        -2.0614984 ,  1.7928884 ],
       [-2.180842  , -0.52559745, -0.12792507, ..., -1.9101417 ,
        -2.132482  ,  1.8081576 ],
       [-2.011236  , -0.29210696, -0.2629762 , ..., -1.8188174 ,
        -2.0216613 ,  1.884306  ]], dtype=float32)]