inference

TensorRT Inference Example (Python API)

A generic inference script example to handle TensorRT engines created from both fixed-shape and dynamic-shape ONNX models.

This script will detect the input/output names/shapes and generate random inputs to do inference on. It will also dump a lot of relevant metadata and information that happens along the way to clarify the steps that happen when doing inference with TensorRT.

Setup

(Optional) Start a TensorRT 7.1 Docker container using the NGC 20.06 TensorRT Release:

# Mount current directory to /mnt and use /mnt as workspace inside container
nvidia-docker run -it -v `pwd`:/mnt -w /mnt nvcr.io/nvidia/tensorrt:20.06-py3

(Optional) Download and run a sample script to:

Generate a fixed-shape Alexnet ONNX model
Generate a dynamic-shape Alexnet ONNX model

# Download sample pytorch->onnx script
wget https://gist.githubusercontent.com/rmccorm4/b72abac18aed6be4c1725db18eba4930/raw/3919c883b97a231877b454dae695fe074a1acdff/alexnet_onnx.py
# Install dependencies
python3 -m pip install torch==1.5.1 torchvision==0.6.1 onnx==1.6
# Export sample Alexnet model to ONNX with a dynamic batch dimension
python3 alexnet_onnx.py --opset=11

(Optional) Use trtexec to:

Create a TensorRT engine from the fixed-shape Alexnet ONNX model in previous step

# Create TensorRT engine from fixed-shape ONNX model
trtexec --explicitBatch \
        --onnx=alexnet_fixed.onnx \
        --saveEngine=alexnet_fixed.engine

(Optional) Use trtexec to:

Create a TensorRT engine from the dynamic-shape Alexnet ONNX model in previous step

# Emulate "maxBatchSize" behavior from implicit batch engines by setting
# an optimization profile with min=(1, *shape), opt=max=(maxBatchSize, *shape)
MAX_BATCH_SIZE=32
INPUT_NAME="actual_input_1"

# Convert dynamic batch ONNX model to TRT Engine with optimization profile defined
#   --minShapes: kMIN shape
#   --optShapes: kOPT shape
#   --maxShapes: kMAX shape
#   --shapes:    # Inference shape - this is like context.set_binding_shape(0, shape)
trtexec --onnx=alexnet_dynamic.onnx \
        --explicitBatch \
        --minShapes=${INPUT_NAME}:1x3x224x224 \
        --optShapes=${INPUT_NAME}:${MAX_BATCH_SIZE}x3x224x224 \
        --maxShapes=${INPUT_NAME}:${MAX_BATCH_SIZE}x3x224x224 \
        --shapes=${INPUT_NAME}:1x3x224x224 \
        --saveEngine=alexnet_dynamic.engine

Inference

Run a generic python script that can handle both fixed-shape and dynamic-shape TensorRT engines.

This script generates random input data
For dynamic shape engines, this script defaults to input shapes of the engine's first optimization profile's (context.active_optimization_profile=0) kOPT shape

NOTE: This script is not meant for peak performance, but is meant to serve as a detailed example to demonstrate all of the moving parts involved in inference, especially for dynamic shape engines.

Fixed-shape Engine Example

root@49b19ca81d38:/mnt# python3 infer.py -e alexnet_fixed.engine   
Loaded engine: alexnet_fixed.engine
Active Optimization Profile: 0
Engine/Binding Metadata
        Number of optimization profiles: 1
        Number of bindings per profile: 2
        First binding for profile: 0
        Last binding for profile: 0
Generating Random Inputs
        Input [actual_input_1] shape: (10, 3, 224, 224)
Input Metadata
        Number of Inputs: 1
        Input Bindings for Profile 0: [0]
        Input names: ['actual_input_1']
        Input shapes: [(10, 3, 224, 224)]
Output Metadata
        Number of Outputs: 1
        Output names: ['output1']
        Output shapes: [(10, 1000)]
        Output Bindings for Profile 0: [1]
Inference Outputs: [array([[-1.9423429 , -0.31228915, -0.11592922, ..., -1.8848201 ,
        -1.967107  ,  1.77118   ],
       [-2.067131  , -0.1413779 , -0.10807458, ..., -1.8306588 ,
        -2.0896611 ,  1.8038476 ],
       [-1.9210347 , -0.59420735, -0.46031308, ..., -1.8159304 ,
        -1.953881  ,  1.7182006 ],
       ...,
       [-2.091892  , -0.24702555,  0.01917603, ..., -2.0384629 ,
        -2.075258  ,  1.877627  ],
       [-2.1854897 , -0.48040774, -0.19558612, ..., -1.9172117 ,
        -2.1862085 ,  1.877256  ],
       [-2.2054253 , -0.29715982, -0.15480372, ..., -1.5839869 ,
        -1.941939  ,  1.9311339 ]], dtype=float32)]

Dynamic-shape Engine Example

root@49b19ca81d38:/mnt# python3 infer.py -e alexnet_dynamic.engine 
Loaded engine: alexnet_dynamic.engine
Active Optimization Profile: 0
Engine/Binding Metadata
        Number of optimization profiles: 1
        Number of bindings per profile: 2
        First binding for profile: 0
        Last binding for profile: 0
Generating Random Inputs
        Input [actual_input_1] shape: (-1, 3, 224, 224)
        Profile Shapes for [actual_input_1]: [kMIN (1, 3, 224, 224) | kOPT (32, 3, 224, 224) | kMAX (32, 3, 224, 224)]
        Input [actual_input_1] shape was dynamic, setting inference shape to (32, 3, 224, 224)
Input Metadata
        Number of Inputs: 1
        Input Bindings for Profile 0: [0]
        Input names: ['actual_input_1']
        Input shapes: [(32, 3, 224, 224)]
Output Metadata
        Number of Outputs: 1
        Output names: ['output1']
        Output shapes: [(32, 1000)]
        Output Bindings for Profile 0: [1]
Inference Outputs: [array([[-2.1023765 , -0.3775048 , -0.17939475, ..., -1.8248225 ,
        -2.042846  ,  2.0328057 ],
       [-1.737236  , -0.54789805, -0.33429265, ..., -1.7502917 ,
        -2.173114  ,  1.699208  ],
       [-1.9979393 , -0.467514  , -0.3265229 , ..., -1.8016856 ,
        -2.025305  ,  1.6176162 ],
       ...,
       [-1.9945843 , -0.42376897,  0.11915839, ..., -1.7103177 ,
        -2.0614984 ,  1.7928884 ],
       [-2.180842  , -0.52559745, -0.12792507, ..., -1.9101417 ,
        -2.132482  ,  1.8081576 ],
       [-2.011236  , -0.29210696, -0.2629762 , ..., -1.8188174 ,
        -2.0216613 ,  1.884306  ]], dtype=float32)]

Name	Name	Last commit message	Last commit date
parent directory ..
utils	utils	20.06 / TensorRT 7.1 Update (#20 )	Jun 29, 2020
README.md	README.md	20.06 / TensorRT 7.1 Update (#20 )	Jun 29, 2020
infer.py	infer.py	add default seed value to argparse	Jul 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Files

inference

inference

README.md

TensorRT Inference Example (Python API)

Setup

Inference

Fixed-shape Engine Example

Dynamic-shape Engine Example

Collapse file tree

Files

inference

Directory actions

More options

Directory actions

More options

Latest commit

History

inference

Folders and files

parent directory

README.md

TensorRT Inference Example (Python API)

Setup

Inference

Fixed-shape Engine Example

Dynamic-shape Engine Example